Showing preview only (4,366K chars total). Download the full file or copy to clipboard to get everything.
Repository: myshell-ai/MeloTTS
Branch: main
Commit: 209145371cff
Files: 90
Total size: 15.4 MB
Directory structure:
gitextract_toln448z/
├── .github/
│ └── workflows/
│ └── pypi.yml
├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── docs/
│ ├── install.md
│ ├── quick_use.md
│ └── training.md
├── melo/
│ ├── __init__.py
│ ├── api.py
│ ├── app.py
│ ├── attentions.py
│ ├── commons.py
│ ├── configs/
│ │ └── config.json
│ ├── data/
│ │ └── example/
│ │ └── metadata.list
│ ├── data_utils.py
│ ├── download_utils.py
│ ├── infer.py
│ ├── init_downloads.py
│ ├── losses.py
│ ├── main.py
│ ├── mel_processing.py
│ ├── models.py
│ ├── modules.py
│ ├── monotonic_align/
│ │ ├── __init__.py
│ │ └── core.py
│ ├── preprocess_text.py
│ ├── split_utils.py
│ ├── text/
│ │ ├── __init__.py
│ │ ├── chinese.py
│ │ ├── chinese_bert.py
│ │ ├── chinese_mix.py
│ │ ├── cleaner.py
│ │ ├── cleaner_multiling.py
│ │ ├── cmudict.rep
│ │ ├── cmudict_cache.pickle
│ │ ├── english.py
│ │ ├── english_bert.py
│ │ ├── english_utils/
│ │ │ ├── __init__.py
│ │ │ ├── abbreviations.py
│ │ │ ├── number_norm.py
│ │ │ └── time_norm.py
│ │ ├── es_phonemizer/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── cleaner.py
│ │ │ ├── es_symbols.json
│ │ │ ├── es_symbols.txt
│ │ │ ├── es_symbols_v2.json
│ │ │ ├── es_to_ipa.py
│ │ │ ├── example_ipa.txt
│ │ │ ├── gruut_wrapper.py
│ │ │ ├── punctuation.py
│ │ │ ├── spanish_symbols.txt
│ │ │ └── test.ipynb
│ │ ├── fr_phonemizer/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── cleaner.py
│ │ │ ├── en_symbols.json
│ │ │ ├── example_ipa.txt
│ │ │ ├── fr_symbols.json
│ │ │ ├── fr_to_ipa.py
│ │ │ ├── french_abbreviations.py
│ │ │ ├── french_symbols.txt
│ │ │ ├── gruut_wrapper.py
│ │ │ └── punctuation.py
│ │ ├── french.py
│ │ ├── french_bert.py
│ │ ├── japanese.py
│ │ ├── japanese_bert.py
│ │ ├── ko_dictionary.py
│ │ ├── korean.py
│ │ ├── opencpop-strict.txt
│ │ ├── spanish.py
│ │ ├── spanish_bert.py
│ │ ├── symbols.py
│ │ └── tone_sandhi.py
│ ├── train.py
│ ├── train.sh
│ ├── transforms.py
│ └── utils.py
├── requirements.txt
├── setup.py
└── test/
├── basetts_test_resources/
│ ├── en_egs_text.txt
│ ├── es_egs_text.txt
│ ├── fr_egs_text.txt
│ ├── jp_egs_text.txt
│ ├── kr_egs_text.txt
│ └── zh_mix_en_egs_text.txt
├── test_base_model_tts_package.py
└── test_base_model_tts_package_from_S3.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .github/workflows/pypi.yml
================================================
# This workflow will upload a Python Package using Twine when a release is created
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.
name: Upload Python Package
on:
release:
types: [published]
permissions:
contents: read
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: '3.x'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m ensurepip --upgrade
pip install build
- name: Build package
run: python -m build
- name: Publish package
uses: pypa/gh-action-pypi-publish@release/v1.8
with:
user: __token__
password: ${{ secrets.PYPI_API_TOKEN }}
================================================
FILE: .gitignore
================================================
__pycache__/
.ipynb_checkpoints/
basetts_outputs_use_bert/
basetts_outputs/
multilingual_ckpts
basetts_outputs_package/
build/
*.egg-info/
*.zip
*.wav
================================================
FILE: Dockerfile
================================================
FROM python:3.9-slim
WORKDIR /app
COPY . /app
RUN apt-get update && apt-get install -y \
build-essential libsndfile1 \
&& rm -rf /var/lib/apt/lists/*
RUN pip install -e .
RUN python -m unidic download
RUN python melo/init_downloads.py
CMD ["python", "./melo/app.py", "--host", "0.0.0.0", "--port", "8888"]
================================================
FILE: LICENSE
================================================
Copyright (c) 2024 MyShell.ai
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
<div align="center">
<div> </div>
<img src="logo.png" width="300"/> <br>
<a href="https://trendshift.io/repositories/8133" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8133" alt="myshell-ai%2FMeloTTS | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
</div>
## Introduction
MeloTTS is a **high-quality multi-lingual** text-to-speech library by [MIT](https://www.mit.edu/) and [MyShell.ai](https://myshell.ai). Supported languages include:
| Language | Example |
| --- | --- |
| English (American) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-US/speed_1.0/sent_000.wav) |
| English (British) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-BR/speed_1.0/sent_000.wav) |
| English (Indian) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN_INDIA/speed_1.0/sent_000.wav) |
| English (Australian) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-AU/speed_1.0/sent_000.wav) |
| English (Default) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/en/EN-Default/speed_1.0/sent_000.wav) |
| Spanish | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/es/ES/speed_1.0/sent_000.wav) |
| French | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/fr/FR/speed_1.0/sent_000.wav) |
| Chinese (mix EN) | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/zh/ZH/speed_1.0/sent_008.wav) |
| Japanese | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/jp/JP/speed_1.0/sent_000.wav) |
| Korean | [Link](https://myshell-public-repo-host.s3.amazonaws.com/myshellttsbase/examples/kr/KR/speed_1.0/sent_000.wav) |
Some other features include:
- The Chinese speaker supports `mixed Chinese and English`.
- Fast enough for `CPU real-time inference`.
## Usage
- [Use without Installation](docs/quick_use.md)
- [Install and Use Locally](docs/install.md)
- [Training on Custom Dataset](docs/training.md)
The Python API and model cards can be found in [this repo](https://github.com/myshell-ai/MeloTTS/blob/main/docs/install.md#python-api) or on [HuggingFace](https://huggingface.co/myshell-ai).
**Contributing**
If you find this work useful, please consider contributing to this repo.
- Many thanks to [@fakerybakery](https://github.com/fakerybakery) for adding the Web UI and CLI part.
## Authors
- [Wenliang Zhao](https://wl-zhao.github.io) at Tsinghua University
- [Xumin Yu](https://yuxumin.github.io) at Tsinghua University
- [Zengyi Qin](https://www.qinzy.tech) (project lead) at MIT and MyShell
**Citation**
```
@software{zhao2024melo,
author={Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
title = {MeloTTS: High-quality Multi-lingual Multi-accent Text-to-Speech},
url = {https://github.com/myshell-ai/MeloTTS},
year = {2023}
}
```
## License
This library is under MIT License, which means it is free for both commercial and non-commercial use.
## Acknowledgements
This implementation is based on [TTS](https://github.com/coqui-ai/TTS), [VITS](https://github.com/jaywalnut310/vits), [VITS2](https://github.com/daniilrobnikov/vits2) and [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2). We appreciate their awesome work.
================================================
FILE: docs/install.md
================================================
## Install and Use Locally
### Table of Content
- [Linux and macOS Install](#linux-and-macos-install)
- [Docker Install for Windows and macOS](#docker-install)
- [Usage](#usage)
- [Web UI](#webui)
- [CLI](#cli)
- [Python API](#python-api)
### Linux and macOS Install
The repo is developed and tested on `Ubuntu 20.04` and `Python 3.9`.
```bash
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
pip install -e .
python -m unidic download
```
If you encountered issues in macOS install, try the [Docker Install](#docker-install)
### Docker Install
To avoid compatibility issues, for Windows users and some macOS users, we suggest to run via Docker. Ensure that [you have Docker installed](https://docs.docker.com/engine/install/).
**Build Docker**
This could take a few minutes.
```bash
git clone https://github.com/myshell-ai/MeloTTS.git
cd MeloTTS
docker build -t melotts .
```
**Run Docker**
```bash
docker run -it -p 8888:8888 melotts
```
If your local machine has GPU, then you can choose to run:
```bash
docker run --gpus all -it -p 8888:8888 melotts
```
Then open [http://localhost:8888](http://localhost:8888) in your browser to use the app.
## Usage
### WebUI
The WebUI supports muliple languages and voices. First, follow the installation steps. Then, simply run:
```bash
melo-ui
# Or: python melo/app.py
```
### CLI
You may use the MeloTTS CLI to interact with MeloTTS. The CLI may be invoked using either `melotts` or `melo`. Here are some examples:
**Read English text:**
```bash
melo "Text to read" output.wav
```
**Specify a language:**
```bash
melo "Text to read" output.wav --language EN
```
**Specify a speaker:**
```bash
melo "Text to read" output.wav --language EN --speaker EN-US
melo "Text to read" output.wav --language EN --speaker EN-AU
```
The available speakers are: `EN-Default`, `EN-US`, `EN-BR`, `EN_INDIA` `EN-AU`.
**Specify a speed:**
```bash
melo "Text to read" output.wav --language EN --speaker EN-US --speed 1.5
melo "Text to read" output.wav --speed 1.5
```
**Use a different language:**
```bash
melo "text-to-speech 领域近年来发展迅速" zh.wav -l ZH
```
**Load from a file:**
```bash
melo file.txt out.wav --file
```
The full API documentation may be found using:
```bash
melo --help
```
### Python API
#### English with Multiple Accents
```python
from melo.api import TTS
# Speed is adjustable
speed = 1.0
# CPU is sufficient for real-time inference.
# You can set it manually to 'cpu' or 'cuda' or 'cuda:0' or 'mps'
device = 'auto' # Will automatically use GPU if available
# English
text = "Did you ever hear a folk tale about a giant turtle?"
model = TTS(language='EN', device=device)
speaker_ids = model.hps.data.spk2id
# American accent
output_path = 'en-us.wav'
model.tts_to_file(text, speaker_ids['EN-US'], output_path, speed=speed)
# British accent
output_path = 'en-br.wav'
model.tts_to_file(text, speaker_ids['EN-BR'], output_path, speed=speed)
# Indian accent
output_path = 'en-india.wav'
model.tts_to_file(text, speaker_ids['EN_INDIA'], output_path, speed=speed)
# Australian accent
output_path = 'en-au.wav'
model.tts_to_file(text, speaker_ids['EN-AU'], output_path, speed=speed)
# Default accent
output_path = 'en-default.wav'
model.tts_to_file(text, speaker_ids['EN-Default'], output_path, speed=speed)
```
#### Spanish
```python
from melo.api import TTS
# Speed is adjustable
speed = 1.0
# CPU is sufficient for real-time inference.
# You can also change to cuda:0
device = 'cpu'
text = "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante."
model = TTS(language='ES', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'es.wav'
model.tts_to_file(text, speaker_ids['ES'], output_path, speed=speed)
```
#### French
```python
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante."
model = TTS(language='FR', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'fr.wav'
model.tts_to_file(text, speaker_ids['FR'], output_path, speed=speed)
```
#### Chinese
```python
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。"
model = TTS(language='ZH', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'zh.wav'
model.tts_to_file(text, speaker_ids['ZH'], output_path, speed=speed)
```
#### Japanese
```python
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "彼は毎朝ジョギングをして体を健康に保っています。"
model = TTS(language='JP', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'jp.wav'
model.tts_to_file(text, speaker_ids['JP'], output_path, speed=speed)
```
#### Korean
```python
from melo.api import TTS
# Speed is adjustable
speed = 1.0
device = 'cpu' # or cuda:0
text = "안녕하세요! 오늘은 날씨가 정말 좋네요."
model = TTS(language='KR', device=device)
speaker_ids = model.hps.data.spk2id
output_path = 'kr.wav'
model.tts_to_file(text, speaker_ids['KR'], output_path, speed=speed)
```
================================================
FILE: docs/quick_use.md
================================================
## Use MeloTTS without Installation
**Quick Demo**
- [Official live demo](https://app.myshell.ai/bot/UN77N3/1709094629) on Myshell.
- Hugging Face Space [live demo](https://huggingface.co/spaces/mrfakename/MeloTTS).
**Use on MyShell**
There are hundreds of TTS models on MyShell, much more than MeloTTS. For example:
English
- [gentle British male voice](https://app.myshell.ai/widget/nIfamm)
- [cheerful young female voice](https://app.myshell.ai/widget/AjIjqy)
- [sultry and robust male voice](https://app.myshell.ai/widget/zQJJN3)
Spanish
- [voz femenina adorable](https://app.myshell.ai/widget/buIZBf)
- [voz masculina joven](https://app.myshell.ai/widget/rayuiy)
- [voz de niña inmadura](https://app.myshell.ai/widget/mYFV3e)
French
- [voix adorable de fille](https://app.myshell.ai/widget/3IfEfy)
- [voix douce masculine](https://app.myshell.ai/widget/IRR3M3)
- [voix douce féminine](https://app.myshell.ai/widget/NRbaUj)
German
- [sanfte Männerstimme](https://app.myshell.ai/widget/JFnAn2)
- [sanfte Frauenstimme](https://app.myshell.ai/widget/MrU7Nb)
- [unreife Mädchenstimme](https://app.myshell.ai/widget/UFbYBj)
Portuguese
- [voz feminina nítida](https://app.myshell.ai/widget/VzMb6j)
- [voz de menino imaturo](https://app.myshell.ai/widget/nAzeei)
- [voz masculina sóbria](https://app.myshell.ai/widget/JZRNJz)
Russian
- [зрелый женский голос](https://app.myshell.ai/widget/6byMZ3)
- [зрелый мужской голос](https://app.myshell.ai/widget/NB7jmm)
Chinese
- [甜美女声](https://app.myshell.ai/widget/ymeUjm)
- [青年男声](https://app.myshell.ai/widget/NZnERb)
More can be found at the widget center of [MyShell.ai](https://app.myshell.ai/robot-workshop).
================================================
FILE: docs/training.md
================================================
## Training
Before training, please install MeloTTS in dev mode and go to the `melo` folder.
```
pip install -e .
cd melo
```
### Data Preparation
To train a TTS model, we need to prepare the audio files and a metadata file. We recommend using 44100Hz audio files and the metadata file should have the following format:
```
path/to/audio_001.wav |<speaker_name>|<language_code>|<text_001>
path/to/audio_002.wav |<speaker_name>|<language_code>|<text_002>
```
The transcribed text can be obtained by ASR model, (e.g., [whisper](https://github.com/openai/whisper)). An example metadata can be found in `data/example/metadata.list`
We can then run the preprocessing code:
```
python preprocess_text.py --metadata data/example/metadata.list
```
A config file `data/example/config.json` will be generated. Feel free to edit some hyper-parameters in that config file (for example, you may decrease the batch size if you have encountered the CUDA out-of-memory issue).
### Training
The training can be launched by:
```
bash train.sh <path/to/config.json> <num_of_gpus>
```
We have found for some machine the training will sometimes crash due to an [issue](https://github.com/pytorch/pytorch/issues/2530) of gloo. Therefore, we add an auto-resume wrapper in the `train.sh`.
### Inference
Simply run:
```
python infer.py --text "<some text here>" -m /path/to/checkpoint/G_<iter>.pth -o <output_dir>
```
================================================
FILE: melo/__init__.py
================================================
================================================
FILE: melo/api.py
================================================
import os
import re
import json
import torch
import librosa
import soundfile
import torchaudio
import numpy as np
import torch.nn as nn
from tqdm import tqdm
import torch
from . import utils
from . import commons
from .models import SynthesizerTrn
from .split_utils import split_sentence
from .mel_processing import spectrogram_torch, spectrogram_torch_conv
from .download_utils import load_or_download_config, load_or_download_model
class TTS(nn.Module):
def __init__(self,
language,
device='auto',
use_hf=True,
config_path=None,
ckpt_path=None):
super().__init__()
if device == 'auto':
device = 'cpu'
if torch.cuda.is_available(): device = 'cuda'
if torch.backends.mps.is_available(): device = 'mps'
if 'cuda' in device:
assert torch.cuda.is_available()
# config_path =
hps = load_or_download_config(language, use_hf=use_hf, config_path=config_path)
num_languages = hps.num_languages
num_tones = hps.num_tones
symbols = hps.symbols
model = SynthesizerTrn(
len(symbols),
hps.data.filter_length // 2 + 1,
hps.train.segment_size // hps.data.hop_length,
n_speakers=hps.data.n_speakers,
num_tones=num_tones,
num_languages=num_languages,
**hps.model,
).to(device)
model.eval()
self.model = model
self.symbol_to_id = {s: i for i, s in enumerate(symbols)}
self.hps = hps
self.device = device
# load state_dict
checkpoint_dict = load_or_download_model(language, device, use_hf=use_hf, ckpt_path=ckpt_path)
self.model.load_state_dict(checkpoint_dict['model'], strict=True)
language = language.split('_')[0]
self.language = 'ZH_MIX_EN' if language == 'ZH' else language # we support a ZH_MIX_EN model
@staticmethod
def audio_numpy_concat(segment_data_list, sr, speed=1.):
audio_segments = []
for segment_data in segment_data_list:
audio_segments += segment_data.reshape(-1).tolist()
audio_segments += [0] * int((sr * 0.05) / speed)
audio_segments = np.array(audio_segments).astype(np.float32)
return audio_segments
@staticmethod
def split_sentences_into_pieces(text, language, quiet=False):
texts = split_sentence(text, language_str=language)
if not quiet:
print(" > Text split to sentences.")
print('\n'.join(texts))
print(" > ===========================")
return texts
def tts_to_file(self, text, speaker_id, output_path=None, sdp_ratio=0.2, noise_scale=0.6, noise_scale_w=0.8, speed=1.0, pbar=None, format=None, position=None, quiet=False,):
language = self.language
texts = self.split_sentences_into_pieces(text, language, quiet)
audio_list = []
if pbar:
tx = pbar(texts)
else:
if position:
tx = tqdm(texts, position=position)
elif quiet:
tx = texts
else:
tx = tqdm(texts)
for t in tx:
if language in ['EN', 'ZH_MIX_EN']:
t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
device = self.device
bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
with torch.no_grad():
x_tst = phones.to(device).unsqueeze(0)
tones = tones.to(device).unsqueeze(0)
lang_ids = lang_ids.to(device).unsqueeze(0)
bert = bert.to(device).unsqueeze(0)
ja_bert = ja_bert.to(device).unsqueeze(0)
x_tst_lengths = torch.LongTensor([phones.size(0)]).to(device)
del phones
speakers = torch.LongTensor([speaker_id]).to(device)
audio = self.model.infer(
x_tst,
x_tst_lengths,
speakers,
tones,
lang_ids,
bert,
ja_bert,
sdp_ratio=sdp_ratio,
noise_scale=noise_scale,
noise_scale_w=noise_scale_w,
length_scale=1. / speed,
)[0][0, 0].data.cpu().float().numpy()
del x_tst, tones, lang_ids, bert, ja_bert, x_tst_lengths, speakers
#
audio_list.append(audio)
torch.cuda.empty_cache()
audio = self.audio_numpy_concat(audio_list, sr=self.hps.data.sampling_rate, speed=speed)
if output_path is None:
return audio
else:
if format:
soundfile.write(output_path, audio, self.hps.data.sampling_rate, format=format)
else:
soundfile.write(output_path, audio, self.hps.data.sampling_rate)
================================================
FILE: melo/app.py
================================================
# WebUI by mrfakename <X @realmrfakename / HF @mrfakename>
# Demo also available on HF Spaces: https://huggingface.co/spaces/mrfakename/MeloTTS
import gradio as gr
import os, torch, io
# os.system('python -m unidic download')
print("Make sure you've downloaded unidic (python -m unidic download) for this WebUI to work.")
from melo.api import TTS
speed = 1.0
import tempfile
import click
device = 'auto'
models = {
'EN': TTS(language='EN', device=device),
'ES': TTS(language='ES', device=device),
'FR': TTS(language='FR', device=device),
'ZH': TTS(language='ZH', device=device),
'JP': TTS(language='JP', device=device),
'KR': TTS(language='KR', device=device),
}
speaker_ids = models['EN'].hps.data.spk2id
default_text_dict = {
'EN': 'The field of text-to-speech has seen rapid development recently.',
'ES': 'El campo de la conversión de texto a voz ha experimentado un rápido desarrollo recientemente.',
'FR': 'Le domaine de la synthèse vocale a connu un développement rapide récemment',
'ZH': 'text-to-speech 领域近年来发展迅速',
'JP': 'テキスト読み上げの分野は最近急速な発展を遂げています',
'KR': '최근 텍스트 음성 변환 분야가 급속도로 발전하고 있습니다.',
}
def synthesize(speaker, text, speed, language, progress=gr.Progress()):
bio = io.BytesIO()
models[language].tts_to_file(text, models[language].hps.data.spk2id[speaker], bio, speed=speed, pbar=progress.tqdm, format='wav')
return bio.getvalue()
def load_speakers(language, text):
if text in list(default_text_dict.values()):
newtext = default_text_dict[language]
else:
newtext = text
return gr.update(value=list(models[language].hps.data.spk2id.keys())[0], choices=list(models[language].hps.data.spk2id.keys())), newtext
with gr.Blocks() as demo:
gr.Markdown('# MeloTTS WebUI\n\nA WebUI for MeloTTS.')
with gr.Group():
speaker = gr.Dropdown(speaker_ids.keys(), interactive=True, value='EN-US', label='Speaker')
language = gr.Radio(['EN', 'ES', 'FR', 'ZH', 'JP', 'KR'], label='Language', value='EN')
speed = gr.Slider(label='Speed', minimum=0.1, maximum=10.0, value=1.0, interactive=True, step=0.1)
text = gr.Textbox(label="Text to speak", value=default_text_dict['EN'])
language.input(load_speakers, inputs=[language, text], outputs=[speaker, text])
btn = gr.Button('Synthesize', variant='primary')
aud = gr.Audio(interactive=False)
btn.click(synthesize, inputs=[speaker, text, speed, language], outputs=[aud])
gr.Markdown('WebUI by [mrfakename](https://twitter.com/realmrfakename).')
@click.command()
@click.option('--share', '-s', is_flag=True, show_default=True, default=False, help="Expose a publicly-accessible shared Gradio link usable by anyone with the link. Only share the link with people you trust.")
@click.option('--host', '-h', default=None)
@click.option('--port', '-p', type=int, default=None)
def main(share, host, port):
demo.queue(api_open=False).launch(show_api=False, share=share, server_name=host, server_port=port)
if __name__ == "__main__":
main()
================================================
FILE: melo/attentions.py
================================================
import math
import torch
from torch import nn
from torch.nn import functional as F
from . import commons
import logging
logger = logging.getLogger(__name__)
class LayerNorm(nn.Module):
def __init__(self, channels, eps=1e-5):
super().__init__()
self.channels = channels
self.eps = eps
self.gamma = nn.Parameter(torch.ones(channels))
self.beta = nn.Parameter(torch.zeros(channels))
def forward(self, x):
x = x.transpose(1, -1)
x = F.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
return x.transpose(1, -1)
@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
n_channels_int = n_channels[0]
in_act = input_a + input_b
t_act = torch.tanh(in_act[:, :n_channels_int, :])
s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
acts = t_act * s_act
return acts
class Encoder(nn.Module):
def __init__(
self,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size=1,
p_dropout=0.0,
window_size=4,
isflow=True,
**kwargs
):
super().__init__()
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.window_size = window_size
self.cond_layer_idx = self.n_layers
if "gin_channels" in kwargs:
self.gin_channels = kwargs["gin_channels"]
if self.gin_channels != 0:
self.spk_emb_linear = nn.Linear(self.gin_channels, self.hidden_channels)
self.cond_layer_idx = (
kwargs["cond_layer_idx"] if "cond_layer_idx" in kwargs else 2
)
assert (
self.cond_layer_idx < self.n_layers
), "cond_layer_idx should be less than n_layers"
self.drop = nn.Dropout(p_dropout)
self.attn_layers = nn.ModuleList()
self.norm_layers_1 = nn.ModuleList()
self.ffn_layers = nn.ModuleList()
self.norm_layers_2 = nn.ModuleList()
for i in range(self.n_layers):
self.attn_layers.append(
MultiHeadAttention(
hidden_channels,
hidden_channels,
n_heads,
p_dropout=p_dropout,
window_size=window_size,
)
)
self.norm_layers_1.append(LayerNorm(hidden_channels))
self.ffn_layers.append(
FFN(
hidden_channels,
hidden_channels,
filter_channels,
kernel_size,
p_dropout=p_dropout,
)
)
self.norm_layers_2.append(LayerNorm(hidden_channels))
def forward(self, x, x_mask, g=None):
attn_mask = x_mask.unsqueeze(2) * x_mask.unsqueeze(-1)
x = x * x_mask
for i in range(self.n_layers):
if i == self.cond_layer_idx and g is not None:
g = self.spk_emb_linear(g.transpose(1, 2))
g = g.transpose(1, 2)
x = x + g
x = x * x_mask
y = self.attn_layers[i](x, x, attn_mask)
y = self.drop(y)
x = self.norm_layers_1[i](x + y)
y = self.ffn_layers[i](x, x_mask)
y = self.drop(y)
x = self.norm_layers_2[i](x + y)
x = x * x_mask
return x
class Decoder(nn.Module):
def __init__(
self,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size=1,
p_dropout=0.0,
proximal_bias=False,
proximal_init=True,
**kwargs
):
super().__init__()
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.proximal_bias = proximal_bias
self.proximal_init = proximal_init
self.drop = nn.Dropout(p_dropout)
self.self_attn_layers = nn.ModuleList()
self.norm_layers_0 = nn.ModuleList()
self.encdec_attn_layers = nn.ModuleList()
self.norm_layers_1 = nn.ModuleList()
self.ffn_layers = nn.ModuleList()
self.norm_layers_2 = nn.ModuleList()
for i in range(self.n_layers):
self.self_attn_layers.append(
MultiHeadAttention(
hidden_channels,
hidden_channels,
n_heads,
p_dropout=p_dropout,
proximal_bias=proximal_bias,
proximal_init=proximal_init,
)
)
self.norm_layers_0.append(LayerNorm(hidden_channels))
self.encdec_attn_layers.append(
MultiHeadAttention(
hidden_channels, hidden_channels, n_heads, p_dropout=p_dropout
)
)
self.norm_layers_1.append(LayerNorm(hidden_channels))
self.ffn_layers.append(
FFN(
hidden_channels,
hidden_channels,
filter_channels,
kernel_size,
p_dropout=p_dropout,
causal=True,
)
)
self.norm_layers_2.append(LayerNorm(hidden_channels))
def forward(self, x, x_mask, h, h_mask):
"""
x: decoder input
h: encoder output
"""
self_attn_mask = commons.subsequent_mask(x_mask.size(2)).to(
device=x.device, dtype=x.dtype
)
encdec_attn_mask = h_mask.unsqueeze(2) * x_mask.unsqueeze(-1)
x = x * x_mask
for i in range(self.n_layers):
y = self.self_attn_layers[i](x, x, self_attn_mask)
y = self.drop(y)
x = self.norm_layers_0[i](x + y)
y = self.encdec_attn_layers[i](x, h, encdec_attn_mask)
y = self.drop(y)
x = self.norm_layers_1[i](x + y)
y = self.ffn_layers[i](x, x_mask)
y = self.drop(y)
x = self.norm_layers_2[i](x + y)
x = x * x_mask
return x
class MultiHeadAttention(nn.Module):
def __init__(
self,
channels,
out_channels,
n_heads,
p_dropout=0.0,
window_size=None,
heads_share=True,
block_length=None,
proximal_bias=False,
proximal_init=False,
):
super().__init__()
assert channels % n_heads == 0
self.channels = channels
self.out_channels = out_channels
self.n_heads = n_heads
self.p_dropout = p_dropout
self.window_size = window_size
self.heads_share = heads_share
self.block_length = block_length
self.proximal_bias = proximal_bias
self.proximal_init = proximal_init
self.attn = None
self.k_channels = channels // n_heads
self.conv_q = nn.Conv1d(channels, channels, 1)
self.conv_k = nn.Conv1d(channels, channels, 1)
self.conv_v = nn.Conv1d(channels, channels, 1)
self.conv_o = nn.Conv1d(channels, out_channels, 1)
self.drop = nn.Dropout(p_dropout)
if window_size is not None:
n_heads_rel = 1 if heads_share else n_heads
rel_stddev = self.k_channels**-0.5
self.emb_rel_k = nn.Parameter(
torch.randn(n_heads_rel, window_size * 2 + 1, self.k_channels)
* rel_stddev
)
self.emb_rel_v = nn.Parameter(
torch.randn(n_heads_rel, window_size * 2 + 1, self.k_channels)
* rel_stddev
)
nn.init.xavier_uniform_(self.conv_q.weight)
nn.init.xavier_uniform_(self.conv_k.weight)
nn.init.xavier_uniform_(self.conv_v.weight)
if proximal_init:
with torch.no_grad():
self.conv_k.weight.copy_(self.conv_q.weight)
self.conv_k.bias.copy_(self.conv_q.bias)
def forward(self, x, c, attn_mask=None):
q = self.conv_q(x)
k = self.conv_k(c)
v = self.conv_v(c)
x, self.attn = self.attention(q, k, v, mask=attn_mask)
x = self.conv_o(x)
return x
def attention(self, query, key, value, mask=None):
# reshape [b, d, t] -> [b, n_h, t, d_k]
b, d, t_s, t_t = (*key.size(), query.size(2))
query = query.view(b, self.n_heads, self.k_channels, t_t).transpose(2, 3)
key = key.view(b, self.n_heads, self.k_channels, t_s).transpose(2, 3)
value = value.view(b, self.n_heads, self.k_channels, t_s).transpose(2, 3)
scores = torch.matmul(query / math.sqrt(self.k_channels), key.transpose(-2, -1))
if self.window_size is not None:
assert (
t_s == t_t
), "Relative attention is only available for self-attention."
key_relative_embeddings = self._get_relative_embeddings(self.emb_rel_k, t_s)
rel_logits = self._matmul_with_relative_keys(
query / math.sqrt(self.k_channels), key_relative_embeddings
)
scores_local = self._relative_position_to_absolute_position(rel_logits)
scores = scores + scores_local
if self.proximal_bias:
assert t_s == t_t, "Proximal bias is only available for self-attention."
scores = scores + self._attention_bias_proximal(t_s).to(
device=scores.device, dtype=scores.dtype
)
if mask is not None:
scores = scores.masked_fill(mask == 0, -1e4)
if self.block_length is not None:
assert (
t_s == t_t
), "Local attention is only available for self-attention."
block_mask = (
torch.ones_like(scores)
.triu(-self.block_length)
.tril(self.block_length)
)
scores = scores.masked_fill(block_mask == 0, -1e4)
p_attn = F.softmax(scores, dim=-1) # [b, n_h, t_t, t_s]
p_attn = self.drop(p_attn)
output = torch.matmul(p_attn, value)
if self.window_size is not None:
relative_weights = self._absolute_position_to_relative_position(p_attn)
value_relative_embeddings = self._get_relative_embeddings(
self.emb_rel_v, t_s
)
output = output + self._matmul_with_relative_values(
relative_weights, value_relative_embeddings
)
output = (
output.transpose(2, 3).contiguous().view(b, d, t_t)
) # [b, n_h, t_t, d_k] -> [b, d, t_t]
return output, p_attn
def _matmul_with_relative_values(self, x, y):
"""
x: [b, h, l, m]
y: [h or 1, m, d]
ret: [b, h, l, d]
"""
ret = torch.matmul(x, y.unsqueeze(0))
return ret
def _matmul_with_relative_keys(self, x, y):
"""
x: [b, h, l, d]
y: [h or 1, m, d]
ret: [b, h, l, m]
"""
ret = torch.matmul(x, y.unsqueeze(0).transpose(-2, -1))
return ret
def _get_relative_embeddings(self, relative_embeddings, length):
2 * self.window_size + 1
# Pad first before slice to avoid using cond ops.
pad_length = max(length - (self.window_size + 1), 0)
slice_start_position = max((self.window_size + 1) - length, 0)
slice_end_position = slice_start_position + 2 * length - 1
if pad_length > 0:
padded_relative_embeddings = F.pad(
relative_embeddings,
commons.convert_pad_shape([[0, 0], [pad_length, pad_length], [0, 0]]),
)
else:
padded_relative_embeddings = relative_embeddings
used_relative_embeddings = padded_relative_embeddings[
:, slice_start_position:slice_end_position
]
return used_relative_embeddings
def _relative_position_to_absolute_position(self, x):
"""
x: [b, h, l, 2*l-1]
ret: [b, h, l, l]
"""
batch, heads, length, _ = x.size()
# Concat columns of pad to shift from relative to absolute indexing.
x = F.pad(x, commons.convert_pad_shape([[0, 0], [0, 0], [0, 0], [0, 1]]))
# Concat extra elements so to add up to shape (len+1, 2*len-1).
x_flat = x.view([batch, heads, length * 2 * length])
x_flat = F.pad(
x_flat, commons.convert_pad_shape([[0, 0], [0, 0], [0, length - 1]])
)
# Reshape and slice out the padded elements.
x_final = x_flat.view([batch, heads, length + 1, 2 * length - 1])[
:, :, :length, length - 1 :
]
return x_final
def _absolute_position_to_relative_position(self, x):
"""
x: [b, h, l, l]
ret: [b, h, l, 2*l-1]
"""
batch, heads, length, _ = x.size()
# pad along column
x = F.pad(
x, commons.convert_pad_shape([[0, 0], [0, 0], [0, 0], [0, length - 1]])
)
x_flat = x.view([batch, heads, length**2 + length * (length - 1)])
# add 0's in the beginning that will skew the elements after reshape
x_flat = F.pad(x_flat, commons.convert_pad_shape([[0, 0], [0, 0], [length, 0]]))
x_final = x_flat.view([batch, heads, length, 2 * length])[:, :, :, 1:]
return x_final
def _attention_bias_proximal(self, length):
"""Bias for self-attention to encourage attention to close positions.
Args:
length: an integer scalar.
Returns:
a Tensor with shape [1, 1, length, length]
"""
r = torch.arange(length, dtype=torch.float32)
diff = torch.unsqueeze(r, 0) - torch.unsqueeze(r, 1)
return torch.unsqueeze(torch.unsqueeze(-torch.log1p(torch.abs(diff)), 0), 0)
class FFN(nn.Module):
def __init__(
self,
in_channels,
out_channels,
filter_channels,
kernel_size,
p_dropout=0.0,
activation=None,
causal=False,
):
super().__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.filter_channels = filter_channels
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.activation = activation
self.causal = causal
if causal:
self.padding = self._causal_padding
else:
self.padding = self._same_padding
self.conv_1 = nn.Conv1d(in_channels, filter_channels, kernel_size)
self.conv_2 = nn.Conv1d(filter_channels, out_channels, kernel_size)
self.drop = nn.Dropout(p_dropout)
def forward(self, x, x_mask):
x = self.conv_1(self.padding(x * x_mask))
if self.activation == "gelu":
x = x * torch.sigmoid(1.702 * x)
else:
x = torch.relu(x)
x = self.drop(x)
x = self.conv_2(self.padding(x * x_mask))
return x * x_mask
def _causal_padding(self, x):
if self.kernel_size == 1:
return x
pad_l = self.kernel_size - 1
pad_r = 0
padding = [[0, 0], [0, 0], [pad_l, pad_r]]
x = F.pad(x, commons.convert_pad_shape(padding))
return x
def _same_padding(self, x):
if self.kernel_size == 1:
return x
pad_l = (self.kernel_size - 1) // 2
pad_r = self.kernel_size // 2
padding = [[0, 0], [0, 0], [pad_l, pad_r]]
x = F.pad(x, commons.convert_pad_shape(padding))
return x
================================================
FILE: melo/commons.py
================================================
import math
import torch
from torch.nn import functional as F
def init_weights(m, mean=0.0, std=0.01):
classname = m.__class__.__name__
if classname.find("Conv") != -1:
m.weight.data.normal_(mean, std)
def get_padding(kernel_size, dilation=1):
return int((kernel_size * dilation - dilation) / 2)
def convert_pad_shape(pad_shape):
layer = pad_shape[::-1]
pad_shape = [item for sublist in layer for item in sublist]
return pad_shape
def intersperse(lst, item):
result = [item] * (len(lst) * 2 + 1)
result[1::2] = lst
return result
def kl_divergence(m_p, logs_p, m_q, logs_q):
"""KL(P||Q)"""
kl = (logs_q - logs_p) - 0.5
kl += (
0.5 * (torch.exp(2.0 * logs_p) + ((m_p - m_q) ** 2)) * torch.exp(-2.0 * logs_q)
)
return kl
def rand_gumbel(shape):
"""Sample from the Gumbel distribution, protect from overflows."""
uniform_samples = torch.rand(shape) * 0.99998 + 0.00001
return -torch.log(-torch.log(uniform_samples))
def rand_gumbel_like(x):
g = rand_gumbel(x.size()).to(dtype=x.dtype, device=x.device)
return g
def slice_segments(x, ids_str, segment_size=4):
ret = torch.zeros_like(x[:, :, :segment_size])
for i in range(x.size(0)):
idx_str = ids_str[i]
idx_end = idx_str + segment_size
ret[i] = x[i, :, idx_str:idx_end]
return ret
def rand_slice_segments(x, x_lengths=None, segment_size=4):
b, d, t = x.size()
if x_lengths is None:
x_lengths = t
ids_str_max = x_lengths - segment_size + 1
ids_str = (torch.rand([b]).to(device=x.device) * ids_str_max).to(dtype=torch.long)
ret = slice_segments(x, ids_str, segment_size)
return ret, ids_str
def get_timing_signal_1d(length, channels, min_timescale=1.0, max_timescale=1.0e4):
position = torch.arange(length, dtype=torch.float)
num_timescales = channels // 2
log_timescale_increment = math.log(float(max_timescale) / float(min_timescale)) / (
num_timescales - 1
)
inv_timescales = min_timescale * torch.exp(
torch.arange(num_timescales, dtype=torch.float) * -log_timescale_increment
)
scaled_time = position.unsqueeze(0) * inv_timescales.unsqueeze(1)
signal = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], 0)
signal = F.pad(signal, [0, 0, 0, channels % 2])
signal = signal.view(1, channels, length)
return signal
def add_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4):
b, channels, length = x.size()
signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
return x + signal.to(dtype=x.dtype, device=x.device)
def cat_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4, axis=1):
b, channels, length = x.size()
signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
return torch.cat([x, signal.to(dtype=x.dtype, device=x.device)], axis)
def subsequent_mask(length):
mask = torch.tril(torch.ones(length, length)).unsqueeze(0).unsqueeze(0)
return mask
@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
n_channels_int = n_channels[0]
in_act = input_a + input_b
t_act = torch.tanh(in_act[:, :n_channels_int, :])
s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
acts = t_act * s_act
return acts
def convert_pad_shape(pad_shape):
layer = pad_shape[::-1]
pad_shape = [item for sublist in layer for item in sublist]
return pad_shape
def shift_1d(x):
x = F.pad(x, convert_pad_shape([[0, 0], [0, 0], [1, 0]]))[:, :, :-1]
return x
def sequence_mask(length, max_length=None):
if max_length is None:
max_length = length.max()
x = torch.arange(max_length, dtype=length.dtype, device=length.device)
return x.unsqueeze(0) < length.unsqueeze(1)
def generate_path(duration, mask):
"""
duration: [b, 1, t_x]
mask: [b, 1, t_y, t_x]
"""
b, _, t_y, t_x = mask.shape
cum_duration = torch.cumsum(duration, -1)
cum_duration_flat = cum_duration.view(b * t_x)
path = sequence_mask(cum_duration_flat, t_y).to(mask.dtype)
path = path.view(b, t_x, t_y)
path = path - F.pad(path, convert_pad_shape([[0, 0], [1, 0], [0, 0]]))[:, :-1]
path = path.unsqueeze(1).transpose(2, 3) * mask
return path
def clip_grad_value_(parameters, clip_value, norm_type=2):
if isinstance(parameters, torch.Tensor):
parameters = [parameters]
parameters = list(filter(lambda p: p.grad is not None, parameters))
norm_type = float(norm_type)
if clip_value is not None:
clip_value = float(clip_value)
total_norm = 0
for p in parameters:
param_norm = p.grad.data.norm(norm_type)
total_norm += param_norm.item() ** norm_type
if clip_value is not None:
p.grad.data.clamp_(min=-clip_value, max=clip_value)
total_norm = total_norm ** (1.0 / norm_type)
return total_norm
================================================
FILE: melo/configs/config.json
================================================
{
"train": {
"log_interval": 200,
"eval_interval": 1000,
"seed": 52,
"epochs": 10000,
"learning_rate": 0.0003,
"betas": [
0.8,
0.99
],
"eps": 1e-09,
"batch_size": 6,
"fp16_run": false,
"lr_decay": 0.999875,
"segment_size": 16384,
"init_lr_ratio": 1,
"warmup_epochs": 0,
"c_mel": 45,
"c_kl": 1.0,
"skip_optimizer": true
},
"data": {
"training_files": "",
"validation_files": "",
"max_wav_value": 32768.0,
"sampling_rate": 44100,
"filter_length": 2048,
"hop_length": 512,
"win_length": 2048,
"n_mel_channels": 128,
"mel_fmin": 0.0,
"mel_fmax": null,
"add_blank": true,
"n_speakers": 256,
"cleaned_text": true,
"spk2id": {}
},
"model": {
"use_spk_conditioned_encoder": true,
"use_noise_scaled_mas": true,
"use_mel_posterior_encoder": false,
"use_duration_discriminator": true,
"inter_channels": 192,
"hidden_channels": 192,
"filter_channels": 768,
"n_heads": 2,
"n_layers": 6,
"n_layers_trans_flow": 3,
"kernel_size": 3,
"p_dropout": 0.1,
"resblock": "1",
"resblock_kernel_sizes": [
3,
7,
11
],
"resblock_dilation_sizes": [
[
1,
3,
5
],
[
1,
3,
5
],
[
1,
3,
5
]
],
"upsample_rates": [
8,
8,
2,
2,
2
],
"upsample_initial_channel": 512,
"upsample_kernel_sizes": [
16,
16,
8,
2,
2
],
"n_layers_q": 3,
"use_spectral_norm": false,
"gin_channels": 256
}
}
================================================
FILE: melo/data/example/metadata.list
================================================
data/example/wavs/000.wav|EN-default|EN|Well, there are always new trends and styles emerging in the fashion world, but I think some of the biggest trends at the moment include sustainability and ethical fashion, streetwear and athleisure, and oversized and deconstructed silhouettes.
data/example/wavs/001.wav|EN-default|EN|Many designers and brands are focusing on creating more environmentally-friendly and socially responsible clothing, while others are incorporating elements of sportswear and casual wear into their collections.
data/example/wavs/002.wav|EN-default|EN|And there's a growing interest in looser, more relaxed shapes and unconventional materials and finishes.
data/example/wavs/003.wav|EN-default|EN|That's really insightful.
data/example/wavs/004.wav|EN-default|EN|What do you think are some of the benefits of following fashion trends?
data/example/wavs/005.wav|EN-default|EN|Well, I think one of the main benefits of following fashion trends is that it can be a way to express your creativity, personality, and individuality.
data/example/wavs/006.wav|EN-default|EN|Fashion can be a powerful tool for self-expression and can help you feel more confident and comfortable in your own skin.
data/example/wavs/007.wav|EN-default|EN|Additionally, staying up-to-date with fashion trends can help you develop your own sense of style and learn how to put together outfits that make you look and feel great.
data/example/wavs/008.wav|EN-default|EN|That's a great point.
data/example/wavs/009.wav|EN-default|EN|Do you think it's important to stay on top of the latest fashion trends, or is it more important to focus on timeless style?
data/example/wavs/010.wav|EN-default|EN|I think it's really up to each individual to decide what approach to fashion works best for them.
data/example/wavs/011.wav|EN-default|EN|Some people prefer to stick with classic, timeless styles that never go out of fashion, while others enjoy experimenting with new and innovative trends.
data/example/wavs/012.wav|EN-default|EN|Ultimately, fashion is about personal expression and there's no right or wrong way to approach it.
data/example/wavs/013.wav|EN-default|EN|The most important thing is to wear what makes you feel good and confident.
data/example/wavs/014.wav|EN-default|EN|I completely agree.
data/example/wavs/015.wav|EN-default|EN|Some popular ones that come to mind are oversized blazers, statement sleeves, printed maxi dresses, and chunky sneakers.
data/example/wavs/016.wav|EN-default|EN|It's been really interesting chatting with you about fashion.
data/example/wavs/017.wav|EN-default|EN|That's a good point.
data/example/wavs/018.wav|EN-default|EN|What do you think are some current fashion trends that are popular right now?
data/example/wavs/019.wav|EN-default|EN|There are so many trends happening right now, it's hard to keep track of them all!
================================================
FILE: melo/data_utils.py
================================================
import os
import random
import torch
import torch.utils.data
from tqdm import tqdm
from loguru import logger
import commons
from mel_processing import spectrogram_torch, mel_spectrogram_torch
from utils import load_filepaths_and_text
from utils import load_wav_to_torch_librosa as load_wav_to_torch
from text import cleaned_text_to_sequence, get_bert
import numpy as np
"""Multi speaker version"""
class TextAudioSpeakerLoader(torch.utils.data.Dataset):
"""
1) loads audio, speaker_id, text pairs
2) normalizes text and converts them to sequences of integers
3) computes spectrograms from audio files.
"""
def __init__(self, audiopaths_sid_text, hparams):
self.audiopaths_sid_text = load_filepaths_and_text(audiopaths_sid_text)
self.max_wav_value = hparams.max_wav_value
self.sampling_rate = hparams.sampling_rate
self.filter_length = hparams.filter_length
self.hop_length = hparams.hop_length
self.win_length = hparams.win_length
self.sampling_rate = hparams.sampling_rate
self.spk_map = hparams.spk2id
self.hparams = hparams
self.disable_bert = getattr(hparams, "disable_bert", False)
self.use_mel_spec_posterior = getattr(
hparams, "use_mel_posterior_encoder", False
)
if self.use_mel_spec_posterior:
self.n_mel_channels = getattr(hparams, "n_mel_channels", 80)
self.cleaned_text = getattr(hparams, "cleaned_text", False)
self.add_blank = hparams.add_blank
self.min_text_len = getattr(hparams, "min_text_len", 1)
self.max_text_len = getattr(hparams, "max_text_len", 300)
random.seed(1234)
random.shuffle(self.audiopaths_sid_text)
self._filter()
def _filter(self):
"""
Filter text & store spec lengths
"""
# Store spectrogram lengths for Bucketing
# wav_length ~= file_size / (wav_channels * Bytes per dim) = file_size / (1 * 2)
# spec_length = wav_length // hop_length
audiopaths_sid_text_new = []
lengths = []
skipped = 0
logger.info("Init dataset...")
for item in tqdm(
self.audiopaths_sid_text
):
try:
_id, spk, language, text, phones, tone, word2ph = item
except:
print(item)
raise
audiopath = f"{_id}"
if self.min_text_len <= len(phones) and len(phones) <= self.max_text_len:
phones = phones.split(" ")
tone = [int(i) for i in tone.split(" ")]
word2ph = [int(i) for i in word2ph.split(" ")]
audiopaths_sid_text_new.append(
[audiopath, spk, language, text, phones, tone, word2ph]
)
lengths.append(os.path.getsize(audiopath) // (2 * self.hop_length))
else:
skipped += 1
logger.info(f'min: {min(lengths)}; max: {max(lengths)}' )
logger.info(
"skipped: "
+ str(skipped)
+ ", total: "
+ str(len(self.audiopaths_sid_text))
)
self.audiopaths_sid_text = audiopaths_sid_text_new
self.lengths = lengths
def get_audio_text_speaker_pair(self, audiopath_sid_text):
# separate filename, speaker_id and text
audiopath, sid, language, text, phones, tone, word2ph = audiopath_sid_text
bert, ja_bert, phones, tone, language = self.get_text(
text, word2ph, phones, tone, language, audiopath
)
spec, wav = self.get_audio(audiopath)
sid = int(getattr(self.spk_map, sid, '0'))
sid = torch.LongTensor([sid])
return (phones, spec, wav, sid, tone, language, bert, ja_bert)
def get_audio(self, filename):
audio_norm, sampling_rate = load_wav_to_torch(filename, self.sampling_rate)
if sampling_rate != self.sampling_rate:
raise ValueError(
"{} {} SR doesn't match target {} SR".format(
filename, sampling_rate, self.sampling_rate
)
)
# NOTE: normalize has been achieved by torchaudio
# audio_norm = audio / self.max_wav_value
audio_norm = audio_norm.unsqueeze(0)
spec_filename = filename.replace(".wav", ".spec.pt")
if self.use_mel_spec_posterior:
spec_filename = spec_filename.replace(".spec.pt", ".mel.pt")
try:
spec = torch.load(spec_filename)
assert False
except:
if self.use_mel_spec_posterior:
spec = mel_spectrogram_torch(
audio_norm,
self.filter_length,
self.n_mel_channels,
self.sampling_rate,
self.hop_length,
self.win_length,
self.hparams.mel_fmin,
self.hparams.mel_fmax,
center=False,
)
else:
spec = spectrogram_torch(
audio_norm,
self.filter_length,
self.sampling_rate,
self.hop_length,
self.win_length,
center=False,
)
spec = torch.squeeze(spec, 0)
torch.save(spec, spec_filename)
return spec, audio_norm
def get_text(self, text, word2ph, phone, tone, language_str, wav_path):
phone, tone, language = cleaned_text_to_sequence(phone, tone, language_str)
if self.add_blank:
phone = commons.intersperse(phone, 0)
tone = commons.intersperse(tone, 0)
language = commons.intersperse(language, 0)
for i in range(len(word2ph)):
word2ph[i] = word2ph[i] * 2
word2ph[0] += 1
bert_path = wav_path.replace(".wav", ".bert.pt")
try:
bert = torch.load(bert_path)
assert bert.shape[-1] == len(phone)
except Exception as e:
print(e, wav_path, bert_path, bert.shape, len(phone))
bert = get_bert(text, word2ph, language_str)
torch.save(bert, bert_path)
assert bert.shape[-1] == len(phone), phone
if self.disable_bert:
bert = torch.zeros(1024, len(phone))
ja_bert = torch.zeros(768, len(phone))
else:
if language_str in ["ZH"]:
bert = bert
ja_bert = torch.zeros(768, len(phone))
elif language_str in ["JP", "EN", "ZH_MIX_EN", "KR", 'SP', 'ES', 'FR', 'DE', 'RU']:
ja_bert = bert
bert = torch.zeros(1024, len(phone))
else:
raise
bert = torch.zeros(1024, len(phone))
ja_bert = torch.zeros(768, len(phone))
assert bert.shape[-1] == len(phone)
phone = torch.LongTensor(phone)
tone = torch.LongTensor(tone)
language = torch.LongTensor(language)
return bert, ja_bert, phone, tone, language
def get_sid(self, sid):
sid = torch.LongTensor([int(sid)])
return sid
def __getitem__(self, index):
return self.get_audio_text_speaker_pair(self.audiopaths_sid_text[index])
def __len__(self):
return len(self.audiopaths_sid_text)
class TextAudioSpeakerCollate:
"""Zero-pads model inputs and targets"""
def __init__(self, return_ids=False):
self.return_ids = return_ids
def __call__(self, batch):
"""Collate's training batch from normalized text, audio and speaker identities
PARAMS
------
batch: [text_normalized, spec_normalized, wav_normalized, sid]
"""
# Right zero-pad all one-hot text sequences to max input length
_, ids_sorted_decreasing = torch.sort(
torch.LongTensor([x[1].size(1) for x in batch]), dim=0, descending=True
)
max_text_len = max([len(x[0]) for x in batch])
max_spec_len = max([x[1].size(1) for x in batch])
max_wav_len = max([x[2].size(1) for x in batch])
text_lengths = torch.LongTensor(len(batch))
spec_lengths = torch.LongTensor(len(batch))
wav_lengths = torch.LongTensor(len(batch))
sid = torch.LongTensor(len(batch))
text_padded = torch.LongTensor(len(batch), max_text_len)
tone_padded = torch.LongTensor(len(batch), max_text_len)
language_padded = torch.LongTensor(len(batch), max_text_len)
bert_padded = torch.FloatTensor(len(batch), 1024, max_text_len)
ja_bert_padded = torch.FloatTensor(len(batch), 768, max_text_len)
spec_padded = torch.FloatTensor(len(batch), batch[0][1].size(0), max_spec_len)
wav_padded = torch.FloatTensor(len(batch), 1, max_wav_len)
text_padded.zero_()
tone_padded.zero_()
language_padded.zero_()
spec_padded.zero_()
wav_padded.zero_()
bert_padded.zero_()
ja_bert_padded.zero_()
for i in range(len(ids_sorted_decreasing)):
row = batch[ids_sorted_decreasing[i]]
text = row[0]
text_padded[i, : text.size(0)] = text
text_lengths[i] = text.size(0)
spec = row[1]
spec_padded[i, :, : spec.size(1)] = spec
spec_lengths[i] = spec.size(1)
wav = row[2]
wav_padded[i, :, : wav.size(1)] = wav
wav_lengths[i] = wav.size(1)
sid[i] = row[3]
tone = row[4]
tone_padded[i, : tone.size(0)] = tone
language = row[5]
language_padded[i, : language.size(0)] = language
bert = row[6]
bert_padded[i, :, : bert.size(1)] = bert
ja_bert = row[7]
ja_bert_padded[i, :, : ja_bert.size(1)] = ja_bert
return (
text_padded,
text_lengths,
spec_padded,
spec_lengths,
wav_padded,
wav_lengths,
sid,
tone_padded,
language_padded,
bert_padded,
ja_bert_padded,
)
class DistributedBucketSampler(torch.utils.data.distributed.DistributedSampler):
"""
Maintain similar input lengths in a batch.
Length groups are specified by boundaries.
Ex) boundaries = [b1, b2, b3] -> any batch is included either {x | b1 < length(x) <=b2} or {x | b2 < length(x) <= b3}.
It removes samples which are not included in the boundaries.
Ex) boundaries = [b1, b2, b3] -> any x s.t. length(x) <= b1 or length(x) > b3 are discarded.
"""
def __init__(
self,
dataset,
batch_size,
boundaries,
num_replicas=None,
rank=None,
shuffle=True,
):
super().__init__(dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle)
self.lengths = dataset.lengths
self.batch_size = batch_size
self.boundaries = boundaries
self.buckets, self.num_samples_per_bucket = self._create_buckets()
self.total_size = sum(self.num_samples_per_bucket)
self.num_samples = self.total_size // self.num_replicas
print('buckets:', self.num_samples_per_bucket)
def _create_buckets(self):
buckets = [[] for _ in range(len(self.boundaries) - 1)]
for i in range(len(self.lengths)):
length = self.lengths[i]
idx_bucket = self._bisect(length)
if idx_bucket != -1:
buckets[idx_bucket].append(i)
try:
for i in range(len(buckets) - 1, 0, -1):
if len(buckets[i]) == 0:
buckets.pop(i)
self.boundaries.pop(i + 1)
assert all(len(bucket) > 0 for bucket in buckets)
# When one bucket is not traversed
except Exception as e:
print("Bucket warning ", e)
for i in range(len(buckets) - 1, -1, -1):
if len(buckets[i]) == 0:
buckets.pop(i)
self.boundaries.pop(i + 1)
num_samples_per_bucket = []
for i in range(len(buckets)):
len_bucket = len(buckets[i])
total_batch_size = self.num_replicas * self.batch_size
rem = (
total_batch_size - (len_bucket % total_batch_size)
) % total_batch_size
num_samples_per_bucket.append(len_bucket + rem)
return buckets, num_samples_per_bucket
def __iter__(self):
# deterministically shuffle based on epoch
g = torch.Generator()
g.manual_seed(self.epoch)
indices = []
if self.shuffle:
for bucket in self.buckets:
indices.append(torch.randperm(len(bucket), generator=g).tolist())
else:
for bucket in self.buckets:
indices.append(list(range(len(bucket))))
batches = []
for i in range(len(self.buckets)):
bucket = self.buckets[i]
len_bucket = len(bucket)
if len_bucket == 0:
continue
ids_bucket = indices[i]
num_samples_bucket = self.num_samples_per_bucket[i]
# add extra samples to make it evenly divisible
rem = num_samples_bucket - len_bucket
ids_bucket = (
ids_bucket
+ ids_bucket * (rem // len_bucket)
+ ids_bucket[: (rem % len_bucket)]
)
# subsample
ids_bucket = ids_bucket[self.rank :: self.num_replicas]
# batching
for j in range(len(ids_bucket) // self.batch_size):
batch = [
bucket[idx]
for idx in ids_bucket[
j * self.batch_size : (j + 1) * self.batch_size
]
]
batches.append(batch)
if self.shuffle:
batch_ids = torch.randperm(len(batches), generator=g).tolist()
batches = [batches[i] for i in batch_ids]
self.batches = batches
assert len(self.batches) * self.batch_size == self.num_samples
return iter(self.batches)
def _bisect(self, x, lo=0, hi=None):
if hi is None:
hi = len(self.boundaries) - 1
if hi > lo:
mid = (hi + lo) // 2
if self.boundaries[mid] < x and x <= self.boundaries[mid + 1]:
return mid
elif x <= self.boundaries[mid]:
return self._bisect(x, lo, mid)
else:
return self._bisect(x, mid + 1, hi)
else:
return -1
def __len__(self):
return self.num_samples // self.batch_size
================================================
FILE: melo/download_utils.py
================================================
import torch
import os
from . import utils
from cached_path import cached_path
from huggingface_hub import hf_hub_download
DOWNLOAD_CKPT_URLS = {
'EN': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN/checkpoint.pth',
'EN_V2': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN_V2/checkpoint.pth',
'FR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/FR/checkpoint.pth',
'JP': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/JP/checkpoint.pth',
'ES': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ES/checkpoint.pth',
'ZH': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ZH/checkpoint.pth',
'KR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/KR/checkpoint.pth',
}
DOWNLOAD_CONFIG_URLS = {
'EN': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN/config.json',
'EN_V2': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/EN_V2/config.json',
'FR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/FR/config.json',
'JP': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/JP/config.json',
'ES': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ES/config.json',
'ZH': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/ZH/config.json',
'KR': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/KR/config.json',
}
PRETRAINED_MODELS = {
'G.pth': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/pretrained/G.pth',
'D.pth': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/pretrained/D.pth',
'DUR.pth': 'https://myshell-public-repo-host.s3.amazonaws.com/openvoice/basespeakers/pretrained/DUR.pth',
}
LANG_TO_HF_REPO_ID = {
'EN': 'myshell-ai/MeloTTS-English',
'EN_V2': 'myshell-ai/MeloTTS-English-v2',
'EN_NEWEST': 'myshell-ai/MeloTTS-English-v3',
'FR': 'myshell-ai/MeloTTS-French',
'JP': 'myshell-ai/MeloTTS-Japanese',
'ES': 'myshell-ai/MeloTTS-Spanish',
'ZH': 'myshell-ai/MeloTTS-Chinese',
'KR': 'myshell-ai/MeloTTS-Korean',
}
def load_or_download_config(locale, use_hf=True, config_path=None):
if config_path is None:
language = locale.split('-')[0].upper()
if use_hf:
assert language in LANG_TO_HF_REPO_ID
config_path = hf_hub_download(repo_id=LANG_TO_HF_REPO_ID[language], filename="config.json")
else:
assert language in DOWNLOAD_CONFIG_URLS
config_path = cached_path(DOWNLOAD_CONFIG_URLS[language])
return utils.get_hparams_from_file(config_path)
def load_or_download_model(locale, device, use_hf=True, ckpt_path=None):
if ckpt_path is None:
language = locale.split('-')[0].upper()
if use_hf:
assert language in LANG_TO_HF_REPO_ID
ckpt_path = hf_hub_download(repo_id=LANG_TO_HF_REPO_ID[language], filename="checkpoint.pth")
else:
assert language in DOWNLOAD_CKPT_URLS
ckpt_path = cached_path(DOWNLOAD_CKPT_URLS[language])
return torch.load(ckpt_path, map_location=device)
def load_pretrain_model():
return [cached_path(url) for url in PRETRAINED_MODELS.values()]
================================================
FILE: melo/infer.py
================================================
import os
import click
from melo.api import TTS
@click.command()
@click.option('--ckpt_path', '-m', type=str, default=None, help="Path to the checkpoint file")
@click.option('--text', '-t', type=str, default=None, help="Text to speak")
@click.option('--language', '-l', type=str, default="EN", help="Language of the model")
@click.option('--output_dir', '-o', type=str, default="outputs", help="Path to the output")
def main(ckpt_path, text, language, output_dir):
if ckpt_path is None:
raise ValueError("The model_path must be specified")
config_path = os.path.join(os.path.dirname(ckpt_path), 'config.json')
model = TTS(language=language, config_path=config_path, ckpt_path=ckpt_path)
for spk_name, spk_id in model.hps.data.spk2id.items():
save_path = f'{output_dir}/{spk_name}/output.wav'
os.makedirs(os.path.dirname(save_path), exist_ok=True)
model.tts_to_file(text, spk_id, save_path)
if __name__ == "__main__":
main()
================================================
FILE: melo/init_downloads.py
================================================
if __name__ == '__main__':
from melo.api import TTS
device = 'auto'
models = {
'EN': TTS(language='EN', device=device),
'ES': TTS(language='ES', device=device),
'FR': TTS(language='FR', device=device),
'ZH': TTS(language='ZH', device=device),
'JP': TTS(language='JP', device=device),
'KR': TTS(language='KR', device=device),
}
================================================
FILE: melo/losses.py
================================================
import torch
def feature_loss(fmap_r, fmap_g):
loss = 0
for dr, dg in zip(fmap_r, fmap_g):
for rl, gl in zip(dr, dg):
rl = rl.float().detach()
gl = gl.float()
loss += torch.mean(torch.abs(rl - gl))
return loss * 2
def discriminator_loss(disc_real_outputs, disc_generated_outputs):
loss = 0
r_losses = []
g_losses = []
for dr, dg in zip(disc_real_outputs, disc_generated_outputs):
dr = dr.float()
dg = dg.float()
r_loss = torch.mean((1 - dr) ** 2)
g_loss = torch.mean(dg**2)
loss += r_loss + g_loss
r_losses.append(r_loss.item())
g_losses.append(g_loss.item())
return loss, r_losses, g_losses
def generator_loss(disc_outputs):
loss = 0
gen_losses = []
for dg in disc_outputs:
dg = dg.float()
l = torch.mean((1 - dg) ** 2)
gen_losses.append(l)
loss += l
return loss, gen_losses
def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):
"""
z_p, logs_q: [b, h, t_t]
m_p, logs_p: [b, h, t_t]
"""
z_p = z_p.float()
logs_q = logs_q.float()
m_p = m_p.float()
logs_p = logs_p.float()
z_mask = z_mask.float()
kl = logs_p - logs_q - 0.5
kl += 0.5 * ((z_p - m_p) ** 2) * torch.exp(-2.0 * logs_p)
kl = torch.sum(kl * z_mask)
l = kl / torch.sum(z_mask)
return l
================================================
FILE: melo/main.py
================================================
import click
import warnings
import os
@click.command
@click.argument('text')
@click.argument('output_path')
@click.option("--file", '-f', is_flag=True, show_default=True, default=False, help="Text is a file")
@click.option('--language', '-l', default='EN', help='Language, defaults to English', type=click.Choice(['EN', 'ES', 'FR', 'ZH', 'JP', 'KR'], case_sensitive=False))
@click.option('--speaker', '-spk', default='EN-Default', help='Speaker ID, only for English, leave empty for default, ignored if not English. If English, defaults to "EN-Default"', type=click.Choice(['EN-Default', 'EN-US', 'EN-BR', 'EN_INDIA', 'EN-AU']))
@click.option('--speed', '-s', default=1.0, help='Speed, defaults to 1.0', type=float)
@click.option('--device', '-d', default='auto', help='Device, defaults to auto')
def main(text, file, output_path, language, speaker, speed, device):
if file:
if not os.path.exists(text):
raise FileNotFoundError(f'Trying to load text from file due to --file/-f flag, but file not found. Remove the --file/-f flag to pass a string.')
else:
with open(text) as f:
text = f.read().strip()
if text == '':
raise ValueError('You entered empty text or the file you passed was empty.')
language = language.upper()
if language == '': language = 'EN'
if speaker == '': speaker = None
if (not language == 'EN') and speaker:
warnings.warn('You specified a speaker but the language is English.')
from melo.api import TTS
model = TTS(language=language, device=device)
speaker_ids = model.hps.data.spk2id
if language == 'EN':
if not speaker: speaker = 'EN-Default'
spkr = speaker_ids[speaker]
else:
spkr = speaker_ids[list(speaker_ids.keys())[0]]
model.tts_to_file(text, spkr, output_path, speed=speed)
================================================
FILE: melo/mel_processing.py
================================================
import torch
import torch.utils.data
import librosa
from librosa.filters import mel as librosa_mel_fn
MAX_WAV_VALUE = 32768.0
def dynamic_range_compression_torch(x, C=1, clip_val=1e-5):
"""
PARAMS
------
C: compression factor
"""
return torch.log(torch.clamp(x, min=clip_val) * C)
def dynamic_range_decompression_torch(x, C=1):
"""
PARAMS
------
C: compression factor used to compress
"""
return torch.exp(x) / C
def spectral_normalize_torch(magnitudes):
output = dynamic_range_compression_torch(magnitudes)
return output
def spectral_de_normalize_torch(magnitudes):
output = dynamic_range_decompression_torch(magnitudes)
return output
mel_basis = {}
hann_window = {}
def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
if torch.min(y) < -1.1:
print("min value is ", torch.min(y))
if torch.max(y) > 1.1:
print("max value is ", torch.max(y))
global hann_window
dtype_device = str(y.dtype) + "_" + str(y.device)
wnsize_dtype_device = str(win_size) + "_" + dtype_device
if wnsize_dtype_device not in hann_window:
hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(
dtype=y.dtype, device=y.device
)
y = torch.nn.functional.pad(
y.unsqueeze(1),
(int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)),
mode="reflect",
)
y = y.squeeze(1)
spec = torch.stft(
y,
n_fft,
hop_length=hop_size,
win_length=win_size,
window=hann_window[wnsize_dtype_device],
center=center,
pad_mode="reflect",
normalized=False,
onesided=True,
return_complex=False,
)
spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
return spec
def spectrogram_torch_conv(y, n_fft, sampling_rate, hop_size, win_size, center=False):
global hann_window
dtype_device = str(y.dtype) + '_' + str(y.device)
wnsize_dtype_device = str(win_size) + '_' + dtype_device
if wnsize_dtype_device not in hann_window:
hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)
y = torch.nn.functional.pad(y.unsqueeze(1), (int((n_fft-hop_size)/2), int((n_fft-hop_size)/2)), mode='reflect')
# ******************** original ************************#
# y = y.squeeze(1)
# spec1 = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
# center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
# ******************** ConvSTFT ************************#
freq_cutoff = n_fft // 2 + 1
fourier_basis = torch.view_as_real(torch.fft.fft(torch.eye(n_fft)))
forward_basis = fourier_basis[:freq_cutoff].permute(2, 0, 1).reshape(-1, 1, fourier_basis.shape[1])
forward_basis = forward_basis * torch.as_tensor(librosa.util.pad_center(torch.hann_window(win_size), size=n_fft)).float()
import torch.nn.functional as F
# if center:
# signal = F.pad(y[:, None, None, :], (n_fft // 2, n_fft // 2, 0, 0), mode = 'reflect').squeeze(1)
assert center is False
forward_transform_squared = F.conv1d(y, forward_basis.to(y.device), stride = hop_size)
spec2 = torch.stack([forward_transform_squared[:, :freq_cutoff, :], forward_transform_squared[:, freq_cutoff:, :]], dim = -1)
# ******************** Verification ************************#
spec1 = torch.stft(y.squeeze(1), n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
assert torch.allclose(spec1, spec2, atol=1e-4)
spec = torch.sqrt(spec2.pow(2).sum(-1) + 1e-6)
return spec
def spec_to_mel_torch(spec, n_fft, num_mels, sampling_rate, fmin, fmax):
global mel_basis
dtype_device = str(spec.dtype) + "_" + str(spec.device)
fmax_dtype_device = str(fmax) + "_" + dtype_device
if fmax_dtype_device not in mel_basis:
mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(
dtype=spec.dtype, device=spec.device
)
spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
spec = spectral_normalize_torch(spec)
return spec
def mel_spectrogram_torch(
y, n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax, center=False
):
global mel_basis, hann_window
dtype_device = str(y.dtype) + "_" + str(y.device)
fmax_dtype_device = str(fmax) + "_" + dtype_device
wnsize_dtype_device = str(win_size) + "_" + dtype_device
if fmax_dtype_device not in mel_basis:
mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(
dtype=y.dtype, device=y.device
)
if wnsize_dtype_device not in hann_window:
hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(
dtype=y.dtype, device=y.device
)
y = torch.nn.functional.pad(
y.unsqueeze(1),
(int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)),
mode="reflect",
)
y = y.squeeze(1)
spec = torch.stft(
y,
n_fft,
hop_length=hop_size,
win_length=win_size,
window=hann_window[wnsize_dtype_device],
center=center,
pad_mode="reflect",
normalized=False,
onesided=True,
return_complex=False,
)
spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
spec = spectral_normalize_torch(spec)
return spec
================================================
FILE: melo/models.py
================================================
import math
import torch
from torch import nn
from torch.nn import functional as F
from melo import commons
from melo import modules
from melo import attentions
from torch.nn import Conv1d, ConvTranspose1d, Conv2d
from torch.nn.utils import weight_norm, remove_weight_norm, spectral_norm
from melo.commons import init_weights, get_padding
import melo.monotonic_align as monotonic_align
class DurationDiscriminator(nn.Module): # vits2
def __init__(
self, in_channels, filter_channels, kernel_size, p_dropout, gin_channels=0
):
super().__init__()
self.in_channels = in_channels
self.filter_channels = filter_channels
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.gin_channels = gin_channels
self.drop = nn.Dropout(p_dropout)
self.conv_1 = nn.Conv1d(
in_channels, filter_channels, kernel_size, padding=kernel_size // 2
)
self.norm_1 = modules.LayerNorm(filter_channels)
self.conv_2 = nn.Conv1d(
filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
)
self.norm_2 = modules.LayerNorm(filter_channels)
self.dur_proj = nn.Conv1d(1, filter_channels, 1)
self.pre_out_conv_1 = nn.Conv1d(
2 * filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
)
self.pre_out_norm_1 = modules.LayerNorm(filter_channels)
self.pre_out_conv_2 = nn.Conv1d(
filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
)
self.pre_out_norm_2 = modules.LayerNorm(filter_channels)
if gin_channels != 0:
self.cond = nn.Conv1d(gin_channels, in_channels, 1)
self.output_layer = nn.Sequential(nn.Linear(filter_channels, 1), nn.Sigmoid())
def forward_probability(self, x, x_mask, dur, g=None):
dur = self.dur_proj(dur)
x = torch.cat([x, dur], dim=1)
x = self.pre_out_conv_1(x * x_mask)
x = torch.relu(x)
x = self.pre_out_norm_1(x)
x = self.drop(x)
x = self.pre_out_conv_2(x * x_mask)
x = torch.relu(x)
x = self.pre_out_norm_2(x)
x = self.drop(x)
x = x * x_mask
x = x.transpose(1, 2)
output_prob = self.output_layer(x)
return output_prob
def forward(self, x, x_mask, dur_r, dur_hat, g=None):
x = torch.detach(x)
if g is not None:
g = torch.detach(g)
x = x + self.cond(g)
x = self.conv_1(x * x_mask)
x = torch.relu(x)
x = self.norm_1(x)
x = self.drop(x)
x = self.conv_2(x * x_mask)
x = torch.relu(x)
x = self.norm_2(x)
x = self.drop(x)
output_probs = []
for dur in [dur_r, dur_hat]:
output_prob = self.forward_probability(x, x_mask, dur, g)
output_probs.append(output_prob)
return output_probs
class TransformerCouplingBlock(nn.Module):
def __init__(
self,
channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
n_flows=4,
gin_channels=0,
share_parameter=False,
):
super().__init__()
self.channels = channels
self.hidden_channels = hidden_channels
self.kernel_size = kernel_size
self.n_layers = n_layers
self.n_flows = n_flows
self.gin_channels = gin_channels
self.flows = nn.ModuleList()
self.wn = (
attentions.FFT(
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
isflow=True,
gin_channels=self.gin_channels,
)
if share_parameter
else None
)
for i in range(n_flows):
self.flows.append(
modules.TransformerCouplingLayer(
channels,
hidden_channels,
kernel_size,
n_layers,
n_heads,
p_dropout,
filter_channels,
mean_only=True,
wn_sharing_parameter=self.wn,
gin_channels=self.gin_channels,
)
)
self.flows.append(modules.Flip())
def forward(self, x, x_mask, g=None, reverse=False):
if not reverse:
for flow in self.flows:
x, _ = flow(x, x_mask, g=g, reverse=reverse)
else:
for flow in reversed(self.flows):
x = flow(x, x_mask, g=g, reverse=reverse)
return x
class StochasticDurationPredictor(nn.Module):
def __init__(
self,
in_channels,
filter_channels,
kernel_size,
p_dropout,
n_flows=4,
gin_channels=0,
):
super().__init__()
filter_channels = in_channels # it needs to be removed from future version.
self.in_channels = in_channels
self.filter_channels = filter_channels
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.n_flows = n_flows
self.gin_channels = gin_channels
self.log_flow = modules.Log()
self.flows = nn.ModuleList()
self.flows.append(modules.ElementwiseAffine(2))
for i in range(n_flows):
self.flows.append(
modules.ConvFlow(2, filter_channels, kernel_size, n_layers=3)
)
self.flows.append(modules.Flip())
self.post_pre = nn.Conv1d(1, filter_channels, 1)
self.post_proj = nn.Conv1d(filter_channels, filter_channels, 1)
self.post_convs = modules.DDSConv(
filter_channels, kernel_size, n_layers=3, p_dropout=p_dropout
)
self.post_flows = nn.ModuleList()
self.post_flows.append(modules.ElementwiseAffine(2))
for i in range(4):
self.post_flows.append(
modules.ConvFlow(2, filter_channels, kernel_size, n_layers=3)
)
self.post_flows.append(modules.Flip())
self.pre = nn.Conv1d(in_channels, filter_channels, 1)
self.proj = nn.Conv1d(filter_channels, filter_channels, 1)
self.convs = modules.DDSConv(
filter_channels, kernel_size, n_layers=3, p_dropout=p_dropout
)
if gin_channels != 0:
self.cond = nn.Conv1d(gin_channels, filter_channels, 1)
def forward(self, x, x_mask, w=None, g=None, reverse=False, noise_scale=1.0):
x = torch.detach(x)
x = self.pre(x)
if g is not None:
g = torch.detach(g)
x = x + self.cond(g)
x = self.convs(x, x_mask)
x = self.proj(x) * x_mask
if not reverse:
flows = self.flows
assert w is not None
logdet_tot_q = 0
h_w = self.post_pre(w)
h_w = self.post_convs(h_w, x_mask)
h_w = self.post_proj(h_w) * x_mask
e_q = (
torch.randn(w.size(0), 2, w.size(2)).to(device=x.device, dtype=x.dtype)
* x_mask
)
z_q = e_q
for flow in self.post_flows:
z_q, logdet_q = flow(z_q, x_mask, g=(x + h_w))
logdet_tot_q += logdet_q
z_u, z1 = torch.split(z_q, [1, 1], 1)
u = torch.sigmoid(z_u) * x_mask
z0 = (w - u) * x_mask
logdet_tot_q += torch.sum(
(F.logsigmoid(z_u) + F.logsigmoid(-z_u)) * x_mask, [1, 2]
)
logq = (
torch.sum(-0.5 * (math.log(2 * math.pi) + (e_q**2)) * x_mask, [1, 2])
- logdet_tot_q
)
logdet_tot = 0
z0, logdet = self.log_flow(z0, x_mask)
logdet_tot += logdet
z = torch.cat([z0, z1], 1)
for flow in flows:
z, logdet = flow(z, x_mask, g=x, reverse=reverse)
logdet_tot = logdet_tot + logdet
nll = (
torch.sum(0.5 * (math.log(2 * math.pi) + (z**2)) * x_mask, [1, 2])
- logdet_tot
)
return nll + logq # [b]
else:
flows = list(reversed(self.flows))
flows = flows[:-2] + [flows[-1]] # remove a useless vflow
z = (
torch.randn(x.size(0), 2, x.size(2)).to(device=x.device, dtype=x.dtype)
* noise_scale
)
for flow in flows:
z = flow(z, x_mask, g=x, reverse=reverse)
z0, z1 = torch.split(z, [1, 1], 1)
logw = z0
return logw
class DurationPredictor(nn.Module):
def __init__(
self, in_channels, filter_channels, kernel_size, p_dropout, gin_channels=0
):
super().__init__()
self.in_channels = in_channels
self.filter_channels = filter_channels
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.gin_channels = gin_channels
self.drop = nn.Dropout(p_dropout)
self.conv_1 = nn.Conv1d(
in_channels, filter_channels, kernel_size, padding=kernel_size // 2
)
self.norm_1 = modules.LayerNorm(filter_channels)
self.conv_2 = nn.Conv1d(
filter_channels, filter_channels, kernel_size, padding=kernel_size // 2
)
self.norm_2 = modules.LayerNorm(filter_channels)
self.proj = nn.Conv1d(filter_channels, 1, 1)
if gin_channels != 0:
self.cond = nn.Conv1d(gin_channels, in_channels, 1)
def forward(self, x, x_mask, g=None):
x = torch.detach(x)
if g is not None:
g = torch.detach(g)
x = x + self.cond(g)
x = self.conv_1(x * x_mask)
x = torch.relu(x)
x = self.norm_1(x)
x = self.drop(x)
x = self.conv_2(x * x_mask)
x = torch.relu(x)
x = self.norm_2(x)
x = self.drop(x)
x = self.proj(x * x_mask)
return x * x_mask
class TextEncoder(nn.Module):
def __init__(
self,
n_vocab,
out_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
gin_channels=0,
num_languages=None,
num_tones=None,
):
super().__init__()
if num_languages is None:
from text import num_languages
if num_tones is None:
from text import num_tones
self.n_vocab = n_vocab
self.out_channels = out_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.gin_channels = gin_channels
self.emb = nn.Embedding(n_vocab, hidden_channels)
nn.init.normal_(self.emb.weight, 0.0, hidden_channels**-0.5)
self.tone_emb = nn.Embedding(num_tones, hidden_channels)
nn.init.normal_(self.tone_emb.weight, 0.0, hidden_channels**-0.5)
self.language_emb = nn.Embedding(num_languages, hidden_channels)
nn.init.normal_(self.language_emb.weight, 0.0, hidden_channels**-0.5)
self.bert_proj = nn.Conv1d(1024, hidden_channels, 1)
self.ja_bert_proj = nn.Conv1d(768, hidden_channels, 1)
self.encoder = attentions.Encoder(
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
gin_channels=self.gin_channels,
)
self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)
def forward(self, x, x_lengths, tone, language, bert, ja_bert, g=None):
bert_emb = self.bert_proj(bert).transpose(1, 2)
ja_bert_emb = self.ja_bert_proj(ja_bert).transpose(1, 2)
x = (
self.emb(x)
+ self.tone_emb(tone)
+ self.language_emb(language)
+ bert_emb
+ ja_bert_emb
) * math.sqrt(
self.hidden_channels
) # [b, t, h]
x = torch.transpose(x, 1, -1) # [b, h, t]
x_mask = torch.unsqueeze(commons.sequence_mask(x_lengths, x.size(2)), 1).to(
x.dtype
)
x = self.encoder(x * x_mask, x_mask, g=g)
stats = self.proj(x) * x_mask
m, logs = torch.split(stats, self.out_channels, dim=1)
return x, m, logs, x_mask
class ResidualCouplingBlock(nn.Module):
def __init__(
self,
channels,
hidden_channels,
kernel_size,
dilation_rate,
n_layers,
n_flows=4,
gin_channels=0,
):
super().__init__()
self.channels = channels
self.hidden_channels = hidden_channels
self.kernel_size = kernel_size
self.dilation_rate = dilation_rate
self.n_layers = n_layers
self.n_flows = n_flows
self.gin_channels = gin_channels
self.flows = nn.ModuleList()
for i in range(n_flows):
self.flows.append(
modules.ResidualCouplingLayer(
channels,
hidden_channels,
kernel_size,
dilation_rate,
n_layers,
gin_channels=gin_channels,
mean_only=True,
)
)
self.flows.append(modules.Flip())
def forward(self, x, x_mask, g=None, reverse=False):
if not reverse:
for flow in self.flows:
x, _ = flow(x, x_mask, g=g, reverse=reverse)
else:
for flow in reversed(self.flows):
x = flow(x, x_mask, g=g, reverse=reverse)
return x
class PosteriorEncoder(nn.Module):
def __init__(
self,
in_channels,
out_channels,
hidden_channels,
kernel_size,
dilation_rate,
n_layers,
gin_channels=0,
):
super().__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.hidden_channels = hidden_channels
self.kernel_size = kernel_size
self.dilation_rate = dilation_rate
self.n_layers = n_layers
self.gin_channels = gin_channels
self.pre = nn.Conv1d(in_channels, hidden_channels, 1)
self.enc = modules.WN(
hidden_channels,
kernel_size,
dilation_rate,
n_layers,
gin_channels=gin_channels,
)
self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)
def forward(self, x, x_lengths, g=None, tau=1.0):
x_mask = torch.unsqueeze(commons.sequence_mask(x_lengths, x.size(2)), 1).to(
x.dtype
)
x = self.pre(x) * x_mask
x = self.enc(x, x_mask, g=g)
stats = self.proj(x) * x_mask
m, logs = torch.split(stats, self.out_channels, dim=1)
z = (m + torch.randn_like(m) * tau * torch.exp(logs)) * x_mask
return z, m, logs, x_mask
class Generator(torch.nn.Module):
def __init__(
self,
initial_channel,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=0,
):
super(Generator, self).__init__()
self.num_kernels = len(resblock_kernel_sizes)
self.num_upsamples = len(upsample_rates)
self.conv_pre = Conv1d(
initial_channel, upsample_initial_channel, 7, 1, padding=3
)
resblock = modules.ResBlock1 if resblock == "1" else modules.ResBlock2
self.ups = nn.ModuleList()
for i, (u, k) in enumerate(zip(upsample_rates, upsample_kernel_sizes)):
self.ups.append(
weight_norm(
ConvTranspose1d(
upsample_initial_channel // (2**i),
upsample_initial_channel // (2 ** (i + 1)),
k,
u,
padding=(k - u) // 2,
)
)
)
self.resblocks = nn.ModuleList()
for i in range(len(self.ups)):
ch = upsample_initial_channel // (2 ** (i + 1))
for j, (k, d) in enumerate(
zip(resblock_kernel_sizes, resblock_dilation_sizes)
):
self.resblocks.append(resblock(ch, k, d))
self.conv_post = Conv1d(ch, 1, 7, 1, padding=3, bias=False)
self.ups.apply(init_weights)
if gin_channels != 0:
self.cond = nn.Conv1d(gin_channels, upsample_initial_channel, 1)
def forward(self, x, g=None):
x = self.conv_pre(x)
if g is not None:
x = x + self.cond(g)
for i in range(self.num_upsamples):
x = F.leaky_relu(x, modules.LRELU_SLOPE)
x = self.ups[i](x)
xs = None
for j in range(self.num_kernels):
if xs is None:
xs = self.resblocks[i * self.num_kernels + j](x)
else:
xs += self.resblocks[i * self.num_kernels + j](x)
x = xs / self.num_kernels
x = F.leaky_relu(x)
x = self.conv_post(x)
x = torch.tanh(x)
return x
def remove_weight_norm(self):
print("Removing weight norm...")
for layer in self.ups:
remove_weight_norm(layer)
for layer in self.resblocks:
layer.remove_weight_norm()
class DiscriminatorP(torch.nn.Module):
def __init__(self, period, kernel_size=5, stride=3, use_spectral_norm=False):
super(DiscriminatorP, self).__init__()
self.period = period
self.use_spectral_norm = use_spectral_norm
norm_f = weight_norm if use_spectral_norm is False else spectral_norm
self.convs = nn.ModuleList(
[
norm_f(
Conv2d(
1,
32,
(kernel_size, 1),
(stride, 1),
padding=(get_padding(kernel_size, 1), 0),
)
),
norm_f(
Conv2d(
32,
128,
(kernel_size, 1),
(stride, 1),
padding=(get_padding(kernel_size, 1), 0),
)
),
norm_f(
Conv2d(
128,
512,
(kernel_size, 1),
(stride, 1),
padding=(get_padding(kernel_size, 1), 0),
)
),
norm_f(
Conv2d(
512,
1024,
(kernel_size, 1),
(stride, 1),
padding=(get_padding(kernel_size, 1), 0),
)
),
norm_f(
Conv2d(
1024,
1024,
(kernel_size, 1),
1,
padding=(get_padding(kernel_size, 1), 0),
)
),
]
)
self.conv_post = norm_f(Conv2d(1024, 1, (3, 1), 1, padding=(1, 0)))
def forward(self, x):
fmap = []
# 1d to 2d
b, c, t = x.shape
if t % self.period != 0: # pad first
n_pad = self.period - (t % self.period)
x = F.pad(x, (0, n_pad), "reflect")
t = t + n_pad
x = x.view(b, c, t // self.period, self.period)
for layer in self.convs:
x = layer(x)
x = F.leaky_relu(x, modules.LRELU_SLOPE)
fmap.append(x)
x = self.conv_post(x)
fmap.append(x)
x = torch.flatten(x, 1, -1)
return x, fmap
class DiscriminatorS(torch.nn.Module):
def __init__(self, use_spectral_norm=False):
super(DiscriminatorS, self).__init__()
norm_f = weight_norm if use_spectral_norm is False else spectral_norm
self.convs = nn.ModuleList(
[
norm_f(Conv1d(1, 16, 15, 1, padding=7)),
norm_f(Conv1d(16, 64, 41, 4, groups=4, padding=20)),
norm_f(Conv1d(64, 256, 41, 4, groups=16, padding=20)),
norm_f(Conv1d(256, 1024, 41, 4, groups=64, padding=20)),
norm_f(Conv1d(1024, 1024, 41, 4, groups=256, padding=20)),
norm_f(Conv1d(1024, 1024, 5, 1, padding=2)),
]
)
self.conv_post = norm_f(Conv1d(1024, 1, 3, 1, padding=1))
def forward(self, x):
fmap = []
for layer in self.convs:
x = layer(x)
x = F.leaky_relu(x, modules.LRELU_SLOPE)
fmap.append(x)
x = self.conv_post(x)
fmap.append(x)
x = torch.flatten(x, 1, -1)
return x, fmap
class MultiPeriodDiscriminator(torch.nn.Module):
def __init__(self, use_spectral_norm=False):
super(MultiPeriodDiscriminator, self).__init__()
periods = [2, 3, 5, 7, 11]
discs = [DiscriminatorS(use_spectral_norm=use_spectral_norm)]
discs = discs + [
DiscriminatorP(i, use_spectral_norm=use_spectral_norm) for i in periods
]
self.discriminators = nn.ModuleList(discs)
def forward(self, y, y_hat):
y_d_rs = []
y_d_gs = []
fmap_rs = []
fmap_gs = []
for i, d in enumerate(self.discriminators):
y_d_r, fmap_r = d(y)
y_d_g, fmap_g = d(y_hat)
y_d_rs.append(y_d_r)
y_d_gs.append(y_d_g)
fmap_rs.append(fmap_r)
fmap_gs.append(fmap_g)
return y_d_rs, y_d_gs, fmap_rs, fmap_gs
class ReferenceEncoder(nn.Module):
"""
inputs --- [N, Ty/r, n_mels*r] mels
outputs --- [N, ref_enc_gru_size]
"""
def __init__(self, spec_channels, gin_channels=0, layernorm=False):
super().__init__()
self.spec_channels = spec_channels
ref_enc_filters = [32, 32, 64, 64, 128, 128]
K = len(ref_enc_filters)
filters = [1] + ref_enc_filters
convs = [
weight_norm(
nn.Conv2d(
in_channels=filters[i],
out_channels=filters[i + 1],
kernel_size=(3, 3),
stride=(2, 2),
padding=(1, 1),
)
)
for i in range(K)
]
self.convs = nn.ModuleList(convs)
# self.wns = nn.ModuleList([weight_norm(num_features=ref_enc_filters[i]) for i in range(K)]) # noqa: E501
out_channels = self.calculate_channels(spec_channels, 3, 2, 1, K)
self.gru = nn.GRU(
input_size=ref_enc_filters[-1] * out_channels,
hidden_size=256 // 2,
batch_first=True,
)
self.proj = nn.Linear(128, gin_channels)
if layernorm:
self.layernorm = nn.LayerNorm(self.spec_channels)
print('[Ref Enc]: using layer norm')
else:
self.layernorm = None
def forward(self, inputs, mask=None):
N = inputs.size(0)
out = inputs.view(N, 1, -1, self.spec_channels) # [N, 1, Ty, n_freqs]
if self.layernorm is not None:
out = self.layernorm(out)
for conv in self.convs:
out = conv(out)
# out = wn(out)
out = F.relu(out) # [N, 128, Ty//2^K, n_mels//2^K]
out = out.transpose(1, 2) # [N, Ty//2^K, 128, n_mels//2^K]
T = out.size(1)
N = out.size(0)
out = out.contiguous().view(N, T, -1) # [N, Ty//2^K, 128*n_mels//2^K]
self.gru.flatten_parameters()
memory, out = self.gru(out) # out --- [1, N, 128]
return self.proj(out.squeeze(0))
def calculate_channels(self, L, kernel_size, stride, pad, n_convs):
for i in range(n_convs):
L = (L - kernel_size + 2 * pad) // stride + 1
return L
class SynthesizerTrn(nn.Module):
"""
Synthesizer for Training
"""
def __init__(
self,
n_vocab,
spec_channels,
segment_size,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
n_speakers=256,
gin_channels=256,
use_sdp=True,
n_flow_layer=4,
n_layers_trans_flow=6,
flow_share_parameter=False,
use_transformer_flow=True,
use_vc=False,
num_languages=None,
num_tones=None,
norm_refenc=False,
**kwargs
):
super().__init__()
self.n_vocab = n_vocab
self.spec_channels = spec_channels
self.inter_channels = inter_channels
self.hidden_channels = hidden_channels
self.filter_channels = filter_channels
self.n_heads = n_heads
self.n_layers = n_layers
self.kernel_size = kernel_size
self.p_dropout = p_dropout
self.resblock = resblock
self.resblock_kernel_sizes = resblock_kernel_sizes
self.resblock_dilation_sizes = resblock_dilation_sizes
self.upsample_rates = upsample_rates
self.upsample_initial_channel = upsample_initial_channel
self.upsample_kernel_sizes = upsample_kernel_sizes
self.segment_size = segment_size
self.n_speakers = n_speakers
self.gin_channels = gin_channels
self.n_layers_trans_flow = n_layers_trans_flow
self.use_spk_conditioned_encoder = kwargs.get(
"use_spk_conditioned_encoder", True
)
self.use_sdp = use_sdp
self.use_noise_scaled_mas = kwargs.get("use_noise_scaled_mas", False)
self.mas_noise_scale_initial = kwargs.get("mas_noise_scale_initial", 0.01)
self.noise_scale_delta = kwargs.get("noise_scale_delta", 2e-6)
self.current_mas_noise_scale = self.mas_noise_scale_initial
if self.use_spk_conditioned_encoder and gin_channels > 0:
self.enc_gin_channels = gin_channels
else:
self.enc_gin_channels = 0
self.enc_p = TextEncoder(
n_vocab,
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
gin_channels=self.enc_gin_channels,
num_languages=num_languages,
num_tones=num_tones,
)
self.dec = Generator(
inter_channels,
resblock,
resblock_kernel_sizes,
resblock_dilation_sizes,
upsample_rates,
upsample_initial_channel,
upsample_kernel_sizes,
gin_channels=gin_channels,
)
self.enc_q = PosteriorEncoder(
spec_channels,
inter_channels,
hidden_channels,
5,
1,
16,
gin_channels=gin_channels,
)
if use_transformer_flow:
self.flow = TransformerCouplingBlock(
inter_channels,
hidden_channels,
filter_channels,
n_heads,
n_layers_trans_flow,
5,
p_dropout,
n_flow_layer,
gin_channels=gin_channels,
share_parameter=flow_share_parameter,
)
else:
self.flow = ResidualCouplingBlock(
inter_channels,
hidden_channels,
5,
1,
n_flow_layer,
gin_channels=gin_channels,
)
self.sdp = StochasticDurationPredictor(
hidden_channels, 192, 3, 0.5, 4, gin_channels=gin_channels
)
self.dp = DurationPredictor(
hidden_channels, 256, 3, 0.5, gin_channels=gin_channels
)
if n_speakers > 0:
self.emb_g = nn.Embedding(n_speakers, gin_channels)
else:
self.ref_enc = ReferenceEncoder(spec_channels, gin_channels, layernorm=norm_refenc)
self.use_vc = use_vc
def forward(self, x, x_lengths, y, y_lengths, sid, tone, language, bert, ja_bert):
if self.n_speakers > 0:
g = self.emb_g(sid).unsqueeze(-1) # [b, h, 1]
else:
g = self.ref_enc(y.transpose(1, 2)).unsqueeze(-1)
if self.use_vc:
g_p = None
else:
g_p = g
x, m_p, logs_p, x_mask = self.enc_p(
x, x_lengths, tone, language, bert, ja_bert, g=g_p
)
z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g)
z_p = self.flow(z, y_mask, g=g)
with torch.no_grad():
# negative cross-entropy
s_p_sq_r = torch.exp(-2 * logs_p) # [b, d, t]
neg_cent1 = torch.sum(
-0.5 * math.log(2 * math.pi) - logs_p, [1], keepdim=True
) # [b, 1, t_s]
neg_cent2 = torch.matmul(
-0.5 * (z_p**2).transpose(1, 2), s_p_sq_r
) # [b, t_t, d] x [b, d, t_s] = [b, t_t, t_s]
neg_cent3 = torch.matmul(
z_p.transpose(1, 2), (m_p * s_p_sq_r)
) # [b, t_t, d] x [b, d, t_s] = [b, t_t, t_s]
neg_cent4 = torch.sum(
-0.5 * (m_p**2) * s_p_sq_r, [1], keepdim=True
) # [b, 1, t_s]
neg_cent = neg_cent1 + neg_cent2 + neg_cent3 + neg_cent4
if self.use_noise_scaled_mas:
epsilon = (
torch.std(neg_cent)
* torch.randn_like(neg_cent)
* self.current_mas_noise_scale
)
neg_cent = neg_cent + epsilon
attn_mask = torch.unsqueeze(x_mask, 2) * torch.unsqueeze(y_mask, -1)
attn = (
monotonic_align.maximum_path(neg_cent, attn_mask.squeeze(1))
.unsqueeze(1)
.detach()
)
w = attn.sum(2)
l_length_sdp = self.sdp(x, x_mask, w, g=g)
l_length_sdp = l_length_sdp / torch.sum(x_mask)
logw_ = torch.log(w + 1e-6) * x_mask
logw = self.dp(x, x_mask, g=g)
l_length_dp = torch.sum((logw - logw_) ** 2, [1, 2]) / torch.sum(
x_mask
) # for averaging
l_length = l_length_dp + l_length_sdp
# expand prior
m_p = torch.matmul(attn.squeeze(1), m_p.transpose(1, 2)).transpose(1, 2)
logs_p = torch.matmul(attn.squeeze(1), logs_p.transpose(1, 2)).transpose(1, 2)
z_slice, ids_slice = commons.rand_slice_segments(
z, y_lengths, self.segment_size
)
o = self.dec(z_slice, g=g)
return (
o,
l_length,
attn,
ids_slice,
x_mask,
y_mask,
(z, z_p, m_p, logs_p, m_q, logs_q),
(x, logw, logw_),
)
def infer(
self,
x,
x_lengths,
sid,
tone,
language,
bert,
ja_bert,
noise_scale=0.667,
length_scale=1,
noise_scale_w=0.8,
max_len=None,
sdp_ratio=0,
y=None,
g=None,
):
# x, m_p, logs_p, x_mask = self.enc_p(x, x_lengths, tone, language, bert)
# g = self.gst(y)
if g is None:
if self.n_speakers > 0:
g = self.emb_g(sid).unsqueeze(-1) # [b, h, 1]
else:
g = self.ref_enc(y.transpose(1, 2)).unsqueeze(-1)
if self.use_vc:
g_p = None
else:
g_p = g
x, m_p, logs_p, x_mask = self.enc_p(
x, x_lengths, tone, language, bert, ja_bert, g=g_p
)
logw = self.sdp(x, x_mask, g=g, reverse=True, noise_scale=noise_scale_w) * (
sdp_ratio
) + self.dp(x, x_mask, g=g) * (1 - sdp_ratio)
w = torch.exp(logw) * x_mask * length_scale
w_ceil = torch.ceil(w)
y_lengths = torch.clamp_min(torch.sum(w_ceil, [1, 2]), 1).long()
y_mask = torch.unsqueeze(commons.sequence_mask(y_lengths, None), 1).to(
x_mask.dtype
)
attn_mask = torch.unsqueeze(x_mask, 2) * torch.unsqueeze(y_mask, -1)
attn = commons.generate_path(w_ceil, attn_mask)
m_p = torch.matmul(attn.squeeze(1), m_p.transpose(1, 2)).transpose(
1, 2
) # [b, t', t], [b, t, d] -> [b, d, t']
logs_p = torch.matmul(attn.squeeze(1), logs_p.transpose(1, 2)).transpose(
1, 2
) # [b, t', t], [b, t, d] -> [b, d, t']
z_p = m_p + torch.randn_like(m_p) * torch.exp(logs_p) * noise_scale
z = self.flow(z_p, y_mask, g=g, reverse=True)
o = self.dec((z * y_mask)[:, :, :max_len], g=g)
# print('max/min of o:', o.max(), o.min())
return o, attn, y_mask, (z, z_p, m_p, logs_p)
def voice_conversion(self, y, y_lengths, sid_src, sid_tgt, tau=1.0):
g_src = sid_src
g_tgt = sid_tgt
z, m_q, logs_q, y_mask = self.enc_q(y, y_lengths, g=g_src, tau=tau)
z_p = self.flow(z, y_mask, g=g_src)
z_hat = self.flow(z_p, y_mask, g=g_tgt, reverse=True)
o_hat = self.dec(z_hat * y_mask, g=g_tgt)
return o_hat, y_mask, (z, z_p, z_hat)
================================================
FILE: melo/modules.py
================================================
import math
import torch
from torch import nn
from torch.nn import functional as F
from torch.nn import Conv1d
from torch.nn.utils import weight_norm, remove_weight_norm
from . import commons
from .commons import init_weights, get_padding
from .transforms import piecewise_rational_quadratic_transform
from .attentions import Encoder
LRELU_SLOPE = 0.1
class LayerNorm(nn.Module):
def __init__(self, channels, eps=1e-5):
super().__init__()
self.channels = channels
self.eps = eps
self.gamma = nn.Parameter(torch.ones(channels))
self.beta = nn.Parameter(torch.zeros(channels))
def forward(self, x):
x = x.transpose(1, -1)
x = F.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
return x.transpose(1, -1)
class ConvReluNorm(nn.Module):
def __init__(
self,
in_channels,
hidden_channels,
out_channels,
kernel_size,
n_layers,
p_dropout,
):
super().__init__()
self.in_channels = in_channels
self.hidden_channels = hidden_channels
self.out_channels = out_channels
self.kernel_size = kernel_size
self.n_layers = n_layers
self.p_dropout = p_dropout
assert n_layers > 1, "Number of layers should be larger than 0."
self.conv_layers = nn.ModuleList()
self.norm_layers = nn.ModuleList()
self.conv_layers.append(
nn.Conv1d(
in_channels, hidden_channels, kernel_size, padding=kernel_size // 2
)
)
self.norm_layers.append(LayerNorm(hidden_channels))
self.relu_drop = nn.Sequential(nn.ReLU(), nn.Dropout(p_dropout))
for _ in range(n_layers - 1):
self.conv_layers.append(
nn.Conv1d(
hidden_channels,
hidden_channels,
kernel_size,
padding=kernel_size // 2,
)
)
self.norm_layers.append(LayerNorm(hidden_channels))
self.proj = nn.Conv1d(hidden_channels, out_channels, 1)
self.proj.weight.data.zero_()
self.proj.bias.data.zero_()
def forward(self, x, x_mask):
x_org = x
for i in range(self.n_layers):
x = self.conv_layers[i](x * x_mask)
x = self.norm_layers[i](x)
x = self.relu_drop(x)
x = x_org + self.proj(x)
return x * x_mask
class DDSConv(nn.Module):
"""
Dialted and Depth-Separable Convolution
"""
def __init__(self, channels, kernel_size, n_layers, p_dropout=0.0):
super().__init__()
self.channels = channels
self.kernel_size = kernel_size
self.n_layers = n_layers
self.p_dropout = p_dropout
self.drop = nn.Dropout(p_dropout)
self.convs_sep = nn.ModuleList()
self.convs_1x1 = nn.ModuleList()
self.norms_1 = nn.ModuleList()
self.norms_2 = nn.ModuleList()
for i in range(n_layers):
dilation = kernel_size**i
padding = (kernel_size * dilation - dilation) // 2
self.convs_sep.append(
nn.Conv1d(
channels,
channels,
kernel_size,
groups=channels,
dilation=dilation,
padding=padding,
)
)
self.convs_1x1.append(nn.Conv1d(channels, channels, 1))
self.norms_1.append(LayerNorm(channels))
self.norms_2.append(LayerNorm(channels))
def forward(self, x, x_mask, g=None):
if g is not None:
x = x + g
for i in range(self.n_layers):
y = self.convs_sep[i](x * x_mask)
y = self.norms_1[i](y)
y = F.gelu(y)
y = self.convs_1x1[i](y)
y = self.norms_2[i](y)
y = F.gelu(y)
y = self.drop(y)
x = x + y
return x * x_mask
class WN(torch.nn.Module):
def __init__(
self,
hidden_channels,
kernel_size,
dilation_rate,
n_layers,
gin_channels=0,
p_dropout=0,
):
super(WN, self).__init__()
assert kernel_size % 2 == 1
self.hidden_channels = hidden_channels
self.kernel_size = (kernel_size,)
self.dilation_rate = dilation_rate
self.n_layers = n_layers
self.gin_channels = gin_channels
self.p_dropout = p_dropout
self.in_layers = torch.nn.ModuleList()
self.res_skip_layers = torch.nn.ModuleList()
self.drop = nn.Dropout(p_dropout)
if gin_channels != 0:
cond_layer = torch.nn.Conv1d(
gin_channels, 2 * hidden_channels * n_layers, 1
)
self.cond_layer = torch.nn.utils.weight_norm(cond_layer, name="weight")
for i in range(n_layers):
dilation = dilation_rate**i
padding = int((kernel_size * dilation - dilation) / 2)
in_layer = torch.nn.Conv1d(
hidden_channels,
2 * hidden_channels,
kernel_size,
dilation=dilation,
padding=padding,
)
in_layer = torch.nn.utils.weight_norm(in_layer, name="weight")
self.in_layers.append(in_layer)
# last one is not necessary
if i < n_layers - 1:
res_skip_channels = 2 * hidden_channels
else:
res_skip_channels = hidden_channels
res_skip_layer = torch.nn.Conv1d(hidden_channels, res_skip_channels, 1)
res_skip_layer = torch.nn.utils.weight_norm(res_skip_layer, name="weight")
self.res_skip_layers.append(res_skip_layer)
def forward(self, x, x_mask, g=None, **kwargs):
output = torch.zeros_like(x)
n_channels_tensor = torch.IntTensor([self.hidden_channels])
if g is not None:
g = self.cond_layer(g)
for i in range(self.n_layers):
x_in = self.in_layers[i](x)
if g is not None:
cond_offset = i * 2 * self.hidden_channels
g_l = g[:, cond_offset : cond_offset + 2 * self.hidden_channels, :]
else:
g_l = torch.zeros_like(x_in)
acts = commons.fused_add_tanh_sigmoid_multiply(x_in, g_l, n_channels_tensor)
acts = self.drop(acts)
res_skip_acts = self.res_skip_layers[i](acts)
if i < self.n_layers - 1:
res_acts = res_skip_acts[:, : self.hidden_channels, :]
x = (x + res_acts) * x_mask
output = output + res_skip_acts[:, self.hidden_channels :, :]
else:
output = output + res_skip_acts
return output * x_mask
def remove_weight_norm(self):
if self.gin_channels != 0:
torch.nn.utils.remove_weight_norm(self.cond_layer)
for l in self.in_layers:
torch.nn.utils.remove_weight_norm(l)
for l in self.res_skip_layers:
torch.nn.utils.remove_weight_norm(l)
class ResBlock1(torch.nn.Module):
def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
super(ResBlock1, self).__init__()
self.convs1 = nn.ModuleList(
[
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=dilation[0],
padding=get_padding(kernel_size, dilation[0]),
)
),
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=dilation[1],
padding=get_padding(kernel_size, dilation[1]),
)
),
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=dilation[2],
padding=get_padding(kernel_size, dilation[2]),
)
),
]
)
self.convs1.apply(init_weights)
self.convs2 = nn.ModuleList(
[
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=1,
padding=get_padding(kernel_size, 1),
)
),
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=1,
padding=get_padding(kernel_size, 1),
)
),
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=1,
padding=get_padding(kernel_size, 1),
)
),
]
)
self.convs2.apply(init_weights)
def forward(self, x, x_mask=None):
for c1, c2 in zip(self.convs1, self.convs2):
xt = F.leaky_relu(x, LRELU_SLOPE)
if x_mask is not None:
xt = xt * x_mask
xt = c1(xt)
xt = F.leaky_relu(xt, LRELU_SLOPE)
if x_mask is not None:
xt = xt * x_mask
xt = c2(xt)
x = xt + x
if x_mask is not None:
x = x * x_mask
return x
def remove_weight_norm(self):
for l in self.convs1:
remove_weight_norm(l)
for l in self.convs2:
remove_weight_norm(l)
class ResBlock2(torch.nn.Module):
def __init__(self, channels, kernel_size=3, dilation=(1, 3)):
super(ResBlock2, self).__init__()
self.convs = nn.ModuleList(
[
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=dilation[0],
padding=get_padding(kernel_size, dilation[0]),
)
),
weight_norm(
Conv1d(
channels,
channels,
kernel_size,
1,
dilation=dilation[1],
padding=get_padding(kernel_size, dilation[1]),
)
),
]
)
self.convs.apply(init_weights)
def forward(self, x, x_mask=None):
for c in self.convs:
xt = F.leaky_relu(x, LRELU_SLOPE)
if x_mask is not None:
xt = xt * x_mask
xt = c(xt)
x = xt + x
if x_mask is not None:
x = x * x_mask
return x
def remove_weight_norm(self):
for l in self.convs:
remove_weight_norm(l)
class Log(nn.Module):
def forward(self, x, x_mask, reverse=False, **kwargs):
if not reverse:
y = torch.log(torch.clamp_min(x, 1e-5)) * x_mask
logdet = torch.sum(-y, [1, 2])
return y, logdet
else:
x = torch.exp(x) * x_mask
return x
class Flip(nn.Module):
def forward(self, x, *args, reverse=False, **kwargs):
x = torch.flip(x, [1])
if not reverse:
logdet = torch.zeros(x.size(0)).to(dtype=x.dtype, device=x.device)
return x, logdet
else:
return x
class ElementwiseAffine(nn.Module):
def __init__(self, channels):
super().__init__()
self.channels = channels
self.m = nn.Parameter(torch.zeros(channels, 1))
self.logs = nn.Parameter(torch.zeros(channels, 1))
def forward(self, x, x_mask, reverse=False, **kwargs):
if not reverse:
y = self.m + torch.exp(self.logs) * x
y = y * x_mask
logdet = torch.sum(self.logs * x_mask, [1, 2])
return y, logdet
else:
x = (x - self.m) * torch.exp(-self.logs) * x_mask
return x
class ResidualCouplingLayer(nn.Module):
def __init__(
self,
channels,
hidden_channels,
kernel_size,
dilation_rate,
n_layers,
p_dropout=0,
gin_channels=0,
mean_only=False,
):
assert channels % 2 == 0, "channels should be divisible by 2"
super().__init__()
self.channels = channels
self.hidden_channels = hidden_channels
self.kernel_size = kernel_size
self.dilation_rate = dilation_rate
self.n_layers = n_layers
self.half_channels = channels // 2
self.mean_only = mean_only
self.pre = nn.Conv1d(self.half_channels, hidden_channels, 1)
self.enc = WN(
hidden_channels,
kernel_size,
dilation_rate,
n_layers,
p_dropout=p_dropout,
gin_channels=gin_channels,
)
self.post = nn.Conv1d(hidden_channels, self.half_channels * (2 - mean_only), 1)
self.post.weight.data.zero_()
self.post.bias.data.zero_()
def forward(self, x, x_mask, g=None, reverse=False):
x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
h = self.pre(x0) * x_mask
h = self.enc(h, x_mask, g=g)
stats = self.post(h) * x_mask
if not self.mean_only:
m, logs = torch.split(stats, [self.half_channels] * 2, 1)
else:
m = stats
logs = torch.zeros_like(m)
if not reverse:
x1 = m + x1 * torch.exp(logs) * x_mask
x = torch.cat([x0, x1], 1)
logdet = torch.sum(logs, [1, 2])
return x, logdet
else:
x1 = (x1 - m) * torch.exp(-logs) * x_mask
x = torch.cat([x0, x1], 1)
return x
class ConvFlow(nn.Module):
def __init__(
self,
in_channels,
filter_channels,
kernel_size,
n_layers,
num_bins=10,
tail_bound=5.0,
):
super().__init__()
self.in_channels = in_channels
self.filter_channels = filter_channels
self.kernel_size = kernel_size
self.n_layers = n_layers
self.num_bins = num_bins
self.tail_bound = tail_bound
self.half_channels = in_channels // 2
self.pre = nn.Conv1d(self.half_channels, filter_channels, 1)
self.convs = DDSConv(filter_channels, kernel_size, n_layers, p_dropout=0.0)
self.proj = nn.Conv1d(
filter_channels, self.half_channels * (num_bins * 3 - 1), 1
)
self.proj.weight.data.zero_()
self.proj.bias.data.zero_()
def forward(self, x, x_mask, g=None, reverse=False):
x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
h = self.pre(x0)
h = self.convs(h, x_mask, g=g)
h = self.proj(h) * x_mask
b, c, t = x0.shape
h = h.reshape(b, c, -1, t).permute(0, 1, 3, 2) # [b, cx?, t] -> [b, c, t, ?]
unnormalized_widths = h[..., : self.num_bins] / math.sqrt(self.filter_channels)
unnormalized_heights = h[..., self.num_bins : 2 * self.num_bins] / math.sqrt(
self.filter_channels
)
unnormalized_derivatives = h[..., 2 * self.num_bins :]
x1, logabsdet = piecewise_rational_quadratic_transform(
x1,
unnormalized_widths,
unnormalized_heights,
unnormalized_derivatives,
inverse=reverse,
tails="linear",
tail_bound=self.tail_bound,
)
x = torch.cat([x0, x1], 1) * x_mask
logdet = torch.sum(logabsdet * x_mask, [1, 2])
if not reverse:
return x, logdet
else:
return x
class TransformerCouplingLayer(nn.Module):
def __init__(
self,
channels,
hidden_channels,
kernel_size,
n_layers,
n_heads,
p_dropout=0,
filter_channels=0,
mean_only=False,
wn_sharing_parameter=None,
gin_channels=0,
):
assert n_layers == 3, n_layers
assert channels % 2 == 0, "channels should be divisible by 2"
super().__init__()
self.channels = channels
self.hidden_channels = hidden_channels
self.kernel_size = kernel_size
self.n_layers = n_layers
self.half_channels = channels // 2
self.mean_only = mean_only
self.pre = nn.Conv1d(self.half_channels, hidden_channels, 1)
self.enc = (
Encoder(
hidden_channels,
filter_channels,
n_heads,
n_layers,
kernel_size,
p_dropout,
isflow=True,
gin_channels=gin_channels,
)
if wn_sharing_parameter is None
else wn_sharing_parameter
)
self.post = nn.Conv1d(hidden_channels, self.half_channels * (2 - mean_only), 1)
self.post.weight.data.zero_()
self.post.bias.data.zero_()
def forward(self, x, x_mask, g=None, reverse=False):
x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
h = self.pre(x0) * x_mask
h = self.enc(h, x_mask, g=g)
stats = self.post(h) * x_mask
if not self.mean_only:
m, logs = torch.split(stats, [self.half_channels] * 2, 1)
else:
m = stats
logs = torch.zeros_like(m)
if not reverse:
x1 = m + x1 * torch.exp(logs) * x_mask
x = torch.cat([x0, x1], 1)
logdet = torch.sum(logs, [1, 2])
return x, logdet
else:
x1 = (x1 - m) * torch.exp(-logs) * x_mask
x = torch.cat([x0, x1], 1)
return x
x1, logabsdet = piecewise_rational_quadratic_transform(
x1,
unnormalized_widths,
unnormalized_heights,
unnormalized_derivatives,
inverse=reverse,
tails="linear",
tail_bound=self.tail_bound,
)
x = torch.cat([x0, x1], 1) * x_mask
logdet = torch.sum(logabsdet * x_mask, [1, 2])
if not reverse:
return x, logdet
else:
return x
================================================
FILE: melo/monotonic_align/__init__.py
================================================
from numpy import zeros, int32, float32
from torch import from_numpy
from .core import maximum_path_jit
def maximum_path(neg_cent, mask):
device = neg_cent.device
dtype = neg_cent.dtype
neg_cent = neg_cent.data.cpu().numpy().astype(float32)
path = zeros(neg_cent.shape, dtype=int32)
t_t_max = mask.sum(1)[:, 0].data.cpu().numpy().astype(int32)
t_s_max = mask.sum(2)[:, 0].data.cpu().numpy().astype(int32)
maximum_path_jit(path, neg_cent, t_t_max, t_s_max)
return from_numpy(path).to(device=device, dtype=dtype)
================================================
FILE: melo/monotonic_align/core.py
================================================
import numba
@numba.jit(
numba.void(
numba.int32[:, :, ::1],
numba.float32[:, :, ::1],
numba.int32[::1],
numba.int32[::1],
),
nopython=True,
nogil=True,
)
def maximum_path_jit(paths, values, t_ys, t_xs):
b = paths.shape[0]
max_neg_val = -1e9
for i in range(int(b)):
path = paths[i]
value = values[i]
t_y = t_ys[i]
t_x = t_xs[i]
v_prev = v_cur = 0.0
index = t_x - 1
for y in range(t_y):
for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
if x == y:
v_cur = max_neg_val
else:
v_cur = value[y - 1, x]
if x == 0:
if y == 0:
v_prev = 0.0
else:
v_prev = max_neg_val
else:
v_prev = value[y - 1, x - 1]
value[y, x] += max(v_prev, v_cur)
for y in range(t_y - 1, -1, -1):
path[y, index] = 1
if index != 0 and (
index == y or value[y - 1, index] < value[y - 1, index - 1]
):
index = index - 1
================================================
FILE: melo/preprocess_text.py
================================================
import json
from collections import defaultdict
from random import shuffle
from typing import Optional
from tqdm import tqdm
import click
from text.cleaner import clean_text_bert
import os
import torch
from text.symbols import symbols, num_languages, num_tones
@click.command()
@click.option(
"--metadata",
default="data/example/metadata.list",
type=click.Path(exists=True, file_okay=True, dir_okay=False),
)
@click.option("--cleaned-path", default=None)
@click.option("--train-path", default=None)
@click.option("--val-path", default=None)
@click.option(
"--config_path",
default="configs/config.json",
type=click.Path(exists=True, file_okay=True, dir_okay=False),
)
@click.option("--val-per-spk", default=4)
@click.option("--max-val-total", default=8)
@click.option("--clean/--no-clean", default=True)
def main(
metadata: str,
cleaned_path: Optional[str],
train_path: str,
val_path: str,
config_path: str,
val_per_spk: int,
max_val_total: int,
clean: bool,
):
if train_path is None:
train_path = os.path.join(os.path.dirname(metadata), 'train.list')
if val_path is None:
val_path = os.path.join(os.path.dirname(metadata), 'val.list')
out_config_path = os.path.join(os.path.dirname(metadata), 'config.json')
if cleaned_path is None:
cleaned_path = metadata + ".cleaned"
if clean:
out_file = open(cleaned_path, "w", encoding="utf-8")
new_symbols = []
for line in tqdm(open(metadata, encoding="utf-8").readlines()):
try:
utt, spk, language, text = line.strip().split("|")
norm_text, phones, tones, word2ph, bert = clean_text_bert(text, language, device='cuda:0')
for ph in phones:
if ph not in symbols and ph not in new_symbols:
new_symbols.append(ph)
print('update!, now symbols:')
print(new_symbols)
with open(f'{language}_symbol.txt', 'w') as f:
f.write(f'{new_symbols}')
assert len(phones) == len(tones)
assert len(phones) == sum(word2ph)
out_file.write(
"{}|{}|{}|{}|{}|{}|{}\n".format(
utt,
spk,
language,
norm_text,
" ".join(phones),
" ".join([str(i) for i in tones]),
" ".join([str(i) for i in word2ph]),
)
)
bert_path = utt.replace(".wav", ".bert.pt")
os.makedirs(os.path.dirname(bert_path), exist_ok=True)
torch.save(bert.cpu(), bert_path)
except Exception as error:
print("err!", line, error)
out_file.close()
metadata = cleaned_path
spk_utt_map = defaultdict(list)
spk_id_map = {}
current_sid = 0
with open(metadata, encoding="utf-8") as f:
for line in f.readlines():
utt, spk, language, text, phones, tones, word2ph = line.strip().split("|")
spk_utt_map[spk].append(line)
if spk not in spk_id_map.keys():
spk_id_map[spk] = current_sid
current_sid += 1
train_list = []
val_list = []
for spk, utts in spk_utt_map.items():
shuffle(utts)
val_list += utts[:val_per_spk]
train_list += utts[val_per_spk:]
if len(val_list) > max_val_total:
train_list += val_list[max_val_total:]
val_list = val_list[:max_val_total]
with open(train_path, "w", encoding="utf-8") as f:
for line in train_list:
f.write(line)
with open(val_path, "w", encoding="utf-8") as f:
for line in val_list:
f.write(line)
config = json.load(open(config_path, encoding="utf-8"))
config["data"]["spk2id"] = spk_id_map
config["data"]["training_files"] = train_path
config["data"]["validation_files"] = val_path
config["data"]["n_speakers"] = len(spk_id_map)
config["num_languages"] = num_languages
config["num_tones"] = num_tones
config["symbols"] = symbols
with open(out_config_path, "w", encoding="utf-8") as f:
json.dump(config, f, indent=2, ensure_ascii=False)
if __name__ == "__main__":
main()
================================================
FILE: melo/split_utils.py
================================================
import re
import os
import glob
import numpy as np
import soundfile as sf
import torchaudio
import re
def split_sentence(text, min_len=10, language_str='EN'):
if language_str in ['EN', 'FR', 'ES', 'SP']:
sentences = split_sentences_latin(text, min_len=min_len)
else:
sentences = split_sentences_zh(text, min_len=min_len)
return sentences
def split_sentences_latin(text, min_len=10):
text = re.sub('[。!?;]', '.', text)
text = re.sub('[,]', ',', text)
text = re.sub('[“”]', '"', text)
text = re.sub('[‘’]', "'", text)
text = re.sub(r"[\<\>\(\)\[\]\"\«\»]+", "", text)
return [item.strip() for item in txtsplit(text, 256, 512) if item.strip()]
def split_sentences_zh(text, min_len=10):
text = re.sub('[。!?;]', '.', text)
text = re.sub('[,]', ',', text)
# 将文本中的换行符、空格和制表符替换为空格
text = re.sub('[\n\t ]+', ' ', text)
# 在标点符号后添加一个空格
text = re.sub('([,.!?;])', r'\1 $#!', text)
# 分隔句子并去除前后空格
# sentences = [s.strip() for s in re.split('(。|!|?|;)', text)]
sentences = [s.strip() for s in text.split('$#!')]
if len(sentences[-1]) == 0: del sentences[-1]
new_sentences = []
new_sent = []
count_len = 0
for ind, sent in enumerate(sentences):
new_sent.append(sent)
count_len += len(sent)
if count_len > min_len or ind == len(sentences) - 1:
count_len = 0
new_sentences.append(' '.join(new_sent))
new_sent = []
return merge_short_sentences_zh(new_sentences)
def merge_short_sentences_en(sens):
"""Avoid short sentences by merging them with the following sentence.
Args:
List[str]: list of input sentences.
Returns:
List[str]: list of output sentences.
"""
sens_out = []
for s in sens:
# If the previous sentense is too short, merge them with
# the current sentence.
if len(sens_out) > 0 and len(sens_out[-1].split(" ")) <= 2:
sens_out[-1] = sens_out[-1] + " " + s
else:
sens_out.append(s)
try:
if len(sens_out[-1].split(" ")) <= 2:
sens_out[-2] = sens_out[-2] + " " + sens_out[-1]
sens_out.pop(-1)
except:
pass
return sens_out
def merge_short_sentences_zh(sens):
# return sens
"""Avoid short sentences by merging them with the following sentence.
Args:
List[str]: list of input sentences.
Returns:
List[str]: list of output sentences.
"""
sens_out = []
for s in sens:
# If the previous sentense is too short, merge them with
# the current sentence.
if len(sens_out) > 0 and len(sens_out[-1]) <= 2:
sens_out[-1] = sens_out[-1] + " " + s
else:
sens_out.append(s)
try:
if len(sens_out[-1]) <= 2:
sens_out[-2] = sens_out[-2] + " " + sens_out[-1]
sens_out.pop(-1)
except:
pass
return sens_out
def txtsplit(text, desired_length=100, max_length=200):
"""Split text it into chunks of a desired length trying to keep sentences intact."""
text = re.sub(r'\n\n+', '\n', text)
text = re.sub(r'\s+', ' ', text)
text = re.sub(r'[""]', '"', text)
text = re.sub(r'([,.?!])', r'\1 ', text)
text = re.sub(r'\s+', ' ', text)
rv = []
in_quote = False
current = ""
split_pos = []
pos = -1
end_pos = len(text) - 1
def seek(delta):
nonlocal pos, in_quote, current
is_neg = delta < 0
for _ in range(abs(delta)):
if is_neg:
pos -= 1
current = current[:-1]
else:
pos += 1
current += text[pos]
if text[pos] == '"':
in_quote = not in_quote
return text[pos]
def peek(delta):
p = pos + delta
return text[p] if p < end_pos and p >= 0 else ""
def commit():
nonlocal rv, current, split_pos
rv.append(current)
current = ""
split_pos = []
while pos < end_pos:
c = seek(1)
if len(current) >= max_length:
if len(split_pos) > 0 and len(current) > (desired_length / 2):
d = pos - split_pos[-1]
seek(-d)
else:
while c not in '!?.\n ' and pos > 0 and len(current) > desired_length:
c = seek(-1)
commit()
elif not in_quote and (c in '!?\n' or (c in '.,' and peek(1) in '\n ')):
while pos < len(text) - 1 and len(current) < max_length and peek(1) in '!?.':
c = seek(1)
split_pos.append(pos)
if len(current) >= desired_length:
commit()
elif in_quote and peek(1) == '"' and peek(2) in '\n ':
seek(2)
split_pos.append(pos)
rv.append(current)
rv = [s.strip() for s in rv]
rv = [s for s in rv if len(s) > 0 and not re.match(r'^[\s\.,;:!?]*$', s)]
return rv
if __name__ == '__main__':
zh_text = "好的,我来给你讲一个故事吧。从前有一个小姑娘,她叫做小红。小红非常喜欢在森林里玩耍,她经常会和她的小伙伴们一起去探险。有一天,小红和她的小伙伴们走到了森林深处,突然遇到了一只凶猛的野兽。小红的小伙伴们都吓得不敢动弹,但是小红并没有被吓倒,她勇敢地走向野兽,用她的智慧和勇气成功地制服了野兽,保护了她的小伙伴们。从那以后,小红变得更加勇敢和自信,成为了她小伙伴们心中的英雄。"
en_text = "I didn’t know what to do. I said please kill her because it would be better than being kidnapped,” Ben, whose surname CNN is not using for security concerns, said on Wednesday. “It’s a nightmare. I said ‘please kill her, don’t take her there.’"
sp_text = "¡Claro! ¿En qué tema te gustaría que te hable en español? Puedo proporcionarte información o conversar contigo sobre una amplia variedad de temas, desde cultura y comida hasta viajes y tecnología. ¿Tienes alguna preferencia en particular?"
fr_text = "Bien sûr ! En quelle matière voudriez-vous que je vous parle en français ? Je peux vous fournir des informations ou discuter avec vous sur une grande variété de sujets, que ce soit la culture, la nourriture, les voyages ou la technologie. Avez-vous une préférence particulière ?"
print(split_sentence(zh_text, language_str='ZH'))
print(split_sentence(en_text, language_str='EN'))
print(split_sentence(sp_text, language_str='SP'))
print(split_sentence(fr_text, language_str='FR'))
================================================
FILE: melo/text/__init__.py
================================================
from .symbols import *
_symbol_to_id = {s: i for i, s in enumerate(symbols)}
def cleaned_text_to_sequence(cleaned_text, tones, language, symbol_to_id=None):
"""Converts a string of text to a sequence of IDs corresponding to the symbols in the text.
Args:
text: string to convert to a sequence
Returns:
List of integers corresponding to the symbols in the text
"""
symbol_to_id_map = symbol_to_id if symbol_to_id else _symbol_to_id
phones = [symbol_to_id_map[symbol] for symbol in cleaned_text]
tone_start = language_tone_start_map[language]
tones = [i + tone_start for i in tones]
lang_id = language_id_map[language]
lang_ids = [lang_id for i in phones]
return phones, tones, lang_ids
def get_bert(norm_text, word2ph, language, device):
from .chinese_bert import get_bert_feature as zh_bert
from .english_bert import get_bert_feature as en_bert
from .japanese_bert import get_bert_feature as jp_bert
from .chinese_mix import get_bert_feature as zh_mix_en_bert
from .spanish_bert import get_bert_feature as sp_bert
from .french_bert import get_bert_feature as fr_bert
from .korean import get_bert_feature as kr_bert
lang_bert_func_map = {"ZH": zh_bert, "EN": en_bert, "JP": jp_bert, 'ZH_MIX_EN': zh_mix_en_bert,
'FR': fr_bert, 'SP': sp_bert, 'ES': sp_bert, "KR": kr_bert}
bert = lang_bert_func_map[language](norm_text, word2ph, device)
return bert
================================================
FILE: melo/text/chinese.py
================================================
import os
import re
import cn2an
from pypinyin import lazy_pinyin, Style
from .symbols import punctuation
from .tone_sandhi import ToneSandhi
current_file_path = os.path.dirname(__file__)
pinyin_to_symbol_map = {
line.split("\t")[0]: line.strip().split("\t")[1]
for line in open(os.path.join(current_file_path, "opencpop-strict.txt")).readlines()
}
import jieba.posseg as psg
rep_map = {
":": ",",
";": ",",
",": ",",
"。": ".",
"!": "!",
"?": "?",
"\n": ".",
"·": ",",
"、": ",",
"...": "…",
"$": ".",
"“": "'",
"”": "'",
"‘": "'",
"’": "'",
"(": "'",
")": "'",
"(": "'",
")": "'",
"《": "'",
"》": "'",
"【": "'",
"】": "'",
"[": "'",
"]": "'",
"—": "-",
"~": "-",
"~": "-",
"「": "'",
"」": "'",
}
tone_modifier = ToneSandhi()
def replace_punctuation(text):
text = text.replace("嗯", "恩").replace("呣", "母")
pattern = re.compile("|".join(re.escape(p) for p in rep_map.keys()))
replaced_text = pattern.sub(lambda x: rep_map[x.group()], text)
replaced_text = re.sub(
r"[^\u4e00-\u9fa5" + "".join(punctuation) + r"]+", "", replaced_text
)
return replaced_text
def g2p(text):
pattern = r"(?<=[{0}])\s*".format("".join(punctuation))
sentences = [i for i in re.split(pattern, text) if i.strip() != ""]
phones, tones, word2ph = _g2p(sentences)
assert sum(word2ph) == len(phones)
assert len(word2ph) == len(text) # Sometimes it will crash,you can add a try-catch.
phones = ["_"] + phones + ["_"]
tones = [0] + tones + [0]
word2ph = [1] + word2ph + [1]
return phones, tones, word2ph
def _get_initials_finals(word):
initials = []
finals = []
orig_initials = lazy_pinyin(word, neutral_tone_with_five=True, style=Style.INITIALS)
orig_finals = lazy_pinyin(
word, neutral_tone_with_five=True, style=Style.FINALS_TONE3
)
for c, v in zip(orig_initials, orig_finals):
initials.append(c)
finals.append(v)
return initials, finals
def _g2p(segments):
phones_list = []
tones_list = []
word2ph = []
for seg in segments:
# Replace all English words in the sentence
seg = re.sub("[a-zA-Z]+", "", seg)
seg_cut = psg.lcut(seg)
initials = []
finals = []
seg_cut = tone_modifier.pre_merge_for_modify(seg_cut)
for word, pos in seg_cut:
if pos == "eng":
import pdb; pdb.set_trace()
continue
sub_initials, sub_finals = _get_initials_finals(word)
sub_finals = tone_modifier.modified_tone(word, pos, sub_finals)
initials.append(sub_initials)
finals.append(sub_finals)
# assert len(sub_initials) == len(sub_finals) == len(word)
initials = sum(initials, [])
finals = sum(finals, [])
#
for c, v in zip(initials, finals):
raw_pinyin = c + v
# NOTE: post process for pypinyin outputs
# we discriminate i, ii and iii
if c == v:
assert c in punctuation
phone = [c]
tone = "0"
word2ph.append(1)
else:
v_without_tone = v[:-1]
tone = v[-1]
pinyin = c + v_without_tone
assert tone in "12345"
if c:
# 多音节
v_rep_map = {
"uei": "ui",
"iou": "iu",
"uen": "un",
}
if v_without_tone in v_rep_map.keys():
pinyin = c + v_rep_map[v_without_tone]
else:
# 单音节
pinyin_rep_map = {
"ing": "ying",
"i": "yi",
"in": "yin",
"u": "wu",
}
if pinyin in pinyin_rep_map.keys():
pinyin = pinyin_rep_map[pinyin]
else:
single_rep_map = {
"v": "yu",
"e": "e",
"i": "y",
"u": "w",
}
if pinyin[0] in single_rep_map.keys():
pinyin = single_rep_map[pinyin[0]] + pinyin[1:]
assert pinyin in pinyin_to_symbol_map.keys(), (pinyin, seg, raw_pinyin)
phone = pinyin_to_symbol_map[pinyin].split(" ")
word2ph.append(len(phone))
phones_list += phone
tones_list += [int(tone)] * len(phone)
return phones_list, tones_list, word2ph
def text_normalize(text):
numbers = re.findall(r"\d+(?:\.?\d+)?", text)
for number in numbers:
text = text.replace(number, cn2an.an2cn(number), 1)
text = replace_punctuation(text)
return text
def get_bert_feature(text, word2ph, device=None):
from text import chinese_bert
return chinese_bert.get_bert_feature(text, word2ph, device=device)
if __name__ == "__main__":
from text.chinese_bert import get_bert_feature
text = "啊!chemistry 但是《原神》是由,米哈\游自主, [研发]的一款全.新开放世界.冒险游戏"
text = text_normalize(text)
print(text)
phones, tones, word2ph = g2p(text)
bert = get_bert_feature(text, word2ph)
print(phones, tones, word2ph, bert.shape)
# # 示例用法
# text = "这是一个示例文本:,你好!这是一个测试...."
# print(g2p_paddle(text)) # 输出: 这是一个示例文本你好这是一个测试
================================================
FILE: melo/text/chinese_bert.py
================================================
import torch
import sys
from transformers import AutoTokenizer, AutoModelForMaskedLM
# model_id = 'hfl/chinese-roberta-wwm-ext-large'
local_path = "./bert/chinese-roberta-wwm-ext-large"
tokenizers = {}
models = {}
def get_bert_feature(text, word2ph, device=None, model_id='hfl/chinese-roberta-wwm-ext-large'):
if model_id not in models:
models[model_id] = AutoModelForMaskedLM.from_pretrained(
model_id
).to(device)
tokenizers[model_id] = AutoTokenizer.from_pretrained(model_id)
model = models[model_id]
tokenizer = tokenizers[model_id]
if (
sys.platform == "darwin"
and torch.backends.mps.is_available()
and device == "cpu"
):
device = "mps"
if not device:
device = "cuda"
with torch.no_grad():
inputs = tokenizer(text, return_tensors="pt")
for i in inputs:
inputs[i] = inputs[i].to(device)
res = model(**inputs, output_hidden_states=True)
res = torch.cat(res["hidden_states"][-3:-2], -1)[0].cpu()
# import pdb; pdb.set_trace()
# assert len(word2ph) == len(text) + 2
word2phone = word2ph
phone_level_feature = []
for i in range(len(word2phone)):
repeat_feature = res[i].repeat(word2phone[i], 1)
phone_level_feature.append(repeat_feature)
phone_level_feature = torch.cat(phone_level_feature, dim=0)
return phone_level_feature.T
if __name__ == "__main__":
import torch
word_level_feature = torch.rand(38, 1024) # 12个词,每个词1024维特征
word2phone = [
1,
2,
1,
2,
2,
1,
2,
2,
1,
2,
2,
1,
2,
2,
2,
2,
2,
1,
1,
2,
2,
1,
2,
2,
2,
2,
1,
2,
2,
2,
2,
2,
1,
2,
2,
2,
2,
1,
]
# 计算总帧数
total_frames = sum(word2phone)
print(word_level_feature.shape)
print(word2phone)
phone_level_feature = []
for i in range(len(word2phone)):
print(word_level_feature[i].shape)
# 对每个词重复word2phone[i]次
repeat_feature = word_level_feature[i].repeat(word2phone[i], 1)
phone_level_feature.append(repeat_feature)
phone_level_feature = torch.cat(phone_level_feature, dim=0)
print(phone_level_feature.shape) # torch.Size([36, 1024])
================================================
FILE: melo/text/chinese_mix.py
================================================
import os
import re
import cn2an
from pypinyin import lazy_pinyin, Style
# from text.symbols import punctuation
from .symbols import language_tone_start_map
from .tone_sandhi import ToneSandhi
from .english import g2p as g2p_en
from transformers import AutoTokenizer
punctuation = ["!", "?", "…", ",", ".", "'", "-"]
current_file_path = os.path.dirname(__file__)
pinyin_to_symbol_map = {
line.split("\t")[0]: line.strip().split("\t")[1]
for line in open(os.path.join(current_file_path, "opencpop-strict.txt")).readlines()
}
import jieba.posseg as psg
rep_map = {
":": ",",
";": ",",
",": ",",
"。": ".",
"!": "!",
"?": "?",
"\n": ".",
"·": ",",
"、": ",",
"...": "…",
"$": ".",
"“": "'",
"”": "'",
"‘": "'",
"’": "'",
"(": "'",
")": "'",
"(": "'",
")": "'",
"《": "'",
"》": "'",
"【": "'",
"】": "'",
"[": "'",
"]": "'",
"—": "-",
"~": "-",
"~": "-",
"「": "'",
"」": "'",
}
tone_modifier = ToneSandhi()
def replace_punctuation(text):
text = text.replace("嗯", "恩").replace("呣", "母")
pattern = re.compile("|".join(re.escape(p) for p in rep_map.keys()))
replaced_text = pattern.sub(lambda x: rep_map[x.group()], text)
replaced_text = re.sub(r"[^\u4e00-\u9fa5_a-zA-Z\s" + "".join(punctuation) + r"]+", "", replaced_text)
replaced_text = re.sub(r"[\s]+", " ", replaced_text)
return replaced_text
def g2p(text, impl='v2'):
pattern = r"(?<=[{0}])\s*".format("".join(punctuation))
sentences = [i for i in re.split(pattern, text) if i.strip() != ""]
if impl == 'v1':
_func = _g2p
elif impl == 'v2':
_func = _g2p_v2
else:
raise NotImplementedError()
phones, tones, word2ph = _func(sentences)
assert sum(word2ph) == len(phones)
# assert len(word2ph) == len(text) # Sometimes it will crash,you can add a try-catch.
phones = ["_"] + phones + ["_"]
tones = [0] + tones + [0]
word2ph = [1] + word2ph + [1]
return phones, tones, word2ph
def _get_initials_finals(word):
initials = []
finals = []
orig_initials = lazy_pinyin(word, neutral_tone_with_five=True, style=Style.INITIALS)
orig_finals = lazy_pinyin(
word, neutral_tone_with_five=True, style=Style.FINALS_TONE3
)
for c, v in zip(orig_initials, orig_finals):
initials.append(c)
finals.append(v)
return initials, finals
model_id = 'bert-base-multilingual-uncased'
tokenizer = AutoTokenizer.from_pretrained(model_id)
def _g2p(segments):
phones_list = []
tones_list = []
word2ph = []
for seg in segments:
# Replace all English words in the sentence
# seg = re.sub("[a-zA-Z]+", "", seg)
seg_cut = psg.lcut(seg)
initials = []
finals = []
seg_cut = tone_modifier.pre_merge_for_modify(seg_cut)
for word, pos in seg_cut:
if pos == "eng":
initials.append(['EN_WORD'])
finals.append([word])
else:
sub_initials, sub_finals = _get_initials_finals(word)
sub_finals = tone_modifier.modified_tone(word, pos, sub_finals)
initials.append(sub_initials)
finals.append(sub_finals)
# assert len(sub_initials) == len(sub_finals) == len(word)
initials = sum(initials, [])
finals = sum(finals, [])
#
for c, v in zip(initials, finals):
if c == 'EN_WORD':
tokenized_en = tokenizer.tokenize(v)
phones_en, tones_en, word2ph_en = g2p_en(text=None, pad_start_end=False, tokenized=tokenized_en)
# apply offset to tones_en
tones_en = [t + language_tone_start_map['EN'] for t in tones_en]
phones_list += phones_en
tones_list += tones_en
word2ph += word2ph_en
else:
raw_pinyin = c + v
# NOTE: post process for pypinyin outputs
# we discriminate i, ii and iii
if c == v:
assert c in punctuation
phone = [c]
tone = "0"
word2ph.append(1)
else:
v_without_tone = v[:-1]
tone = v[-1]
pinyin = c + v_without_tone
assert tone in "12345"
if c:
# 多音节
v_rep_map = {
"uei": "ui",
"iou": "iu",
"uen": "un",
}
if v_without_tone in v_rep_map.keys():
pinyin = c + v_rep_map[v_without_tone]
else:
# 单音节
pinyin_rep_map = {
"ing": "ying",
"i": "yi",
"in": "yin",
"u": "wu",
}
if pinyin in pinyin_rep_map.keys():
pinyin = pinyin_rep_map[pinyin]
else:
single_rep_map = {
"v": "yu",
"e": "e",
"i": "y",
"u": "w",
}
if pinyin[0] in single_rep_map.keys():
pinyin = single_rep_map[pinyin[0]] + pinyin[1:]
assert pinyin in pinyin_to_symbol_map.keys(), (pinyin, seg, raw_pinyin)
phone = pinyin_to_symbol_map[pinyin].split(" ")
word2ph.append(len(phone))
phones_list += phone
tones_list += [int(tone)] * len(phone)
return phones_list, tones_list, word2ph
def text_normalize(text):
numbers = re.findall(r"\d+(?:\.?\d+)?", text)
for number in numbers:
text = text.replace(number, cn2an.an2cn(number), 1)
text = replace_punctuation(text)
return text
def get_bert_feature(text, word2ph, device):
from . import chinese_bert
return chinese_bert.get_bert_feature(text, word2ph, model_id='bert-base-multilingual-uncased', device=device)
from .chinese import _g2p as _chinese_g2p
def _g2p_v2(segments):
spliter = '#$&^!@'
phones_list = []
tones_list = []
word2ph = []
for text in segments:
assert spliter not in text
# replace all english words
text = re.sub('([a-zA-Z\s]+)', lambda x: f'{spliter}{x.group(1)}{spliter}', text)
texts = text.split(spliter)
texts = [t for t in texts if len(t) > 0]
for text in texts:
if re.match('[a-zA-Z\s]+', text):
# english
tokenized_en = tokenizer.tokenize(text)
phones_en, tones_en, word2ph_en = g2p_en(text=None, pad_start_end=False, tokenized=tokenized_en)
# apply offset to tones_en
tones_en = [t + language_tone_start_map['EN'] for t in tones_en]
phones_list += phones_en
tones_list += tones_en
word2ph += word2ph_en
else:
phones_zh, tones_zh, word2ph_zh = _chinese_g2p([text])
phones_list += phones_zh
tones_list += tones_zh
word2ph += word2ph_zh
return phones_list, tones_list, word2ph
if __name__ == "__main__":
# from text.chinese_bert import get_bert_feature
text = "NFT啊!chemistry 但是《原神》是由,米哈\游自主, [研发]的一款全.新开放世界.冒险游戏"
text = '我最近在学习machine learning,希望能够在未来的artificial intelligence领域有所建树。'
text = '今天下午,我们准备去shopping mall购物,然后晚上去看一场movie。'
text = '我们现在 also 能够 help 很多公司 use some machine learning 的 algorithms 啊!'
text = text_normalize(text)
print(text)
phones, tones, word2ph = g2p(text, impl='v2')
bert = get_bert_feature(text, word2ph, device='cuda:0')
print(phones)
import pdb; pdb.set_trace()
# # 示例用法
# text = "这是一个示例文本:,你好!这是一个测试...."
# print(g2p_paddle(text)) # 输出: 这是一个示例文本你好这是一个测试
================================================
FILE: melo/text/cleaner.py
================================================
from . import chinese, japanese, english, chinese_mix, korean, french, spanish
from . import cleaned_text_to_sequence
import copy
language_module_map = {"ZH": chinese, "JP": japanese, "EN": english, 'ZH_MIX_EN': chinese_mix, 'KR': korean,
'FR': french, 'SP': spanish, 'ES': spanish}
def clean_text(text, language):
language_module = language_module_map[language]
norm_text = language_module.text_normalize(text)
phones, tones, word2ph = language_module.g2p(norm_text)
return norm_text, phones, tones, word2ph
def clean_text_bert(text, language, device=None):
language_module = language_module_map[language]
norm_text = language_module.text_normalize(text)
phones, tones, word2ph = language_module.g2p(norm_text)
word2ph_bak = copy.deepcopy(word2ph)
for i in range(len(word2ph)):
word2ph[i] = word2ph[i] * 2
word2ph[0] += 1
bert = language_module.get_bert_feature(norm_text, word2ph, device=device)
return norm_text, phones, tones, word2ph_bak, bert
def text_to_sequence(text, language):
norm_text, phones, tones, word2ph = clean_text(text, language)
return cleaned_text_to_sequence(phones, tones, language)
if __name__ == "__main__":
pass
================================================
FILE: melo/text/cleaner_multiling.py
================================================
"""Set of default text cleaners"""
# TODO: pick the cleaner for languages dynamically
import re
# Regular expression matching whitespace:
_whitespace_re = re.compile(r"\s+")
rep_map = {
":": ",",
";": ",",
",": ",",
"。": ".",
"!": "!",
"?": "?",
"\n": ".",
"·": ",",
"、": ",",
"...": ".",
"…": ".",
"$": ".",
"“": "'",
"”": "'",
"‘": "'",
"’": "'",
"(": "'",
")": "'",
"(": "'",
")": "'",
"《": "'",
"》": "'",
"【": "'",
"】": "'",
"[": "'",
"]": "'",
"—": "",
"~": "-",
"~": "-",
"「": "'",
"」": "'",
}
def replace_punctuation(text):
pattern = re.compile("|".join(re.escape(p) for p in rep_map.keys()))
replaced_text = pattern.sub(lambda x: rep_map[x.group()], text)
return replaced_text
def lowercase(text):
return text.lower()
def collapse_whitespace(text):
return re.sub(_whitespace_re, " ", text).strip()
def remove_punctuation_at_begin(text):
return re.sub(r'^[,.!?]+', '', text)
def remove_aux_symbols(text):
text = re.sub(r"[\<\>\(\)\[\]\"\«\»\']+", "", text)
return text
def replace_symbols(text, lang="en"):
"""Replace symbols based on the lenguage tag.
Args:
text:
Input text.
lang:
Lenguage identifier. ex: "en", "fr", "pt", "ca".
Returns:
The modified text
example:
input args:
text: "si l'avi cau, diguem-ho"
lang: "ca"
Output:
text: "si lavi cau, diguemho"
"""
text = text.replace(";", ",")
text = text.replace("-", " ") if lang != "ca" else text.replace("-", "")
text = text.replace(":", ",")
if lang == "en":
text = text.replace("&", " and ")
elif lang == "fr":
text = text.replace("&", " et ")
elif lang == "pt":
text = text.replace("&", " e ")
elif lang == "ca":
text = text.replace("&", " i ")
text = text.replace("'", "")
elif lang== "es":
text=text.replace("&","y")
text = text.replace("'", "")
return text
def unicleaners(text, cased=False, lang='en'):
"""Basic pipeline for Portuguese text. There is no need to expand abbreviation and
numbers, phonemizer already does that"""
if not cased:
text = lowercase(text)
text = replace_punctuation(text)
text = replace_symbols(text, lang=lang)
text = remove_aux_symbols(text)
text = remove_punctuation_at_begin(text)
text = collapse_whitespace(text)
text = re.sub(r'([^\.,!\?\-…])$', r'\1.', text)
return text
================================================
FILE: melo/text/cmudict.rep
================================================
## Date: August 8, 1998
##
## The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6] is Copyright 1998
## by Carnegie Mellon University. Use of this dictionary, for any research or
## commercial purpose, is completely unrestricted. If you make use of or
## redistribute this material, we would appreciate acknowlegement of its
## origin.
##
## cmudict.0.6 is the fifth release of cmudict, first released as cmudict.0.1
## in September of 1993. There was no generally available public release
## of version 0.5.
##
## See the README in this directory before you use this dictionary.
##
## Thanks to Bill Huggins at BBN; Bill Fisher at NIST; Alex Hauptman,
## Alex Rudnicky, Jack Mostow, Roni Rosenfeld, Richard Stern,
## Matthew Siegler, Kevin Lenzo, Maxine Eskenazi, Mosur Ravishankar,
## Eric Thayer, Kristie Seymore, and Raj Reddy at CMU; Lin Chase at
## LIMSI; Doug Paul at MIT Lincoln Labs; Ben Serridge at MIT SLS; Murray
## Spiegel at Bellcore; Tony Robinson at Cambridge UK; David Bowness of
## CAE Electronics Ltd. and CRIM; Stephen Hocking; Jerry Quinn at BNR
## Canada, and Marshal Midden for bringing to our attention problems and
## inadequacies with the first releases. Most special thanks to Bob Weide
## for all his work on prior versions of the dictionary.
##
## We welcome input from users and will continue to acknowledge such input
## in subsequent releases. If I failed to acknowledge your input in this
## release, please remind me and I will update these comments. If I failed to
## fix things that you brought to my attention, please remind me and have
## patience. If I actually fixed things that you brought to my attention and
## you appreciate it, I wouldn't mind a pat on the back.
##
## This version differs from previous releases of cmudict most significantly
## in the addition of new words from the common ARPA tasks for 1996 and 1997.
##
## There are undoubtedly still errors and inconsistencies in this dictionary
## so keep your eyes open for problems and mail them to me.
##
## We hope this dictionary is an improvement over cmudict.0.4.
##
## email: cmudict@cs.cmu.edu
## web: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
## ftp: ftp://ftp.cs.cmu.edu/project/speech/dict/
##
## Thank you for your continued interest in the CMU Pronouncing
## Dictionary. Further addictions and improvements are planned
## for forthcoming releases.
##
!EXCLAMATION-POINT EH2 K - S K L AH0 - M EY1 - SH AH0 N - P OY2 N T
"CLOSE-QUOTE K L OW1 Z - K W OW1 T
"DOUBLE-QUOTE D AH1 - B AH0 L - K W OW1 T
"END-OF-QUOTE EH1 N - D AH0 V - K W OW1 T
"END-QUOTE EH1 N D - K W OW1 T
"IN-QUOTES IH1 N - K W OW1 T S
"QUOTE K W OW1 T
"UNQUOTE AH1 N - K W OW1 T
#SHARP-SIGN SH AA1 R P - S AY1 N
%PERCENT P ER0 - S EH1 N T
&ERSAND AE1 M - P ER0 - S AE2 N D
'CAUSE K AH0 Z
'COURSE K AO1 R S
'EM AH0 M
'END-INNER-QUOTE EH1 N - D IH1 - N ER0 - K W OW1 T
'END-QUOTE EH1 N D - K W OW1 T
'INNER-QUOTE IH1 - N ER0 - K W OW1 T
'M AH0 M
'N AH0 N
'QUOTE K W OW1 T
'S EH1 S
'SINGLE-QUOTE S IH1 NG - G AH0 L - K W OW1 T
'TIL T IH1 L
'TIS T IH1 Z
'TWAS T W AH1 Z
(BEGIN-PARENS B IH0 - G IH1 N - P ER0 - EH1 N Z
(IN-PARENTHESES IH1 N - P ER0 - EH1 N - TH AH0 - S IY2 Z
(LEFT-PAREN L EH1 F T - P ER0 - EH1 N
(OPEN-PARENTHESES OW1 - P AH0 N - P ER0 - EH1 N - TH AH0 - S IY2 Z
(PAREN P ER0 - EH1 N
(PARENS P ER0 - EH1 N Z
(PARENTHESES P ER0 - EH1 N - TH AH0 - S IY2 Z
)CLOSE-PAREN K L OW1 Z - P ER0 - EH1 N
)CLOSE-PARENTHESES K L OW1 Z - P ER0 - EH1 N - TH AH0 - S IY2 Z
)END-PAREN EH1 N D - P ER0 - EH1 N
)END-PARENS EH1 N D - P ER0 - EH1 N Z
)END-PARENTHESES EH1 N D - P ER0 - EH1 N - TH AH0 - S IY2 Z
)END-THE-PAREN EH1 N D - DH AH0 - P ER0 - EH1 N
)PAREN P ER0 - EH1 N
)PARENS P ER0 - EH1 N Z
)RIGHT-PAREN R AY1 T - P ER0 - EH1 N
)RIGHT-PAREN(2) R AY1 T - P EH1 - R AH0 N
)UN-PARENTHESES AH1 N - P ER0 - EH1 N - TH AH0 - S IY1 Z
,COMMA K AA1 - M AH0
-DASH D AE1 SH
-HYPHEN HH AY1 - F AH0 N
...ELLIPSIS IH0 - L IH1 P - S IH0 S
.DECIMAL D EH1 - S AH0 - M AH0 L
.DOT D AA1 T
.FULL-STOP F UH1 L - S T AA1 P
.PERIOD P IH1 - R IY0 - AH0 D
.POINT P OY1 N T
/SLASH S L AE1 SH
0MALEFACTORS M AE1 - L AH0 - F AE2 K - T ER0 Z
:COLON K OW1 - L AH0 N
;SEMI-COLON S EH1 - M IY0 - K OW1 - L AH0 N
;SEMI-COLON(2) S EH1 - M IH0 - K OW2 - L AH0 N
?QUESTION-MARK K W EH1 S - CH AH0 N - M AA1 R K
A AH0
A'S EY1 Z
A(2) EY1
A. EY1
A.'S EY1 Z
A.S EY1 Z
A42128 EY1 - F AO1 R - T UW1 - W AH1 N - T UW1 - EY1 T
AAA T R IH2 - P AH0 - L EY1
AABERG AA1 - B ER0 G
AACHEN AA1 - K AH0 N
AAKER AA1 - K ER0
AALSETH AA1 L - S EH0 TH
AAMODT AA1 - M AH0 T
AANCOR AA1 N - K AO2 R
AARDEMA AA0 R - D EH1 - M AH0
AARDVARK AA1 R D - V AA2 R K
AARON EH1 - R AH0 N
AARON'S EH1 - R AH0 N Z
AARONS EH1 - R AH0 N Z
AARONSON EH1 - R AH0 N - S AH0 N
AARONSON'S EH1 - R AH0 N - S AH0 N Z
AARONSON'S(2) AA1 - R AH0 N - S AH0 N Z
AARONSON(2) AA1 - R AH0 N - S AH0 N
AARTI AA1 R - T IY2
AASE AA1 S
AASEN AA1 - S AH0 N
AB AE1 B
AB(2) EY1 - B IY1
ABABA AH0 - B AA1 - B AH0
ABABA(2) AA1 - B AH0 - B AH0
ABACHA AE1 - B AH0 - K AH0
ABACK AH0 - B AE1 K
ABACO AE1 - B AH0 - K OW2
ABACUS AE1 - B AH0 - K AH0 S
ABAD AH0 - B AA1 D
ABADAKA AH0 - B AE1 - D AH0 - K AH0
ABADI AH0 - B AE1 - D IY0
ABADIE AH0 - B AE1 - D IY0
ABAIR AH0 - B EH1 R
ABALKIN AH0 - B AA1 L - K IH0 N
ABALONE AE2 - B AH0 - L OW1 - N IY0
ABALOS AA0 - B AA1 - L OW0 Z
ABANDON AH0 - B AE1 N - D AH0 N
ABANDONED AH0 - B AE1 N - D AH0 N D
ABANDONING AH0 - B AE1 N - D AH0 - N IH0 NG
ABANDONMENT AH0 - B AE1 N - D AH0 N - M AH0 N T
ABANDONMENTS AH0 - B AE1 N - D AH0 N - M AH0 N T S
ABANDONS AH0 - B AE1 N - D AH0 N Z
ABANTO AH0 - B AE1 N - T OW0
ABARCA AH0 - B AA1 R - K AH0
ABARE AA0 - B AA1 - R IY0
ABASCAL AE1 - B AH0 S - K AH0 L
ABASH AH0 - B AE1 SH
ABASHED AH0 - B AE1 SH T
ABATE AH0 - B EY1 T
ABATED AH0 - B EY1 - T IH0 D
ABATEMENT AH0 - B EY1 T - M AH0 N T
ABATEMENTS AH0 - B EY1 T - M AH0 N T S
ABATES AH0 - B EY1 T S
ABATING AH0 - B EY1 - T IH0 NG
ABBA AE1 - B AH0
ABBADO AH0 - B AA1 - D OW0
ABBAS AH0 - B AA1 S
ABBASI AA0 - B AA1 - S IY0
ABBATE AA1 - B EY0 T
ABBATIELLO AA0 - B AA0 - T IY0 - EH1 - L OW0
ABBE AE1 - B IY0
ABBE(2) AE0 - B EY1
ABBENHAUS AE1 - B AH0 N - HH AW2 S
ABBETT AH0 - B EH1 T
ABBEVILLE AE1 B - V IH0 L
ABBEY AE1 - B IY0
ABBEY'S AE1 - B IY0 Z
ABBIE AE1 - B IY0
ABBITT AE1 - B IH0 T
ABBOT AE1 - B AH0 T
ABBOTT AE1 - B AH0 T
ABBOTT'S AE1 - B AH0 T S
ABBOUD AH0 - B UW1 D
ABBOUD(2) AH0 - B AW1 D
ABBREVIATE AH0 - B R IY1 - V IY0 - EY2 T
ABBREVIATED AH0 - B R IY1 - V IY0 - EY2 - T AH0 D
ABBREVIATED(2) AH0 - B R IY1 - V IY0 - EY2 - T IH0 D
ABBREVIATES AH0 - B R IY1 - V IY0 - EY2 T S
ABBREVIATING AH0 - B R IY1 - V IY0 - EY2 - T IH0 NG
ABBREVIATION AH0 - B R IY2 - V IY0 - EY1 - SH AH0 N
ABBREVIATIONS AH0 - B R IY2 - V IY0 - EY1 - SH AH0 N Z
ABBRUZZESE AA0 - B R UW0 T - S EY1 - Z IY0
ABBS AE1 B Z
ABBY AE1 - B IY0
ABCO AE1 B - K OW0
ABCOTEK AE1 B - K OW0 - T EH2 K
ABDALLA AE2 B - D AE1 - L AH0
ABDALLAH AE2 B - D AE1 - L AH0
ABDEL AE1 B - D EH2 L
ABDELLA AE2 B - D EH1 - L AH0
ABDICATE AE1 B - D AH0 - K EY2 T
ABDICATED AE1 B - D AH0 - K EY2 - T AH0 D
ABDICATES AE1 B - D AH0 - K EY2 T S
ABDICATING AE1 B - D IH0 - K EY2 - T IH0 NG
ABDICATION AE2 B - D IH0 - K EY1 - SH AH0 N
ABDNOR AE1 B D - N ER0
ABDO AE1 B - D OW0
ABDOLLAH AE2 B - D AA1 - L AH0
ABDOMEN AE0 B - D OW1 - M AH0 N
ABDOMEN(2) AE1 B - D AH0 - M AH0 N
ABDOMINAL AE0 B - D AA1 - M AH0 - N AH0 L
ABDOMINAL(2) AH0 B - D AA1 - M AH0 - N AH0 L
ABDUCT AE0 B - D AH1 K T
ABDUCTED AE0 B - D AH1 K - T IH0 D
ABDUCTED(2) AH0 B - D AH1 K - T IH0 D
ABDUCTEE AE0 B - D AH2 K - T IY1
ABDUCTEES AE0 B - D AH2 K - T IY1 Z
ABDUCTING AE0 B - D AH1 K - T IH0 NG
ABDUCTING(2) AH0 B - D AH1 K - T IH0 NG
ABDUCTION AE0 B - D AH1 K - SH AH0 N
ABDUCTION(2) AH0 B - D AH1 K - SH AH0 N
ABDUCTIONS AE0 B - D AH1 K - SH AH0 N Z
ABDUCTIONS(2) AH0 B - D AH1 K - SH AH0 N Z
ABDUCTOR AE0 B - D AH1 K - T ER0
ABDUCTOR(2) AH0 B - D AH1 K - T ER0
ABDUCTORS AE0 B - D AH1 K - T ER0 Z
ABDUCTORS(2) AH0 B - D AH1 K - T ER0 Z
ABDUCTS AE0 B - D AH1 K T S
ABDUL AE0 B - D UW1 L
ABDULAZIZ AE0 B - D UW2 - L AH0 - Z IY1 Z
ABDULLA AA0 B - D UW1 - L AH0
ABDULLAH AE2 B - D AH1 - L AH0
ABE EY1 B
ABED AH0 - B EH1 D
ABEDI AH0 - B EH1 - D IY0
ABEE AH0 - B IY1
ABEL EY1 - B AH0 L
ABELA AA0 - B EH1 - L AH0
ABELARD AE1 - B IH0 - L ER0 D
ABELE AH0 - B IY1 L
ABELES AH0 - B IY1 L Z
ABELES(2) EY1 - B AH0 - L IY2 Z
ABELL EY1 - B AH0 L
ABELLA AH0 - B EH1 - L AH0
ABELN AE1 - B IH0 L N
ABELOW AE1 - B AH0 - L OW0
ABELS EY1 - B AH0 L Z
ABELSON AE1 - B IH0 L - S AH0 N
ABEND AE1 - B EH0 N D
ABEND(2) AH0 - B EH1 N D
ABENDROTH AE1 - B IH0 N - D R AO0 TH
ABER EY1 - B ER0
ABERCROMBIE AE2 - B ER0 - K R AA1 M - B IY0
ABERDEEN AE1 - B ER0 - D IY2 N
ABERFORD EY1 - B ER0 - F ER0 D
ABERG AE1 - B ER0 G
ABERLE AE1 - B ER0 - AH0 L
ABERLE(2) AE1 - B ER0 L
ABERMIN AE1 - B ER0 - M IH0 N
ABERNATHY AE1 - B ER0 - N AE2 - TH IY0
ABERNETHY AE1 - B ER0 - N EH2 - TH IY0
ABERRANT AE0 - B EH1 - R AH0 N T
ABERRATION AE2 - B ER0 - EY1 - SH AH0 N
ABERRATIONAL AE2 - B ER0 - EY1 - SH AH0 - N AH0 L
ABERRATIONS AE2 - B ER0 - EY1 - SH AH0 N Z
ABERT AE1 - B ER0 T
ABET AH0 - B EH1 T
ABETTED AH0 - B EH1 - T IH0 D
ABETTING AH0 - B EH1 - T IH0 NG
ABEX EY1 - B EH0 K S
ABEYANCE AH0 - B EY1 - AH0 N S
ABEYTA AA0 - B EY1 - T AH0
ABHOR AE0 B - HH AO1 R
ABHORRED AH0 B - HH AO1 R D
ABHORRENCE AH0 B - HH AO1 - R AH0 N S
ABHORRENT AE0 B - HH AO1 - R AH0 N T
ABHORS AH0 B - HH AO1 R Z
ABID EY1 - B IH0 D
ABIDE AH0 - B AY1 D
ABIDED AH0 - B AY1 - D IH0 D
ABIDES AH0 - B AY1 D Z
ABIDING AH0 - B AY1 - D IH0 NG
ABIE AE1 - B IY0
ABIGAIL AE1 - B AH0 - G EY2 L
ABILA AA0 - B IY1 - L AH0
ABILENE AE1 - B IH0 - L IY2 N
ABILITIES AH0 - B IH1 - L AH0 - T IY0 Z
ABILITY AH0 - B IH1 - L AH0 - T IY0
ABINGTON AE1 - B IH0 NG - T AH0 N
ABIO AA1 - B IY0 - OW0
ABIOLA AA2 - B IY0 - OW1 - L AH0
ABIOLA'S AA2 - B IY0 - OW1 - L AH0 Z
ABIOMED EY0 - B IY1 - AH0 - M EH0 D
ABITIBI AE2 - B IH0 - T IY1 - B IY0
ABITZ AE1 - B IH0 T S
ABJECT AE1 B - JH EH0 K T
ABKHAZIA AE0 B K - HH AA1 - Z Y AH0
ABKHAZIA(2) AE0 B K - HH AE1 - Z Y AH0
ABKHAZIAN AE0 B K - HH AA1 - Z IY0 - AH0 N
ABKHAZIAN(2) AE0 B K - HH AE1 - Z IY0 - AH0 N
ABKHAZIAN(3) AE0 B K - HH AA1 - Z Y AH0 N
ABKHAZIAN(4) AE0 B K - HH AE1 - Z Y AH0 N
ABKHAZIANS AE0 B K - HH AA1 - Z IY0 - AH0 N Z
ABKHAZIANS(2) AE0 B K - HH AE1 - Z IY0 - AH0 N Z
ABLAZE AH0 - B L EY1 Z
ABLE EY1 - B AH0 L
ABLED EY1 - B AH0 L D
ABLER EY1 - B AH0 L - ER0
ABLER(2) EY1 - B L ER0
ABLES EY1 - B AH0 L Z
ABLEST EY1 - B AH0 L S T
ABLEST(2) EY1 - B L AH0 S T
ABLOOM AH0 - B L UW1 M
ABLY EY1 - B L IY0
ABNER AE1 B - N ER0
ABNEY AE1 B - N IY0
ABNORMAL AE0 B - N AO1 R - M AH0 L
ABNORMALITIES AE2 B - N AO0 R - M AE1 - L AH0 - T IY0 Z
ABNORMALITY AE2 B - N AO0 R - M AE1 - L AH0 - T IY0
ABNORMALLY AE0 B - N AO1 R - M AH0 - L IY0
ABO AA1 - B OW0
ABO'S AA1 - B OW0 Z
ABOARD AH0 - B AO1 R D
ABODE AH0 - B OW1 D
ABOLISH AH0 - B AA1 - L IH0 SH
ABOLISHED AH0 - B AA1 - L IH0 SH T
ABOLISHES AH0 - B AA1 - L IH0 - SH IH0 Z
ABOLISHING AH0 - B AA1 - L IH0 - SH IH0 NG
ABOLITION AE2 - B AH0 - L IH1 - SH AH0 N
ABOLITIONISM AE2 - B AH0 - L IH1 - SH AH0 - N IH2 - Z AH0 M
ABOLITIONIST AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S T
ABOLITIONISTS AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S T S
ABOLITIONISTS(2) AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S S
ABOLITIONISTS(3) AE2 - B AH0 - L IH1 - SH AH0 - N AH0 S
ABOMINABLE AH0 - B AA1 - M AH0 - N AH0 - B AH0 L
ABOMINATION AH0 - B AA2 - M AH0 - N EY1 - SH AH0 N
ABOOD AH0 - B UW1 D
ABOODI AH0 - B UW1 - D IY0
ABORIGINAL AE2 - B ER0 - IH1 - JH AH0 - N AH0 L
ABORIGINE AE2 - B ER0 - IH1 - JH AH0 - N IY0
ABORIGINES AE2 - B ER0 - IH1 - JH AH0 - N IY0 Z
ABORN AH0 - B AO1 R N
ABORT AH0 - B AO1 R T
ABORTED AH0 - B AO1 R - T IH0 D
ABORTIFACIENT AH0 - B AO2 R - T AH0 - F EY1 - SH AH0 N T
ABORTIFACIENTS AH0 - B AO2 R - T AH0 - F EY1 - SH AH0 N T S
ABORTING AH0 - B AO1 R - T IH0 NG
ABORTION AH0 - B AO1 R - SH AH0 N
ABORTIONIST AH0 - B AO1 R - SH AH0 N - IH0 S T
ABORTIONISTS AH0 - B AO1 R - SH AH0 N - IH0 S T S
ABORTIONISTS(2) AH0 - B AO1 R - SH AH0 N - IH0 S S
ABORTIONISTS(3) AH0 - B AO1 R - SH AH0 N - IH0 S
ABORTIONS AH0 - B AO1 R - SH AH0 N Z
ABORTIVE AH0 - B AO1 R - T IH0 V
ABOTT AH0 - B AA1 T
ABOU AH0 - B UW1
ABOUD AA0 - B UW1 D
ABOUHALIMA AA2 - B UW0 - HH AA0 - L IY1 - M AH0
ABOUHALIMA'S AA2 - B UW0 - HH AA0 - L IY1 - M AH0 Z
ABOUND AH0 - B AW1 N D
ABOUNDED AH0 - B AW1 N - D IH0 D
ABOUNDING AH0 - B AW1 N - D IH0 NG
ABOUNDS AH0 - B AW1 N D Z
ABOUT AH0 - B AW1 T
ABOUT'S AH0 - B AW1 T S
ABOVE AH0 - B AH1 V
ABOVE'S AH0 - B AH1 V Z
ABOVEBOARD AH0 - B AH1 V - B AO2 R D
ABPLANALP AE1 B - P L AH0 - N AE0 L P
ABRA AA1 - B R AH0
ABRACADABRA AE2 - B R AH0 - K AH0 - D AE1 - B R AH0
ABRAHAM EY1 - B R AH0 - HH AE2 M
ABRAHAMIAN AE2 - B R AH0 - HH EY1 - M IY0 - AH0 N
ABRAHAMS EY1 - B R AH0 - HH AE2 M Z
ABRAHAMSEN AE0 - B R AH0 - HH AE1 M - S AH0 N
ABRAHAMSON AH0 - B R AE1 - HH AH0 M - S AH0 N
ABRAM AH0 - B R AE1 M
ABRAMCZYK AA1 - B R AH0 M - CH IH0 K
ABRAMO AA0 - B R AA1 - M OW0
ABRAMOVITZ AH0 - B R AA1 - M AH0 - V IH0 T S
ABRAMOWICZ AH0 - B R AA1 - M AH0 - V IH0 CH
ABRAMOWITZ AH0 - B R AA1 - M AH0 - W IH0 T S
ABRAMS EY1 - B R AH0 M Z
ABRAMSON EY1 - B R AH0 M - S AH0 N
ABRASION AH0 - B R EY1 - ZH AH0 N
ABRASIONS AH0 - B R EY1 - ZH AH0 N Z
ABRASIVE AH0 - B R EY1 - S IH0 V
ABRASIVES AH0 - B R EY1 - S IH0 V Z
ABREAST AH0 - B R EH1 S T
ABREGO AA0 - B R EH1 - G OW0
ABREU AH0 - B R UW1
ABRIDGE AH0 - B R IH1 JH
ABRIDGED AH0 - B R IH1 JH D
ABRIL AH0 - B R IH1 L
ABROAD AH0 - B R AO1 D
ABROGATE AE1 - B R AH0 - G EY2 T
ABROGATED AE1 - B R AH0 - G EY2 - T IH0 D
ABROGATING AE1 - B R AH0 - G EY2 - T IH0 NG
ABROGATION AE2 - B R AH0 - G EY1 - SH AH0 N
ABRON AH0 - B R AA1 N
ABRUPT AH0 - B R AH1 P T
ABRUPTLY AH0 - B R AH1 P T - L IY0
ABRUPTNESS AH0 - B R AH1 P T - N AH0 S
ABRUTYN EY1 - B R UW0 - T IH0 N
ABRUZZESE AA0 - B R UW0 T - S EY1 - Z IY0
ABRUZZO AA0 - B R UW1 - Z OW0
ABS EY1 - B IY1 - EH1 S
ABS(2) AE1 B Z
ABSALOM AE1 B - S AH0 - L AH0 M
ABSCAM AE1 B - S K AE0 M
ABSCESS AE1 B - S EH2 S
ABSENCE AE1 B - S AH0 N S
ABSENCES AE1 B - S AH0 N - S IH0 Z
ABSENT AE1 B - S AH0 N T
ABSENTEE AE2 B - S AH0 N - T IY1
ABSENTEEISM AE2 B - S AH0 N - T IY1 - IH0 - Z AH0 M
ABSENTEES AE2 B - S AH0 N - T IY1 Z
ABSENTIA AE0 B - S EH1 N - SH AH0
ABSHER AE1 B - SH ER0
ABSHIER AE1 B - SH IY0 - ER0
ABSHIRE AE1 B - SH AY2 R
ABSO AE1 B - S OW0
ABSOLOM AE1 B - S AH0 - L AH0 M
ABSOLUT AE2 B - S AH0 - L UW1 T
ABSOLUTE AE1 B - S AH0 - L UW2 T
ABSOLUTELY AE2 B - S AH0 - L UW1 T - L IY0
ABSOLUTENESS AE1 B - S AH0 - L UW2 T - N AH0 S
ABSOLUTES AE1 B - S AH0 - L UW2 T S
ABSOLUTION AE2 B - S AH0 - L UW1 - SH AH0 N
ABSOLUTISM AE1 B - S AH0 - L UW2 - T IH2 - Z AH0 M
ABSOLUTIST AE0 B - S IH0 - L UW1 - T IH0 S T
ABSOLVE AH0 B - Z AA1 L V
ABSOLVE(2) AE0 B - Z AA1 L V
ABSOLVED AH0 B - Z AA1 L V D
ABSOLVED(2) AE0 B - Z AA1 L V D
ABSOLVES AH0 B - Z AA1 L V Z
ABSOLVES(2) AE0 B - Z AA1 L V Z
ABSOLVING AH0 B - Z AA1 L - V IH0 NG
ABSOLVING(2) AE0 B - Z AA1 L - V IH0 NG
ABSORB AH0 B - Z AO1 R B
ABSORBED AH0 B - Z AO1 R B D
ABSORBENCY AH0 B - Z AO1 R - B AH0 N - S IY0
ABSORBENT AH0 B - Z AO1 R - B AH0 N T
ABSORBER AH0 B - Z AO1 R - B ER0
ABSORBERS AH0 B - Z AO1 R - B ER0 Z
ABSORBING AH0 B - Z AO1 R - B IH0 NG
ABSORBS AH0 B - Z AO1 R B Z
ABSORPTION AH0 B - Z AO1 R P - SH AH0 N
ABSORPTION(2) AH0 B - S AO1 R P - SH AH0 N
ABSTAIN AH0 B - S T EY1 N
ABSTAIN(2) AE0 B - S T EY1 N
ABSTAINED AH0 B - S T EY1 N D
ABSTAINED(2) AE0 B - S T EY1 N D
ABSTAINING AH0 B - S T EY1 - N IH0 NG
ABSTAINING(2) AE0 B - S T EY1 - N IH0 NG
ABSTENTION AH0 B - S T EH1 N - CH AH0 N
ABSTENTION(2) AE0 B - S T EH1 N - CH AH0 N
ABSTENTIONS AH0 B - S T EH1 N - CH AH0 N Z
ABSTENTIONS(2) AE0 B - S T EH1 N - CH AH0 N Z
ABSTINENCE AE1 B - S T AH0 - N AH0 N S
ABSTINENT AE1 B - S T AH0 - N AH0 N T
ABSTON AE1 B - S T AH0 N
ABSTRACT AE0 B - S T R AE1 K T
ABSTRACT(2) AE1 B - S T R AE2 K T
ABSTRACTED AE1 B - S T R AE2 K - T IH0 D
ABSTRACTION AE0 B - S T R AE1 K - SH AH0 N
ABSTRACTIONS AE0 B - S T R AE1 K - SH AH0 N Z
ABSTRACTS AE1 B - S T R AE0 K T S
ABSTRUSE AH0 B - S T R UW1 S
ABSURD AH0 B - S ER1 D
ABSURDIST AH0 B - S ER1 - D IH0 S T
ABSURDITIES AH0 B - S ER1 - D AH0 - T IY0 Z
ABSURDITY AH0 B - S ER1 - D AH0 - T IY0
ABSURDLY AH0 B - S ER1 D - L IY0
ABT AE1 B T
ABT(2) EY1 - B IY1 - T IY1
ABTS AE1 B T S
ABTS(2) EY1 - B IY1 - T IY1 Z
ABTS(3) EY1 - B IY1 - T IY1 - EH1 S
ABU AE1 - B UW0
ABUDRAHM AH0 - B AH1 - D R AH0 M
ABULADZE AE2 - B Y UW0 - L AE1 D - Z IY0
ABUNDANCE AH0 - B AH1 N - D AH0 N S
ABUNDANT AH0 - B AH1 N - D AH0 N T
ABUNDANTLY AH0 - B AH1 N - D AH0 N T - L IY0
ABURTO AH0 - B UH1 R - T OW2
ABURTO'S AH0 - B UH1 R - T OW2 Z
ABUSE AH0 - B Y UW1 S
ABUSE(2) AH0 - B Y UW1 Z
ABUSED AH0 - B Y UW1 Z D
ABUSER AH0 - B Y UW1 - Z ER0
ABUSERS AH0 - B Y UW1 - Z ER0 Z
ABUSES AH0 - B Y UW1 - S IH0 Z
ABUSES(2) AH0 - B Y UW1 - Z IH0 Z
ABUSING AH0 - B Y UW1 - Z IH0 NG
ABUSIVE AH0 - B Y UW1 - S IH0 V
ABUT AH0 - B AH1 T
ABUTS AH0 - B AH1 T S
ABUTTED AH0 - B AH1 - T AH0 D
ABUTTING AH0 - B AH1 - T IH0 NG
ABUZZ AH0 - B AH1 Z
ABYSMAL AH0 - B IH1 Z - M AH0 L
ABYSMALLY AH0 - B IH1 Z - M AH0 - L IY0
ABYSS AH0 - B IH1 S
ABZUG AE1 B - Z AH2 G
ABZUG(2) AE1 B - Z UH2 G
AC EY1 - S IY1
ACA AE1 - K AH0
ACACIA AH0 - K EY1 - SH AH0
ACADEME AE1 - K AH0 - D IY2 M
ACADEMIA AE2 - K AH0 - D IY1 - M IY0 - AH0
ACADEMIC AE2 - K AH0 - D EH1 - M IH0 K
ACADEMICALLY AE2 - K AH0 - D EH1 - M IH0 K - L IY0
ACADEMICIAN AE2 - K AH0 - D AH0 - M IH1 - SH AH0 N
ACADEMICIANS AE2 - K AH0 - D AH0 - M IH1 - SH AH0 N Z
ACADEMICIANS(2) AH0 - K AE2 - D AH0 - M IH1 - SH AH0 N Z
ACADEMICS AE2 - K AH0 - D EH1 - M IH0 K S
ACADEMIES AH0 - K AE1 - D AH0 - M IY0 Z
ACADEMY AH0 - K AE1 - D AH0 - M IY0
ACADEMY'S AH0 - K AE1 - D AH0 - M IY0 Z
ACADIA AH0 - K EY1 - D IY0 - AH0
ACAMPORA AH0 - K AE1 M - P ER0 - AH0
ACANTHA AA0 - K AA1 N - DH AH0
ACAPULCO AE2 - K AH0 - P UH1 L - K OW0
ACCARDI AA0 - K AA1 R - D IY0
ACCARDO AA0 - K AA1 R - D OW0
ACCEDE AE0 K - S IY1 D
ACCEDED AE0 K - S IY1 - D IH0 D
ACCEDES AE0 K - S IY1 D Z
ACCEDING AE0 K - S IY1 - D IH0 NG
ACCEL AH0 K - S EH1 L
ACCELERANT AE0 K - S EH1 - L ER0 - AH0 N T
ACCELERANTS AE0 K - S EH1 - L ER0 - AH0 N T S
ACCELERATE AE0 K - S EH1 - L ER0 - EY2 T
ACCELERATED AE0 K - S EH1 - L ER0 - EY2 - T IH0 D
ACCELERATES AE0 K - S EH1 - L ER0 - EY2 T S
ACCELERATING AE0 K - S EH1 - L ER0 - EY2 - T IH0 NG
ACCELERATION AE2 K - S EH2 - L ER0 - EY1 - SH AH0 N
ACCELERATOR AE0 K - S EH1 - L ER0 - EY2 - T ER0
ACCELEROMETER AE0 K - S EH2 - L ER0 - AA1 - M AH0 - T ER0
ACCELEROMETERS AE0 K - S EH2 - L ER0 - AA1 - M AH0 - T ER0 Z
ACCENT AH0 K - S EH1 N T
ACCENT(2) AE1 K - S EH2 N T
ACCENTED AE1 K - S EH0 N - T IH0 D
ACCENTING AE1 K - S EH0 N - T IH0 NG
ACCENTS AE1 K - S EH0 N T S
ACCENTUATE AE0 K - S EH1 N - CH UW0 - EY0 T
ACCENTUATED AE0 K - S EH1 N - CH AH0 W - EY2 - T IH0 D
ACCENTUATES AE0 K - S EH1 N - CH UW0 - EY0 T S
ACCENTUATING AE0 K - S EH1 N - CH AH0 W - EY2 - T IH0 NG
ACCEPT AE0 K - S EH1 P T
ACCEPT(2) AH0 K - S EH1 P T
ACCEPTABILITY AH0 K - S EH2 P - T AH0 - B IH1 - L AH0 - T IY0
ACCEPTABLE AE0 K - S EH1 P - T AH0 - B AH0 L
ACCEPTABLE(2) AH0 K - S EH1 P - T AH0 - B AH0 L
ACCEPTANCE AE0 K - S EH1 P - T AH0 N S
ACCEPTANCE(2) AH0 K - S EH1 P - T AH0 N S
ACCEPTANCES AE0 K - S EH1 P - T AH0 N - S IH0 Z
ACCEPTED AE0 K - S EH1 P - T IH0 D
ACCEPTED(2) AH0 K - S EH1 P - T AH0 D
ACCEPTING AE0 K - S EH1 P - T IH0 NG
ACCEPTING(2) AH0 K - S EH1 P - T IH0 NG
ACCEPTS AE0 K - S EH1 P T S
ACCESS AE1 K - S EH2 S
ACCESSED AE1 K - S EH2 S T
ACCESSIBILITY AE2 K - S EH0 - S AH0 - B IH1 - L IH0 - T IY0
ACCESSIBLE AE0 K - S EH1 - S AH0 - B AH0 L
ACCESSING AE1 K - S EH2 - S IH0 NG
ACCESSION AH0 K - S EH1 - SH AH0 N
ACCESSORIES AE0 K - S EH1 - S ER0 - IY0 Z
ACCESSORIZE AE0 K - S EH1 - S ER0 - AY2 Z
ACCESSORIZED AE0 K - S EH1 - S ER0 - AY2 Z D
ACCESSORY AE0 K - S EH1 - S ER0 - IY0
ACCETTA AA0 - CH EH1 - T AH0
ACCIDENT AE1 K - S AH0 - D AH0 N T
ACCIDENT'S AE1 K - S AH0 - D AH0 N T S
ACCIDENTAL AE2 K - S AH0 - D EH1 N - T AH0 L
ACCIDENTAL(2) AE2 K - S AH0 - D EH1 - N AH0 L
ACCIDENTALLY AE2 K - S AH0 - D EH1 N - T AH0 - L IY0
ACCIDENTALLY(2) AE2 K - S AH0 - D EH1 - N AH0 - L IY0
ACCIDENTLY AE1 K - S AH0 - D AH0 N T - L IY0
ACCIDENTS AE1 K - S AH0 - D AH0 N T S
ACCION AE1 - CH IY0 - AH0 N
ACCIVAL AE1 - S IH0 - V AA2 L
ACCLAIM AH0 - K L EY1 M
ACCLAIMED AH0 - K L EY1 M D
ACCLAIMING AH0 - K L EY1 - M IH0 NG
ACCLIMATE AE1 - K L AH0 - M EY2 T
ACCLIMATED AE1 - K L AH0 - M EY2 - T IH0 D
ACCLIMATION AE2 - K L AH0 - M EY1 - SH AH0 N
ACCO AE1 - K OW0
ACCOLA AA0 - K OW1 - L AH0
ACCOLADE AE1 - K AH0 - L EY2 D
ACCOLADES AE1 - K AH0 - L EY2 D Z
ACCOMANDO AA0 - K OW0 - M AA1 N - D OW0
ACCOMMODATE AH0 - K AA1 - M AH0 - D EY2 T
ACCOMMODATED AH0 - K AA1 - M AH0 - D EY2 - T AH0 D
ACCOMMODATES AH0 - K AA1 - M AH0 - D EY2 T S
ACCOMMODATING AH0 - K AA1 - M AH0 - D EY2 - T IH0 NG
ACCOMMODATION AH0 - K AA2 - M AH0 - D EY1 - SH AH0 N
ACCOMMODATIONS AH0 - K AA2 - M AH0 - D EY1 - SH AH0 N Z
ACCOMMODATIVE AH0 - K AA1 - M AH0 - D EY2 - T IH0 V
ACCOMPANIED AH0 - K AH1 M - P AH0 - N IY0 D
ACCOMPANIES AH0 - K AH1 M - P AH0 - N IY0 Z
ACCOMPANIMENT AH0 - K AH1 M P - N IH0 - M AH0 N T
ACCOMPANIMENT(2) AH0 - K AH1 M P - N IY0 - M AH0 N T
ACCOMPANIMENTS AH0 - K AH1 M P - N IH0 - M AH0 N T S
ACCOMPANIMENTS(2) AH0 - K AH1 M P - N IY0 - M AH0 N T S
ACCOMPANIST AH0 - K AH1 M - P AH0 - N AH0 S T
ACCOMPANY AH0 - K AH1 M - P AH0 - N IY0
ACCOMPANYING AH0 - K AH1 M - P AH0 - N IY0 - IH0 NG
ACCOMPLI AA2 - K AA1 M - P L IY0
ACCOMPLI(2) AH0 - K AA1 M - P L IY0
ACCOMPLICE AH0 - K AA1 M - P L AH0 S
ACCOMPLICES AH0 - K AA1 M - P L AH0 - S AH0 Z
ACCOMPLISH AH0 - K AA1 M - P L IH0 SH
ACCOMPLISHED AH0 - K AA1 M - P L IH0 SH T
ACCOMPLISHES AH0 - K AA1 M - P L IH0 - SH IH0 Z
ACCOMPLISHING AH0 - K AA1 M - P L IH0 - SH IH0 NG
ACCOMPLISHMENT AH0 - K AA1 M - P L IH0 SH - M AH0 N T
ACCOMPLISHMENTS AH0 - K AA1 M - P L IH0 SH - M AH0 N T S
ACCOR AE1 - K AO2 R
ACCOR'S AE1 - K ER0 Z
ACCORD AH0 - K AO1 R D
ACCORD'S AH0 - K AO1 R D Z
ACCORDANCE AH0 - K AO1 R - D AH0 N S
ACCORDED AH0 - K AO1 R - D IH0 D
ACCORDING AH0 - K AO1 R - D IH0 NG
ACCORDINGLY AH0 - K AO1 R - D IH0 NG - L IY0
ACCORDION AH0 - K AO1 R - D IY0 - AH0 N
ACCORDIONS AH0 - K AO1 R - D IY0 - AH0 N Z
ACCORDS AH0 - K AO1 R D Z
ACCOST AH0 - K AO1 S T
ACCOSTED AH0 - K AA1 - S T AH0 D
ACCOSTING AH0 - K AA1 - S T IH0 NG
ACCOUNT AH0 - K AW1 N T
ACCOUNT'S AH0 - K AW1 N T S
ACCOUNTABILITY AH0 - K AW1 N - T AH0 - B IH0 - L IH0 - T IY0
ACCOUNTABILITY(2) AH0 - K AW1 - N AH0 - B IH0 - L IH0 - T IY0
ACCOUNTABLE AH0 - K AW1 N - T AH0 - B AH0 L
ACCOUNTABLE(2) AH0 - K AW1 - N AH0 - B AH0 L
ACCOUNTANCY AH0 - K AW1 N - T AH0 N - S IY0
ACCOUNTANT AH0 - K AW1 N - T AH0 N T
ACCOUNTANT'S AH0 - K AW1 N - T AH0 N T S
ACCOUNTANTS AH0 - K AW1 N - T AH0 N T S
ACCOUNTANTS' AH0 - K AW1 N - T AH0 N T S
ACCOUNTED AH0 - K AW1 N - T AH0 D
ACCOUNTED(2) AH0 - K AW1 - N AH0 D
ACCOUNTEMP AH0 - K AW1 N - T EH2 M P
ACCOUNTEMPS AH0 - K AW1 N - T EH2 M P S
ACCOUNTING AH0 - K AW1 N - T IH0 NG
ACCOUNTING(2) AH0 - K AW1 - N IH0 NG
ACCOUNTS AH0 - K AW1 N T S
ACCOUTERMENT AH0 - K UW1 - T ER0 - M AH0 N T
ACCOUTERMENTS AH0 - K UW1 - T ER0 - M AH0 N T S
ACCREDIT AH0 - K R EH2 - D AH0 T
ACCREDITATION AH0 - K R EH2 - D AH0 - T EY1 - SH AH0 N
ACCREDITATIONS AH0 - K R EH2 - D AH0 - D EY1 - SH AH0 N Z
ACCREDITED AH0 - K R EH1 - D IH0 - T IH0 D
ACCREDITING AH0 - K R EH1 - D AH0 - T IH0 NG
ACCRETION AH0 - K R IY1 - SH AH0 N
ACCRUAL AH0 - K R UW1 - AH0 L
ACCRUALS AH0 - K R UW1 - AH0 L Z
ACCRUE AH0 - K R UW1
ACCRUED AH0 - K R UW1 D
ACCRUES AH0 - K R UW1 Z
ACCRUING AH0 - K R UW1 - IH0 NG
ACCUMULATE AH0 - K Y UW1 - M Y AH0 - L EY2 T
ACCUMULATED AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 D
ACCUMULATES AH0 - K Y UW1 - M Y AH0 - L EY2 T S
ACCUMULATING AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 NG
ACCUMULATION AH0 - K Y UW2 - M Y AH0 - L EY1 - SH AH0 N
ACCUMULATIONS AH0 - K Y UW2 - M Y AH0 - L EY1 - SH AH0 N Z
ACCUMULATIVE AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 V
ACCUMULATIVELY AH0 - K Y UW1 - M Y AH0 - L EY2 - T IH0 V - L IY0
ACCUMULATIVELY(2) AH0 - K Y UW1 - M Y AH0 - L AH0 - T IH0 V - L IY0
ACCUMULATOR AH0 - K Y UW1 - M Y AH0 - L EY2 - T ER0
ACCUMULATORS AH0 - K Y UW1 - M Y AH0 - L EY2 - T ER0 Z
ACCURACIES AE1 - K Y ER0 - AH0 - S IY0 Z
ACCURACY AE1 - K Y ER0 - AH0 - S IY0
ACCURATE AE1 - K Y ER0 - AH0 T
ACCURATELY AE1 - K Y ER0 - AH0 T - L IY0
ACCURAY AE1 - K Y ER0 - EY2
ACCURAY'S AE1 - K Y ER0 - EY2 Z
ACCURIDE AE1 - K Y ER0 - AY2 D
ACCURSO AA0 - K UH1 R - S OW0
ACCUSATION AE2 - K Y AH0 - Z EY1 - SH AH0 N
ACCUSATION(2) AE2 - K Y UW0 - Z EY1 - SH AH0 N
ACCUSATIONS AE2 - K Y AH0 - Z EY1 - SH AH0 N Z
ACCUSATIONS(2) AE2 - K Y UW0 - Z EY1 - SH AH0 N Z
ACCUSATIVE AH0 - K Y UW1 - Z AH0 - T IH0 V
ACCUSATORY AH0 - K Y UW1 - Z AH0 - T AO2 - R IY0
ACCUSE AH0 - K Y UW1 Z
ACCUSED AH0 - K Y UW1 Z D
ACCUSER AH0 - K Y UW1 - Z ER0
ACCUSERS AH0 - K Y UW1 - Z ER0 Z
ACCUSES AH0 - K Y UW1 - Z IH0 Z
ACCUSING AH0 - K Y UW1 - Z IH0 NG
ACCUSINGLY AH0 - K Y UW1 - Z IH0 NG - L IY0
ACCUSTOM AH0 - K AH1 - S T AH0 M
ACCUSTOMED AH0 - K AH1 - S T AH0 M D
ACCUTANE AE1 - K Y UW0 - T EY2 N
ACE EY1 S
ACED EY1 S T
ACER EY1 - S ER0
ACERBIC AH0 - S EH1 R - B IH0 K
ACERO AH0 - S EH1 - R OW0
ACERRA AH0 - S EH1 - R AH0
ACES EY1 - S IH0 Z
ACETAMINOPHEN AH0 - S IY2 - T AH0 - M IH1 - N AH0 - F AH0 N
ACETATE AE1 - S AH0 - T EY2 T
ACETIC AH0 - S EH1 - T IH0 K
ACETIC(2) AH0 - S IY1 - T IH0 K
ACETO AA0 - S EH1 - T OW0
ACETONE AE1 - S AH0 - T OW2 N
ACETYLCHOLINE AH0 - S EH2 - T AH0 L - K OW1 - L IY0 N
ACETYLCHOLINE(2) AH0 - S IY2 - T AH0 L - K OW1 - L IY0 N
ACETYLENE AH0 - S EH1 - T AH0 - L IY2 N
ACEVEDO AE0 - S AH0 - V EY1 - D OW0
ACEVES AA0 - S EY1 - V EH0 S
ACEY EY1 - S IY0
ACHATZ AE1 - K AH0 T S
ACHE EY1 K
ACHEBE AA0 - CH EY1 - B IY0
ACHEE AH0 - CH IY1
ACHENBACH AE1 - K IH0 N - B AA0 K
ACHENBAUM AE1 - K AH0 N - B AW2 M
ACHES EY1 K S
ACHESON AE1 - CH AH0 - S AH0 N
ACHEY AE1 - CH IY0
ACHIEVABLE AH0 - CH IY1 - V AH0 - B AH0 L
ACHIEVE AH0 - CH IY1 V
ACHIEVED AH0 - CH IY1 V D
ACHIEVEMENT AH0 - CH IY1 V - M AH0 N T
ACHIEVEMENTS AH0 - CH IY1 V - M AH0 N T S
ACHIEVER AH0 - CH IY1 - V ER0
ACHIEVERS AH0 - CH IY1 - V ER0 Z
ACHIEVES AH0 - CH IY1 V Z
ACHIEVING AH0 - CH IY1 - V IH0 NG
ACHILLE AH0 - K IH1 - L IY0
ACHILLES AH0 - K IH1 - L IY0 Z
ACHILLES' AH0 - K IH1 - L IY0 Z
ACHING EY1 - K IH0 NG
ACHMED AA1 HH - M EH0 D
ACHOA AH0 - CH OW1 - AH0
ACHOA'S AH0 - CH OW1 - AH0 Z
ACHOR EY1 - K ER0
ACHORD AE1 - K AO0 R D
ACHORN AE1 - K ER0 N
ACHTENBERG AE1 K - T EH0 N - B ER0 G
ACHTERBERG AE1 K - T ER0 - B ER0 G
ACHY EY1 - K IY0
ACID AE1 - S AH0 D
ACIDIC AH0 - S IH1 - D IH0 K
ACIDIFICATION AH0 - S IH2 - D AH0 - F AH0 - K EY1 - SH AH0 N
ACIDIFIED AH0 - S IH1 - D AH0 - F AY2 D
ACIDIFIES AH0 - S IH1 - D AH0 - F AY2 Z
ACIDIFY AH0 - S IH1 - D AH0 - F AY2
ACIDITY AH0 - S IH1 - D AH0 - T IY0
ACIDLY AE1 - S AH0 D - L IY0
ACIDOSIS AE2 - S AH0 - D OW1 - S AH0 S
ACIDS AE1 - S AH0 D Z
ACIDURIA AE2 - S AH0 - D UH1 - R IY0 - AH0
ACIERNO AA0 - S IH1 R - N OW0
ACK AE1 K
ACKER AE1 - K ER0
ACKER'S AE1 - K ER0 Z
ACKERLEY AE1 - K ER0 - L IY0
ACKERLY AE1 - K ER0 - L IY0
ACKERMAN AE1 - K ER0 - M AH0 N
ACKERMANN AE1 - K ER0 - M AH0 N
ACKERSON AE1 - K ER0 - S AH0 N
ACKERT AE1 - K ER0 T
ACKHOUSE AE1 K - HH AW2 S
ACKLAND AE1 K - L AH0 N D
ACKLES AE1 - K AH0 L Z
ACKLEY AE1 K - L IY0
ACKLIN AE1 - K L IH0 N
ACKMAN AE1 K - M AH0 N
ACKNOWLEDGE AE0 K - N AA1 - L IH0 JH
ACKNOWLEDGE(2) IH0 K - N AA1 - L IH0 JH
ACKNOWLEDGEABLE AE0 K - N AA1 - L IH0 - JH AH0 - B AH0 L
ACKNOWLEDGEABLE(2) IH0 K - N AA1 - L IH0 - JH AH0 - B AH0 L
ACKNOWLEDGED AE0 K - N AA1 - L IH0 JH D
ACKNOWLEDGED(2) IH0 K - N AA1 - L IH0 JH D
ACKNOWLEDGEMENT AE0 K - N AA1 - L IH0 JH - M AH0 N T
ACKNOWLEDGEMENT(2) IH0 K - N AA1 - L IH0 JH - M AH0 N T
ACKNOWLEDGES AE0 K - N AA1 - L IH0 - JH IH0 Z
ACKNOWLEDGES(2) IH0 K - N AA1 - L IH0 - JH IH0 Z
ACKNOWLEDGING AE0 K - N AA1 - L IH0 - JH IH0 NG
ACKNOWLEDGING(2) IH0 K - N AA1 - L IH0 - JH IH0 NG
ACKNOWLEDGMENT AE0 K - N AA1 - L IH0 JH - M AH0 N T
ACKNOWLEDGMENT(2) IH0 K - N AA1 - L IH0 JH - M AH0 N T
ACKROYD AE1 - K R OY2 D
ACKROYD'S AE1 - K R OY2 D Z
ACMAT AE1 K - M AE0 T
ACMAT'S AE1 K - M AE0 T S
ACME AE1 K - M IY0
ACME'S AE1 K - M IY0 Z
ACNE AE1 K - N IY0
ACOCELLA AA0 - K OW0 - CH EH1 - L AH0
ACOFF AE1 - K AO0 F
ACOG AH0 - K AO1 G
ACOLYTE AE1 - K AH0 - L AY2 T
ACOLYTES AE1 - K AH0 - L AY2 T S
ACORD AH0 - K AO1 R D
ACORN EY1 - K AO0 R N
ACORNS EY1 - K AO0 R N Z
ACOSTA AH0 - K AO1 - S T AH0
ACOUSTIC AH0 - K UW1 - S T IH0 K
ACOUSTICAL AH0 - K UW1 - S T IH0 - K AH0 L
ACOUSTICALLY AH0 - K UW1 - S T IH0 K - L IY0
ACOUSTICS AH0 - K UW1 - S T IH0 K S
ACQUAINT AH0 - K W EY1 N T
ACQUAINTANCE AH0 - K W EY1 N - T AH0 N S
ACQUAINTANCES AH0 - K W EY1 N - T AH0 N - S IH0 Z
ACQUAINTANCESHIP AH0 - K W EY1 N - T AH0 N S - SH IH0 P
ACQUAINTED AH0 - K W EY1 N - T IH0 D
ACQUAINTED(2) AH0 - K W EY1 - N IH0 D
ACQUAVIVA AA0 - K W AA0 - V IY1 - V AH0
ACQUIESCE AE2 - K W IY0 - EH1 S
ACQUIESCED AE2 - K W IY0 - EH1 S T
ACQUIESCENCE AE2 - K W IY0 - EH1 - S AH0 N S
ACQUIESCING AE2 - K W IY0 - EH1 - S IH0 NG
ACQUIRE AH0 - K W AY1 - ER0
ACQUIRED AH0 - K W AY1 - ER0 D
ACQUIRER AH0 - K W AY1 - ER0 - ER0
ACQUIRERS AH0 - K W AY1 - ER0 - ER0 Z
ACQUIRES AH0 - K W AY1 - ER0 Z
ACQUIRING AH0 - K W AY1 - R IH0 NG
ACQUIRING(2) AH0 - K W AY1 - ER0 - IH0 NG
ACQUISITION AE2 - K W AH0 - Z IH1 - SH AH0 N
ACQUISITION'S AE2 - K W AH0 - Z IH1 - SH AH0 N Z
ACQUISITIONS AE2 - K W AH0 - Z IH1 - SH AH0 N Z
ACQUISITIVE AH0 - K W IH1 - Z AH0 - T IH0 V
ACQUIT AH0 - K W IH1 T
ACQUITAINE AE1 - K W IH0 - T EY2 N
ACQUITS AH0 - K W IH1 T S
ACQUITTAL AH0 - K W IH1 - T AH0 L
ACQUITTALS AH0 - K W IH1 - T AH0 L Z
ACQUITTED AH0 - K W IH1 - T AH0 D
ACQUITTED(2) AH0 - K W IH1 - T IH0 D
ACQUITTING AH0 - K W IH1 - T IH0 NG
ACRE EY1 - K ER0
ACREAGE EY1 - K ER0 - IH0 JH
ACREAGE(2) EY1 - K R AH0 JH
ACREE AH0 - K R IY1
ACRES EY1 - K ER0 Z
ACREY AE1 - K R IY0
ACRI AA1 - K R IY0
ACRID AE1 - K R IH0 D
ACRIMONIOUS AE2 - K R AH0 - M OW1 - N IY0 - AH0 S
ACRIMONY AE1 - K R IH0 - M OW2 - N IY0
ACROBAT AE1 - K R AH0 - B AE2 T
ACROBATIC AE2 - K R AH0 - B AE1 - T IH0 K
ACROBATICS AE2 - K R AH0 - B AE1 - T IH0 K S
ACROBATS AE1 - K R AH0 - B AE2 T S
ACRONYM AE1 - K R AH0 - N IH0 M
ACRONYMS AE1 - K R AH0 - N IH0 M Z
ACROPOLIS AH0 - K R AA1 - P AH0 - L AH0 S
ACROSS AH0 - K R AO1 S
ACRYLIC AH0 - K R IH1 - L IH0 K
ACRYLICS AH0 - K R IH1 - L IH0 K S
ACT AE1 K T
ACT'S AE1 K T S
ACTAVA AE2 K - T AA1 - V AH0
ACTED AE1 K - T AH0 D
ACTED(2) AE1 K - T IH0 D
ACTIGALL AE1 K - T IH0 - G AO0 L
ACTIN AE1 K - T AH0 N
ACTING AE1 K - T IH0 NG
ACTINIDE AE1 K - T IH0 - N AY2 D
ACTINIDIA AE2 K - T IH0 - N IH1 - D IY0 - AH0
ACTION AE1 K - SH AH0 N
ACTION'S AE1 K - SH AH0 N Z
ACTIONABLE AE1 K - SH AH0 N - AH0 - B AH0 L
ACTIONS AE1 K - SH AH0 N Z
ACTIVASE AE1 K - T IH0 - V EY2 Z
ACTIVATE AE1 K - T AH0 - V EY2 T
ACTIVATED AE1 K - T AH0 - V EY2 - T AH0 D
ACTIVATED(2) AE1 K - T IH0 - V EY2 - T IH0 D
ACTIVATES AE1 K - T AH0 - V EY2 T S
ACTIVATING AE1 K - T AH0 - V EY2 - T IH0 NG
ACTIVATION AE2 K - T AH0 - V EY1 - SH AH0 N
ACTIVATOR AE1 K - T AH0 - V EY2 - T ER0
ACTIVE AE1 K - T IH0 V
ACTIVELY AE1 K - T IH0 V - L IY0
ACTIVES AE1 K - T IH0 V Z
ACTIVISION AE1 K - T IH0 - V IH2 - ZH AH0 N
ACTIVISM AE1 K - T IH0 - V IH2 - Z AH0 M
ACTIVIST AE1 K - T AH0 - V AH0 S T
ACTIVIST(
gitextract_toln448z/
├── .github/
│ └── workflows/
│ └── pypi.yml
├── .gitignore
├── Dockerfile
├── LICENSE
├── README.md
├── docs/
│ ├── install.md
│ ├── quick_use.md
│ └── training.md
├── melo/
│ ├── __init__.py
│ ├── api.py
│ ├── app.py
│ ├── attentions.py
│ ├── commons.py
│ ├── configs/
│ │ └── config.json
│ ├── data/
│ │ └── example/
│ │ └── metadata.list
│ ├── data_utils.py
│ ├── download_utils.py
│ ├── infer.py
│ ├── init_downloads.py
│ ├── losses.py
│ ├── main.py
│ ├── mel_processing.py
│ ├── models.py
│ ├── modules.py
│ ├── monotonic_align/
│ │ ├── __init__.py
│ │ └── core.py
│ ├── preprocess_text.py
│ ├── split_utils.py
│ ├── text/
│ │ ├── __init__.py
│ │ ├── chinese.py
│ │ ├── chinese_bert.py
│ │ ├── chinese_mix.py
│ │ ├── cleaner.py
│ │ ├── cleaner_multiling.py
│ │ ├── cmudict.rep
│ │ ├── cmudict_cache.pickle
│ │ ├── english.py
│ │ ├── english_bert.py
│ │ ├── english_utils/
│ │ │ ├── __init__.py
│ │ │ ├── abbreviations.py
│ │ │ ├── number_norm.py
│ │ │ └── time_norm.py
│ │ ├── es_phonemizer/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── cleaner.py
│ │ │ ├── es_symbols.json
│ │ │ ├── es_symbols.txt
│ │ │ ├── es_symbols_v2.json
│ │ │ ├── es_to_ipa.py
│ │ │ ├── example_ipa.txt
│ │ │ ├── gruut_wrapper.py
│ │ │ ├── punctuation.py
│ │ │ ├── spanish_symbols.txt
│ │ │ └── test.ipynb
│ │ ├── fr_phonemizer/
│ │ │ ├── __init__.py
│ │ │ ├── base.py
│ │ │ ├── cleaner.py
│ │ │ ├── en_symbols.json
│ │ │ ├── example_ipa.txt
│ │ │ ├── fr_symbols.json
│ │ │ ├── fr_to_ipa.py
│ │ │ ├── french_abbreviations.py
│ │ │ ├── french_symbols.txt
│ │ │ ├── gruut_wrapper.py
│ │ │ └── punctuation.py
│ │ ├── french.py
│ │ ├── french_bert.py
│ │ ├── japanese.py
│ │ ├── japanese_bert.py
│ │ ├── ko_dictionary.py
│ │ ├── korean.py
│ │ ├── opencpop-strict.txt
│ │ ├── spanish.py
│ │ ├── spanish_bert.py
│ │ ├── symbols.py
│ │ └── tone_sandhi.py
│ ├── train.py
│ ├── train.sh
│ ├── transforms.py
│ └── utils.py
├── requirements.txt
├── setup.py
└── test/
├── basetts_test_resources/
│ ├── en_egs_text.txt
│ ├── es_egs_text.txt
│ ├── fr_egs_text.txt
│ ├── jp_egs_text.txt
│ ├── kr_egs_text.txt
│ └── zh_mix_en_egs_text.txt
├── test_base_model_tts_package.py
└── test_base_model_tts_package_from_S3.py
SYMBOL INDEX (402 symbols across 49 files)
FILE: melo/api.py
class TTS (line 20) | class TTS(nn.Module):
method __init__ (line 21) | def __init__(self,
method audio_numpy_concat (line 66) | def audio_numpy_concat(segment_data_list, sr, speed=1.):
method split_sentences_into_pieces (line 75) | def split_sentences_into_pieces(text, language, quiet=False):
method tts_to_file (line 83) | def tts_to_file(self, text, speaker_id, output_path=None, sdp_ratio=0....
FILE: melo/app.py
function synthesize (line 31) | def synthesize(speaker, text, speed, language, progress=gr.Progress()):
function load_speakers (line 35) | def load_speakers(language, text):
function main (line 57) | def main(share, host, port):
FILE: melo/attentions.py
class LayerNorm (line 12) | class LayerNorm(nn.Module):
method __init__ (line 13) | def __init__(self, channels, eps=1e-5):
method forward (line 21) | def forward(self, x):
function fused_add_tanh_sigmoid_multiply (line 28) | def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
class Encoder (line 37) | class Encoder(nn.Module):
method __init__ (line 38) | def __init__(
method forward (line 98) | def forward(self, x, x_mask, g=None):
class Decoder (line 118) | class Decoder(nn.Module):
method __init__ (line 119) | def __init__(
method forward (line 178) | def forward(self, x, x_mask, h, h_mask):
class MultiHeadAttention (line 204) | class MultiHeadAttention(nn.Module):
method __init__ (line 205) | def __init__(
method forward (line 258) | def forward(self, x, c, attn_mask=None):
method attention (line 268) | def attention(self, query, key, value, mask=None):
method _matmul_with_relative_values (line 319) | def _matmul_with_relative_values(self, x, y):
method _matmul_with_relative_keys (line 328) | def _matmul_with_relative_keys(self, x, y):
method _get_relative_embeddings (line 337) | def _get_relative_embeddings(self, relative_embeddings, length):
method _relative_position_to_absolute_position (line 355) | def _relative_position_to_absolute_position(self, x):
method _absolute_position_to_relative_position (line 376) | def _absolute_position_to_relative_position(self, x):
method _attention_bias_proximal (line 392) | def _attention_bias_proximal(self, length):
class FFN (line 404) | class FFN(nn.Module):
method __init__ (line 405) | def __init__(
method forward (line 433) | def forward(self, x, x_mask):
method _causal_padding (line 443) | def _causal_padding(self, x):
method _same_padding (line 452) | def _same_padding(self, x):
FILE: melo/commons.py
function init_weights (line 6) | def init_weights(m, mean=0.0, std=0.01):
function get_padding (line 12) | def get_padding(kernel_size, dilation=1):
function convert_pad_shape (line 16) | def convert_pad_shape(pad_shape):
function intersperse (line 22) | def intersperse(lst, item):
function kl_divergence (line 28) | def kl_divergence(m_p, logs_p, m_q, logs_q):
function rand_gumbel (line 37) | def rand_gumbel(shape):
function rand_gumbel_like (line 43) | def rand_gumbel_like(x):
function slice_segments (line 48) | def slice_segments(x, ids_str, segment_size=4):
function rand_slice_segments (line 57) | def rand_slice_segments(x, x_lengths=None, segment_size=4):
function get_timing_signal_1d (line 67) | def get_timing_signal_1d(length, channels, min_timescale=1.0, max_timesc...
function add_timing_signal_1d (line 83) | def add_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4):
function cat_timing_signal_1d (line 89) | def cat_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4, axis...
function subsequent_mask (line 95) | def subsequent_mask(length):
function fused_add_tanh_sigmoid_multiply (line 101) | def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
function convert_pad_shape (line 110) | def convert_pad_shape(pad_shape):
function shift_1d (line 116) | def shift_1d(x):
function sequence_mask (line 121) | def sequence_mask(length, max_length=None):
function generate_path (line 128) | def generate_path(duration, mask):
function clip_grad_value_ (line 145) | def clip_grad_value_(parameters, clip_value, norm_type=2):
FILE: melo/data_utils.py
class TextAudioSpeakerLoader (line 17) | class TextAudioSpeakerLoader(torch.utils.data.Dataset):
method __init__ (line 24) | def __init__(self, audiopaths_sid_text, hparams):
method _filter (line 53) | def _filter(self):
method get_audio_text_speaker_pair (line 94) | def get_audio_text_speaker_pair(self, audiopath_sid_text):
method get_audio (line 107) | def get_audio(self, filename):
method get_text (line 150) | def get_text(self, text, word2ph, phone, tone, language_str, wav_path):
method get_sid (line 189) | def get_sid(self, sid):
method __getitem__ (line 193) | def __getitem__(self, index):
method __len__ (line 196) | def __len__(self):
class TextAudioSpeakerCollate (line 200) | class TextAudioSpeakerCollate:
method __init__ (line 203) | def __init__(self, return_ids=False):
method __call__ (line 206) | def __call__(self, batch):
class DistributedBucketSampler (line 285) | class DistributedBucketSampler(torch.utils.data.distributed.DistributedS...
method __init__ (line 295) | def __init__(
method _create_buckets (line 314) | def _create_buckets(self):
method __iter__ (line 346) | def __iter__(self):
method _bisect (line 397) | def _bisect(self, x, lo=0, hi=None):
method __len__ (line 412) | def __len__(self):
FILE: melo/download_utils.py
function load_or_download_config (line 44) | def load_or_download_config(locale, use_hf=True, config_path=None):
function load_or_download_model (line 55) | def load_or_download_model(locale, device, use_hf=True, ckpt_path=None):
function load_pretrain_model (line 66) | def load_pretrain_model():
FILE: melo/infer.py
function main (line 12) | def main(ckpt_path, text, language, output_dir):
FILE: melo/losses.py
function feature_loss (line 4) | def feature_loss(fmap_r, fmap_g):
function discriminator_loss (line 15) | def discriminator_loss(disc_real_outputs, disc_generated_outputs):
function generator_loss (line 31) | def generator_loss(disc_outputs):
function kl_loss (line 43) | def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):
FILE: melo/main.py
function main (line 14) | def main(text, file, output_path, language, speaker, speed, device):
FILE: melo/mel_processing.py
function dynamic_range_compression_torch (line 9) | def dynamic_range_compression_torch(x, C=1, clip_val=1e-5):
function dynamic_range_decompression_torch (line 18) | def dynamic_range_decompression_torch(x, C=1):
function spectral_normalize_torch (line 27) | def spectral_normalize_torch(magnitudes):
function spectral_de_normalize_torch (line 32) | def spectral_de_normalize_torch(magnitudes):
function spectrogram_torch (line 41) | def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, cente...
function spectrogram_torch_conv (line 79) | def spectrogram_torch_conv(y, n_fft, sampling_rate, hop_size, win_size, ...
function spec_to_mel_torch (line 118) | def spec_to_mel_torch(spec, n_fft, num_mels, sampling_rate, fmin, fmax):
function mel_spectrogram_torch (line 132) | def mel_spectrogram_torch(
FILE: melo/models.py
class DurationDiscriminator (line 17) | class DurationDiscriminator(nn.Module): # vits2
method __init__ (line 18) | def __init__(
method forward_probability (line 53) | def forward_probability(self, x, x_mask, dur, g=None):
method forward (line 69) | def forward(self, x, x_mask, dur_r, dur_hat, g=None):
class TransformerCouplingBlock (line 91) | class TransformerCouplingBlock(nn.Module):
method __init__ (line 92) | def __init__(
method forward (line 147) | def forward(self, x, x_mask, g=None, reverse=False):
class StochasticDurationPredictor (line 157) | class StochasticDurationPredictor(nn.Module):
method __init__ (line 158) | def __init__(
method forward (line 206) | def forward(self, x, x_mask, w=None, g=None, reverse=False, noise_scal...
class DurationPredictor (line 268) | class DurationPredictor(nn.Module):
method __init__ (line 269) | def __init__(
method forward (line 294) | def forward(self, x, x_mask, g=None):
class TextEncoder (line 311) | class TextEncoder(nn.Module):
method __init__ (line 312) | def __init__(
method forward (line 360) | def forward(self, x, x_lengths, tone, language, bert, ja_bert, g=None):
class ResidualCouplingBlock (line 384) | class ResidualCouplingBlock(nn.Module):
method __init__ (line 385) | def __init__(
method forward (line 419) | def forward(self, x, x_mask, g=None, reverse=False):
class PosteriorEncoder (line 429) | class PosteriorEncoder(nn.Module):
method __init__ (line 430) | def __init__(
method forward (line 459) | def forward(self, x, x_lengths, g=None, tau=1.0):
class Generator (line 471) | class Generator(torch.nn.Module):
method __init__ (line 472) | def __init__(
method forward (line 519) | def forward(self, x, g=None):
method remove_weight_norm (line 540) | def remove_weight_norm(self):
class DiscriminatorP (line 548) | class DiscriminatorP(torch.nn.Module):
method __init__ (line 549) | def __init__(self, period, kernel_size=5, stride=3, use_spectral_norm=...
method forward (line 605) | def forward(self, x):
class DiscriminatorS (line 627) | class DiscriminatorS(torch.nn.Module):
method __init__ (line 628) | def __init__(self, use_spectral_norm=False):
method forward (line 643) | def forward(self, x):
class MultiPeriodDiscriminator (line 657) | class MultiPeriodDiscriminator(torch.nn.Module):
method __init__ (line 658) | def __init__(self, use_spectral_norm=False):
method forward (line 668) | def forward(self, y, y_hat):
class ReferenceEncoder (line 684) | class ReferenceEncoder(nn.Module):
method __init__ (line 690) | def __init__(self, spec_channels, gin_channels=0, layernorm=False):
method forward (line 724) | def forward(self, inputs, mask=None):
method calculate_channels (line 746) | def calculate_channels(self, L, kernel_size, stride, pad, n_convs):
class SynthesizerTrn (line 752) | class SynthesizerTrn(nn.Module):
method __init__ (line 757) | def __init__(
method forward (line 888) | def forward(self, x, x_lengths, y, y_lengths, sid, tone, language, ber...
method infer (line 966) | def infer(
method voice_conversion (line 1023) | def voice_conversion(self, y, y_lengths, sid_src, sid_tgt, tau=1.0):
FILE: melo/modules.py
class LayerNorm (line 17) | class LayerNorm(nn.Module):
method __init__ (line 18) | def __init__(self, channels, eps=1e-5):
method forward (line 26) | def forward(self, x):
class ConvReluNorm (line 32) | class ConvReluNorm(nn.Module):
method __init__ (line 33) | def __init__(
method forward (line 74) | def forward(self, x, x_mask):
class DDSConv (line 84) | class DDSConv(nn.Module):
method __init__ (line 89) | def __init__(self, channels, kernel_size, n_layers, p_dropout=0.0):
method forward (line 118) | def forward(self, x, x_mask, g=None):
class WN (line 133) | class WN(torch.nn.Module):
method __init__ (line 134) | def __init__(
method forward (line 185) | def forward(self, x, x_mask, g=None, **kwargs):
method remove_weight_norm (line 212) | def remove_weight_norm(self):
class ResBlock1 (line 221) | class ResBlock1(torch.nn.Module):
method __init__ (line 222) | def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
method forward (line 296) | def forward(self, x, x_mask=None):
method remove_weight_norm (line 311) | def remove_weight_norm(self):
class ResBlock2 (line 318) | class ResBlock2(torch.nn.Module):
method __init__ (line 319) | def __init__(self, channels, kernel_size=3, dilation=(1, 3)):
method forward (line 347) | def forward(self, x, x_mask=None):
method remove_weight_norm (line 358) | def remove_weight_norm(self):
class Log (line 363) | class Log(nn.Module):
method forward (line 364) | def forward(self, x, x_mask, reverse=False, **kwargs):
class Flip (line 374) | class Flip(nn.Module):
method forward (line 375) | def forward(self, x, *args, reverse=False, **kwargs):
class ElementwiseAffine (line 384) | class ElementwiseAffine(nn.Module):
method __init__ (line 385) | def __init__(self, channels):
method forward (line 391) | def forward(self, x, x_mask, reverse=False, **kwargs):
class ResidualCouplingLayer (line 402) | class ResidualCouplingLayer(nn.Module):
method __init__ (line 403) | def __init__(
method forward (line 437) | def forward(self, x, x_mask, g=None, reverse=False):
class ConvFlow (line 459) | class ConvFlow(nn.Module):
method __init__ (line 460) | def __init__(
method forward (line 486) | def forward(self, x, x_mask, g=None, reverse=False):
class TransformerCouplingLayer (line 519) | class TransformerCouplingLayer(nn.Module):
method __init__ (line 520) | def __init__(
method forward (line 562) | def forward(self, x, x_mask, g=None, reverse=False):
FILE: melo/monotonic_align/__init__.py
function maximum_path (line 7) | def maximum_path(neg_cent, mask):
FILE: melo/monotonic_align/core.py
function maximum_path_jit (line 14) | def maximum_path_jit(paths, values, t_ys, t_xs):
FILE: melo/preprocess_text.py
function main (line 30) | def main(
FILE: melo/split_utils.py
function split_sentence (line 9) | def split_sentence(text, min_len=10, language_str='EN'):
function split_sentences_latin (line 17) | def split_sentences_latin(text, min_len=10):
function split_sentences_zh (line 26) | def split_sentences_zh(text, min_len=10):
function merge_short_sentences_en (line 51) | def merge_short_sentences_en(sens):
function merge_short_sentences_zh (line 77) | def merge_short_sentences_zh(sens):
function txtsplit (line 105) | def txtsplit(text, desired_length=100, max_length=200):
FILE: melo/text/__init__.py
function cleaned_text_to_sequence (line 7) | def cleaned_text_to_sequence(cleaned_text, tones, language, symbol_to_id...
function get_bert (line 23) | def get_bert(norm_text, word2ph, language, device):
FILE: melo/text/chinese.py
function replace_punctuation (line 55) | def replace_punctuation(text):
function g2p (line 68) | def g2p(text):
function _get_initials_finals (line 80) | def _get_initials_finals(word):
function _g2p (line 93) | def _g2p(segments):
function text_normalize (line 171) | def text_normalize(text):
function get_bert_feature (line 179) | def get_bert_feature(text, word2ph, device=None):
FILE: melo/text/chinese_bert.py
function get_bert_feature (line 13) | def get_bert_feature(text, word2ph, device=None, model_id='hfl/chinese-r...
FILE: melo/text/chinese_mix.py
function replace_punctuation (line 59) | def replace_punctuation(text):
function g2p (line 69) | def g2p(text, impl='v2'):
function _get_initials_finals (line 87) | def _get_initials_finals(word):
function _g2p (line 101) | def _g2p(segments):
function text_normalize (line 189) | def text_normalize(text):
function get_bert_feature (line 197) | def get_bert_feature(text, word2ph, device):
function _g2p_v2 (line 202) | def _g2p_v2(segments):
FILE: melo/text/cleaner.py
function clean_text (line 9) | def clean_text(text, language):
function clean_text_bert (line 16) | def clean_text_bert(text, language, device=None):
function text_to_sequence (line 30) | def text_to_sequence(text, language):
FILE: melo/text/cleaner_multiling.py
function replace_punctuation (line 43) | def replace_punctuation(text):
function lowercase (line 48) | def lowercase(text):
function collapse_whitespace (line 52) | def collapse_whitespace(text):
function remove_punctuation_at_begin (line 55) | def remove_punctuation_at_begin(text):
function remove_aux_symbols (line 58) | def remove_aux_symbols(text):
function replace_symbols (line 63) | def replace_symbols(text, lang="en"):
function unicleaners (line 98) | def unicleaners(text, cased=False, lang='en'):
FILE: melo/text/english.py
function post_replace_ph (line 95) | def post_replace_ph(ph):
function read_dict (line 118) | def read_dict():
function cache_dict (line 142) | def cache_dict(g2p_dict, file_path):
function get_dict (line 147) | def get_dict():
function refine_ph (line 161) | def refine_ph(phn):
function refine_syllables (line 169) | def refine_syllables(syllables):
function text_normalize (line 181) | def text_normalize(text):
function g2p_old (line 190) | def g2p_old(text):
function g2p (line 217) | def g2p(text, pad_start_end=True, tokenized=None):
function get_bert_feature (line 262) | def get_bert_feature(text, word2ph, device=None):
FILE: melo/text/english_bert.py
function get_bert_feature (line 9) | def get_bert_feature(text, word2ph, device=None):
FILE: melo/text/english_utils/abbreviations.py
function expand_abbreviations (line 28) | def expand_abbreviations(text, lang="en"):
FILE: melo/text/english_utils/number_norm.py
function _remove_commas (line 16) | def _remove_commas(m):
function _expand_decimal_point (line 20) | def _expand_decimal_point(m):
function __expand_currency (line 24) | def __expand_currency(value: str, inflection: Dict[float, str]) -> str:
function _expand_currency (line 42) | def _expand_currency(m: "re.Match") -> str:
function _expand_ordinal (line 74) | def _expand_ordinal(m):
function _expand_number (line 78) | def _expand_number(m):
function normalize_numbers (line 91) | def normalize_numbers(text):
FILE: melo/text/english_utils/time_norm.py
function _expand_num (line 18) | def _expand_num(n: int) -> str:
function _expand_time_english (line 22) | def _expand_time_english(match: "re.Match") -> str:
function expand_time_english (line 46) | def expand_time_english(text: str) -> str:
FILE: melo/text/es_phonemizer/base.py
class BasePhonemizer (line 7) | class BasePhonemizer(abc.ABC):
method __init__ (line 34) | def __init__(self, language, punctuations=Punctuation.default_puncs(),...
method _init_language (line 46) | def _init_language(self, language):
method language (line 57) | def language(self):
method name (line 63) | def name():
method is_available (line 69) | def is_available(cls):
method version (line 75) | def version(cls):
method supported_languages (line 81) | def supported_languages():
method is_supported_language (line 85) | def is_supported_language(self, language):
method _phonemize (line 90) | def _phonemize(self, text, separator):
method _phonemize_preprocess (line 93) | def _phonemize_preprocess(self, text) -> Tuple[List[str], List]:
method _phonemize_postprocess (line 107) | def _phonemize_postprocess(self, phonemized, punctuations) -> str:
method phonemize (line 116) | def phonemize(self, text: str, separator="|", language: str = None) ->...
method print_logs (line 137) | def print_logs(self, level: int = 0):
FILE: melo/text/es_phonemizer/cleaner.py
function replace_punctuation (line 43) | def replace_punctuation(text):
function lowercase (line 48) | def lowercase(text):
function collapse_whitespace (line 52) | def collapse_whitespace(text):
function remove_punctuation_at_begin (line 55) | def remove_punctuation_at_begin(text):
function remove_aux_symbols (line 58) | def remove_aux_symbols(text):
function replace_symbols (line 63) | def replace_symbols(text, lang="en"):
function spanish_cleaners (line 98) | def spanish_cleaners(text):
FILE: melo/text/es_phonemizer/es_to_ipa.py
function es2ipa (line 4) | def es2ipa(text):
FILE: melo/text/es_phonemizer/gruut_wrapper.py
class Gruut (line 14) | class Gruut(BasePhonemizer):
method __init__ (line 41) | def __init__(
method name (line 54) | def name():
method phonemize_gruut (line 57) | def phonemize_gruut(self, text: str, separator: str = "|", tie=False) ...
method _phonemize (line 109) | def _phonemize(self, text, separator):
method is_supported_language (line 112) | def is_supported_language(self, language):
method supported_languages (line 117) | def supported_languages() -> List:
method version (line 125) | def version(self):
method is_available (line 134) | def is_available(cls):
FILE: melo/text/es_phonemizer/punctuation.py
class PuncPosition (line 12) | class PuncPosition(Enum):
class Punctuation (line 21) | class Punctuation:
method __init__ (line 43) | def __init__(self, puncs: str = _DEF_PUNCS):
method default_puncs (line 47) | def default_puncs():
method puncs (line 52) | def puncs(self):
method puncs (line 56) | def puncs(self, value):
method strip (line 62) | def strip(self, text):
method strip_to_restore (line 74) | def strip_to_restore(self, text):
method _strip_to_restore (line 88) | def _strip_to_restore(self, text):
method restore (line 120) | def restore(cls, text, puncs):
method _restore (line 135) | def _restore(cls, text, puncs, num): # pylint: disable=too-many-retur...
FILE: melo/text/fr_phonemizer/base.py
class BasePhonemizer (line 7) | class BasePhonemizer(abc.ABC):
method __init__ (line 34) | def __init__(self, language, punctuations=Punctuation.default_puncs(),...
method _init_language (line 46) | def _init_language(self, language):
method language (line 57) | def language(self):
method name (line 63) | def name():
method is_available (line 69) | def is_available(cls):
method version (line 75) | def version(cls):
method supported_languages (line 81) | def supported_languages():
method is_supported_language (line 85) | def is_supported_language(self, language):
method _phonemize (line 90) | def _phonemize(self, text, separator):
method _phonemize_preprocess (line 93) | def _phonemize_preprocess(self, text) -> Tuple[List[str], List]:
method _phonemize_postprocess (line 107) | def _phonemize_postprocess(self, phonemized, punctuations) -> str:
method phonemize (line 116) | def phonemize(self, text: str, separator="|", language: str = None) ->...
method print_logs (line 137) | def print_logs(self, level: int = 0):
FILE: melo/text/fr_phonemizer/cleaner.py
function replace_punctuation (line 48) | def replace_punctuation(text):
function expand_abbreviations (line 53) | def expand_abbreviations(text, lang="fr"):
function lowercase (line 61) | def lowercase(text):
function collapse_whitespace (line 65) | def collapse_whitespace(text):
function remove_punctuation_at_begin (line 68) | def remove_punctuation_at_begin(text):
function remove_aux_symbols (line 71) | def remove_aux_symbols(text):
function replace_symbols (line 76) | def replace_symbols(text, lang="en"):
function french_cleaners (line 111) | def french_cleaners(text):
FILE: melo/text/fr_phonemizer/fr_to_ipa.py
function remove_consecutive_t (line 5) | def remove_consecutive_t(input_str):
function fr2ipa (line 23) | def fr2ipa(text):
FILE: melo/text/fr_phonemizer/gruut_wrapper.py
class Gruut (line 14) | class Gruut(BasePhonemizer):
method __init__ (line 41) | def __init__(
method name (line 54) | def name():
method phonemize_gruut (line 57) | def phonemize_gruut(self, text: str, separator: str = "|", tie=False) ...
method _phonemize (line 109) | def _phonemize(self, text, separator):
method is_supported_language (line 112) | def is_supported_language(self, language):
method supported_languages (line 117) | def supported_languages() -> List:
method version (line 125) | def version(self):
method is_available (line 134) | def is_available(cls):
FILE: melo/text/fr_phonemizer/punctuation.py
class PuncPosition (line 12) | class PuncPosition(Enum):
class Punctuation (line 21) | class Punctuation:
method __init__ (line 43) | def __init__(self, puncs: str = _DEF_PUNCS):
method default_puncs (line 47) | def default_puncs():
method puncs (line 52) | def puncs(self):
method puncs (line 56) | def puncs(self, value):
method strip (line 62) | def strip(self, text):
method strip_to_restore (line 74) | def strip_to_restore(self, text):
method _strip_to_restore (line 88) | def _strip_to_restore(self, text):
method restore (line 118) | def restore(cls, text, puncs):
method _restore (line 133) | def _restore(cls, text, puncs, num): # pylint: disable=too-many-retur...
FILE: melo/text/french.py
function distribute_phone (line 11) | def distribute_phone(n_phone, n_word):
function text_normalize (line 19) | def text_normalize(text):
function g2p (line 26) | def g2p(text, pad_start_end=True, tokenized=None):
function get_bert_feature (line 66) | def get_bert_feature(text, word2ph, device=None):
function text_normalize (line 83) | def text_normalize(text):
FILE: melo/text/french_bert.py
function get_bert_feature (line 9) | def get_bert_feature(text, word2ph, device=None):
FILE: melo/text/japanese.py
function _makerulemap (line 325) | def _makerulemap():
function kata2phoneme (line 333) | def kata2phoneme(text: str) -> str:
function hira2kata (line 360) | def hira2kata(text: str) -> str:
function text2kata (line 370) | def text2kata(text: str) -> str:
function japanese_convert_numbers_to_words (line 467) | def japanese_convert_numbers_to_words(text: str) -> str:
function japanese_convert_alpha_symbols_to_words (line 474) | def japanese_convert_alpha_symbols_to_words(text: str) -> str:
function japanese_text_to_phonemes (line 478) | def japanese_text_to_phonemes(text: str) -> str:
function is_japanese_character (line 488) | def is_japanese_character(char):
function replace_punctuation (line 524) | def replace_punctuation(text):
function text_normalize (line 548) | def text_normalize(text):
function distribute_phone (line 557) | def distribute_phone(n_phone, n_word):
function g2p (line 571) | def g2p(norm_text):
function get_bert_feature (line 614) | def get_bert_feature(text, word2ph, device):
FILE: melo/text/japanese_bert.py
function get_bert_feature (line 8) | def get_bert_feature(text, word2ph, device=None, model_id='tohoku-nlp/be...
FILE: melo/text/korean.py
function normalize (line 16) | def normalize(text):
function normalize_with_dictionary (line 25) | def normalize_with_dictionary(text, dic):
function normalize_english (line 32) | def normalize_english(text):
function korean_text_to_phonemes (line 44) | def korean_text_to_phonemes(text, character: str = "hangeul") -> str:
function text_normalize (line 73) | def text_normalize(text):
function distribute_phone (line 82) | def distribute_phone(n_phone, n_word):
function g2p (line 97) | def g2p(norm_text):
function get_bert_feature (line 141) | def get_bert_feature(text, word2ph, device='cuda'):
FILE: melo/text/spanish.py
function distribute_phone (line 11) | def distribute_phone(n_phone, n_word):
function text_normalize (line 19) | def text_normalize(text):
function post_replace_ph (line 23) | def post_replace_ph(ph):
function refine_ph (line 44) | def refine_ph(phn):
function refine_syllables (line 52) | def refine_syllables(syllables):
function g2p (line 68) | def g2p(text, pad_start_end=True, tokenized=None):
function get_bert_feature (line 108) | def get_bert_feature(text, word2ph, device=None):
FILE: melo/text/spanish_bert.py
function get_bert_feature (line 9) | def get_bert_feature(text, word2ph, device=None):
FILE: melo/text/tone_sandhi.py
class ToneSandhi (line 22) | class ToneSandhi:
method __init__ (line 23) | def __init__(self):
method _neural_sandhi (line 466) | def _neural_sandhi(self, word: str, pos: str, finals: List[str]) -> Li...
method _bu_sandhi (line 522) | def _bu_sandhi(self, word: str, finals: List[str]) -> List[str]:
method _yi_sandhi (line 533) | def _yi_sandhi(self, word: str, finals: List[str]) -> List[str]:
method _split_word (line 558) | def _split_word(self, word: str) -> List[str]:
method _three_sandhi (line 571) | def _three_sandhi(self, word: str, finals: List[str]) -> List[str]:
method _all_tone_three (line 611) | def _all_tone_three(self, finals: List[str]) -> bool:
method _merge_bu (line 616) | def _merge_bu(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
method _merge_yi (line 636) | def _merge_yi(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
method _merge_continuous_three_tones (line 669) | def _merge_continuous_three_tones(
method _is_reduplication (line 700) | def _is_reduplication(self, word: str) -> bool:
method _merge_continuous_three_tones_2 (line 704) | def _merge_continuous_three_tones_2(
method _merge_er (line 734) | def _merge_er(self, seg: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
method _merge_reduplication (line 743) | def _merge_reduplication(self, seg: List[Tuple[str, str]]) -> List[Tup...
method pre_merge_for_modify (line 752) | def pre_merge_for_modify(self, seg: List[Tuple[str, str]]) -> List[Tup...
method modified_tone (line 764) | def modified_tone(self, word: str, pos: str, finals: List[str]) -> Lis...
FILE: melo/train.py
function run (line 49) | def run():
function train_and_evaluate (line 291) | def train_and_evaluate(
function evaluate (line 539) | def evaluate(hps, generator, eval_loader, writer_eval):
FILE: melo/transforms.py
function piecewise_rational_quadratic_transform (line 12) | def piecewise_rational_quadratic_transform(
function searchsorted (line 45) | def searchsorted(bin_locations, inputs, eps=1e-6):
function unconstrained_rational_quadratic_spline (line 50) | def unconstrained_rational_quadratic_spline(
function rational_quadratic_spline (line 100) | def rational_quadratic_spline(
FILE: melo/utils.py
function get_text_for_tts_infer (line 22) | def get_text_for_tts_infer(text, language_str, hps, device, symbol_to_id...
function load_checkpoint (line 60) | def load_checkpoint(checkpoint_path, model, optimizer=None, skip_optimiz...
function save_checkpoint (line 119) | def save_checkpoint(model, optimizer, learning_rate, iteration, checkpoi...
function summarize (line 140) | def summarize(
function latest_checkpoint_path (line 159) | def latest_checkpoint_path(dir_path, regex="G_*.pth"):
function plot_spectrogram_to_numpy (line 166) | def plot_spectrogram_to_numpy(spectrogram):
function plot_alignment_to_numpy (line 192) | def plot_alignment_to_numpy(alignment, info=None):
function load_wav_to_torch (line 223) | def load_wav_to_torch(full_path):
function load_wav_to_torch_new (line 228) | def load_wav_to_torch_new(full_path):
function load_wav_to_torch_librosa (line 233) | def load_wav_to_torch_librosa(full_path, sr):
function load_filepaths_and_text (line 238) | def load_filepaths_and_text(filename, split="|"):
function get_hparams (line 244) | def get_hparams(init=True):
function clean_checkpoints (line 290) | def clean_checkpoints(path_to_models="logs/44k/", n_ckpts_to_keep=2, sor...
function get_hparams_from_dir (line 335) | def get_hparams_from_dir(model_dir):
function get_hparams_from_file (line 346) | def get_hparams_from_file(config_path):
function check_git_hash (line 355) | def check_git_hash(model_dir):
function get_logger (line 380) | def get_logger(model_dir, filename="train.log"):
class HParams (line 395) | class HParams:
method __init__ (line 396) | def __init__(self, **kwargs):
method keys (line 402) | def keys(self):
method items (line 405) | def items(self):
method values (line 408) | def values(self):
method __len__ (line 411) | def __len__(self):
method __getitem__ (line 414) | def __getitem__(self, key):
method __setitem__ (line 417) | def __setitem__(self, key, value):
method __contains__ (line 420) | def __contains__(self, key):
method __repr__ (line 423) | def __repr__(self):
FILE: setup.py
class PostInstallCommand (line 11) | class PostInstallCommand(install):
method run (line 13) | def run(self):
class PostDevelopCommand (line 18) | class PostDevelopCommand(develop):
method run (line 20) | def run(self):
Condensed preview — 90 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (4,506K chars).
[
{
"path": ".github/workflows/pypi.yml",
"chars": 1094,
"preview": "# This workflow will upload a Python Package using Twine when a release is created\n# For more information see: https://d"
},
{
"path": ".gitignore",
"chars": 151,
"preview": "__pycache__/\n.ipynb_checkpoints/\nbasetts_outputs_use_bert/\nbasetts_outputs/\nmultilingual_ckpts\nbasetts_outputs_package/\n"
},
{
"path": "Dockerfile",
"chars": 316,
"preview": "FROM python:3.9-slim\nWORKDIR /app\nCOPY . /app\n\nRUN apt-get update && apt-get install -y \\\n build-essential libsndfile"
},
{
"path": "LICENSE",
"chars": 1053,
"preview": "Copyright (c) 2024 MyShell.ai\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this soft"
},
{
"path": "README.md",
"chars": 3492,
"preview": "<div align=\"center\">\n <div> </div>\n <img src=\"logo.png\" width=\"300\"/> <br>\n <a href=\"https://trendshift.io/repos"
},
{
"path": "docs/install.md",
"chars": 5146,
"preview": "## Install and Use Locally\n\n### Table of Content\n- [Linux and macOS Install](#linux-and-macos-install)\n- [Docker Install"
},
{
"path": "docs/quick_use.md",
"chars": 1666,
"preview": "## Use MeloTTS without Installation\n\n**Quick Demo**\n\n- [Official live demo](https://app.myshell.ai/bot/UN77N3/1709094629"
},
{
"path": "docs/training.md",
"chars": 1403,
"preview": "## Training\n\nBefore training, please install MeloTTS in dev mode and go to the `melo` folder. \n```\npip install -e .\ncd m"
},
{
"path": "melo/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "melo/api.py",
"chars": 5108,
"preview": "import os\nimport re\nimport json\nimport torch\nimport librosa\nimport soundfile\nimport torchaudio\nimport numpy as np\nimport"
},
{
"path": "melo/app.py",
"chars": 3044,
"preview": "# WebUI by mrfakename <X @realmrfakename / HF @mrfakename>\n# Demo also available on HF Spaces: https://huggingface.co/sp"
},
{
"path": "melo/attentions.py",
"chars": 15936,
"preview": "import math\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom . import commons\nimport logging"
},
{
"path": "melo/commons.py",
"chars": 4956,
"preview": "import math\nimport torch\nfrom torch.nn import functional as F\n\n\ndef init_weights(m, mean=0.0, std=0.01):\n classname ="
},
{
"path": "melo/configs/config.json",
"chars": 1681,
"preview": "{\n \"train\": {\n \"log_interval\": 200,\n \"eval_interval\": 1000,\n \"seed\": 52,\n \"epochs\": 10000,\n \"learning_ra"
},
{
"path": "melo/data/example/metadata.list",
"chars": 2860,
"preview": "data/example/wavs/000.wav|EN-default|EN|Well, there are always new trends and styles emerging in the fashion world, but "
},
{
"path": "melo/data_utils.py",
"chars": 14829,
"preview": "import os\nimport random\nimport torch\nimport torch.utils.data\nfrom tqdm import tqdm\nfrom loguru import logger\nimport comm"
},
{
"path": "melo/download_utils.py",
"chars": 3440,
"preview": "import torch\nimport os\nfrom . import utils\nfrom cached_path import cached_path\nfrom huggingface_hub import hf_hub_downlo"
},
{
"path": "melo/infer.py",
"chars": 998,
"preview": "import os\nimport click\nfrom melo.api import TTS\n\n \n \n@click.command()\n@click.option('--ckpt_path', '-m', type=str,"
},
{
"path": "melo/init_downloads.py",
"chars": 393,
"preview": "\n\nif __name__ == '__main__':\n\n from melo.api import TTS\n device = 'auto'\n models = {\n 'EN': TTS(language"
},
{
"path": "melo/losses.py",
"chars": 1386,
"preview": "import torch\n\n\ndef feature_loss(fmap_r, fmap_g):\n loss = 0\n for dr, dg in zip(fmap_r, fmap_g):\n for rl, gl "
},
{
"path": "melo/main.py",
"chars": 1850,
"preview": "import click\nimport warnings\nimport os\n\n\n@click.command\n@click.argument('text')\n@click.argument('output_path')\n@click.op"
},
{
"path": "melo/mel_processing.py",
"chars": 5868,
"preview": "import torch\nimport torch.utils.data\nimport librosa\nfrom librosa.filters import mel as librosa_mel_fn\n\nMAX_WAV_VALUE = 3"
},
{
"path": "melo/models.py",
"chars": 34027,
"preview": "import math\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom melo import commons\nfrom melo i"
},
{
"path": "melo/modules.py",
"chars": 18975,
"preview": "import math\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom torch.nn import Conv1d\nfrom tor"
},
{
"path": "melo/monotonic_align/__init__.py",
"chars": 563,
"preview": "from numpy import zeros, int32, float32\r\nfrom torch import from_numpy\r\n\r\nfrom .core import maximum_path_jit\r\n\r\n\r\ndef max"
},
{
"path": "melo/monotonic_align/core.py",
"chars": 1270,
"preview": "import numba\r\n\r\n\r\n@numba.jit(\r\n numba.void(\r\n numba.int32[:, :, ::1],\r\n numba.float32[:, :, ::1],\r\n "
},
{
"path": "melo/preprocess_text.py",
"chars": 4423,
"preview": "import json\nfrom collections import defaultdict\nfrom random import shuffle\nfrom typing import Optional\n\nfrom tqdm import"
},
{
"path": "melo/split_utils.py",
"chars": 6251,
"preview": "import re\nimport os\nimport glob\nimport numpy as np\nimport soundfile as sf\nimport torchaudio\nimport re\n\ndef split_sentenc"
},
{
"path": "melo/text/__init__.py",
"chars": 1477,
"preview": "from .symbols import *\n\n\n_symbol_to_id = {s: i for i, s in enumerate(symbols)}\n\n\ndef cleaned_text_to_sequence(cleaned_te"
},
{
"path": "melo/text/chinese.py",
"chars": 5616,
"preview": "import os\nimport re\n\nimport cn2an\nfrom pypinyin import lazy_pinyin, Style\n\nfrom .symbols import punctuation\nfrom .tone_s"
},
{
"path": "melo/text/chinese_bert.py",
"chars": 2481,
"preview": "import torch\nimport sys\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\n\n\n# model_id = 'hfl/chinese-roberta"
},
{
"path": "melo/text/chinese_mix.py",
"chars": 8250,
"preview": "import os\nimport re\n\nimport cn2an\nfrom pypinyin import lazy_pinyin, Style\n\n# from text.symbols import punctuation\nfrom ."
},
{
"path": "melo/text/cleaner.py",
"chars": 1245,
"preview": "from . import chinese, japanese, english, chinese_mix, korean, french, spanish\nfrom . import cleaned_text_to_sequence\nim"
},
{
"path": "melo/text/cleaner_multiling.py",
"chars": 2590,
"preview": "\"\"\"Set of default text cleaners\"\"\"\n# TODO: pick the cleaner for languages dynamically\n\nimport re\n\n# Regular expression m"
},
{
"path": "melo/text/cmudict.rep",
"chars": 3969309,
"preview": "## Date: August 8, 1998\n##\n## The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6] is Copyright 1998\n## by Carnegie"
},
{
"path": "melo/text/english.py",
"chars": 6479,
"preview": "import pickle\nimport os\nimport re\nfrom g2p_en import G2p\n\nfrom . import symbols\n\nfrom .english_utils.abbreviations impor"
},
{
"path": "melo/text/english_bert.py",
"chars": 1194,
"preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\nmodel_id = 'bert-base-uncased'\ntok"
},
{
"path": "melo/text/english_utils/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "melo/text/english_utils/abbreviations.py",
"chars": 948,
"preview": "import re\n\n# List of (regular expression, replacement) pairs for abbreviations in english:\nabbreviations_en = [\n (re."
},
{
"path": "melo/text/english_utils/number_norm.py",
"chars": 2804,
"preview": "\"\"\" from https://github.com/keithito/tacotron \"\"\"\n\nimport re\nfrom typing import Dict\n\nimport inflect\n\n_inflect = inflect"
},
{
"path": "melo/text/english_utils/time_norm.py",
"chars": 1173,
"preview": "import re\n\nimport inflect\n\n_inflect = inflect.engine()\n\n_time_re = re.compile(\n r\"\"\"\\b\n ((0?"
},
{
"path": "melo/text/es_phonemizer/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "melo/text/es_phonemizer/base.py",
"chars": 4339,
"preview": "import abc\nfrom typing import List, Tuple\n\nfrom .punctuation import Punctuation\n\n\nclass BasePhonemizer(abc.ABC):\n \"\"\""
},
{
"path": "melo/text/es_phonemizer/cleaner.py",
"chars": 2549,
"preview": "\"\"\"Set of default text cleaners\"\"\"\n# TODO: pick the cleaner for languages dynamically\n\nimport re\n\n# Regular expression m"
},
{
"path": "melo/text/es_phonemizer/es_symbols.json",
"chars": 1180,
"preview": "{\n \"symbols\": [\n \"_\",\n \",\",\n \".\",\n \"!\",\n \"?\",\n \"-\",\n \"~\",\n \"\\"
},
{
"path": "melo/text/es_phonemizer/es_symbols.txt",
"chars": 78,
"preview": "_,.!?-~…NQabdefghijklmnopstuvwxyzɑæʃʑçɯɪɔɛɹðəɫɥɸʊɾʒθβŋɦ⁼ʰ`^#*=ˈˌ→↓↑ ɡrɲʝɣʎː—¿¡"
},
{
"path": "melo/text/es_phonemizer/es_symbols_v2.json",
"chars": 1243,
"preview": "{\n \"symbols\": [\n \"_\",\n \",\",\n \".\",\n \"!\",\n \"?\",\n \"-\",\n \"~\",\n \"\\"
},
{
"path": "melo/text/es_phonemizer/es_to_ipa.py",
"chars": 393,
"preview": "from .cleaner import spanish_cleaners\nfrom .gruut_wrapper import Gruut\n\ndef es2ipa(text):\n e = Gruut(language=\"es-es\""
},
{
"path": "melo/text/es_phonemizer/example_ipa.txt",
"chars": 36314,
"preview": "kapˈitulo ˈuno de daβˈid kˌoppeɾfjˈelð o el soβɾˈino de mi tˈia de tʃˈaɾles dˌiθjˈens.\nˈesta ɡɾˌaβaθjˈon de lˌiβɾˈiβoks "
},
{
"path": "melo/text/es_phonemizer/gruut_wrapper.py",
"chars": 6991,
"preview": "import importlib\nfrom typing import List\n\nimport gruut\nfrom gruut_ipa import IPA # pip install gruut_ipa\n\nfrom .base imp"
},
{
"path": "melo/text/es_phonemizer/punctuation.py",
"chars": 5526,
"preview": "import collections\nimport re\nfrom enum import Enum\n\nimport six\n\n_DEF_PUNCS = ';:,.!?¡¿—…\"«»“”'\n\n_PUNC_IDX = collections."
},
{
"path": "melo/text/es_phonemizer/spanish_symbols.txt",
"chars": 37,
"preview": "dˌaβˈiðkopeɾfjl unθsbmtʃwɛxɪŋʊɣɡrɲʝʎː"
},
{
"path": "melo/text/es_phonemizer/test.ipynb",
"chars": 5440,
"preview": "{\n \"cells\": [\n {\n \"cell_type\": \"code\",\n \"execution_count\": 1,\n \"metadata\": {},\n \"outputs\": [\n {\n \"ename\""
},
{
"path": "melo/text/fr_phonemizer/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "melo/text/fr_phonemizer/base.py",
"chars": 4339,
"preview": "import abc\nfrom typing import List, Tuple\n\nfrom .punctuation import Punctuation\n\n\nclass BasePhonemizer(abc.ABC):\n \"\"\""
},
{
"path": "melo/text/fr_phonemizer/cleaner.py",
"chars": 2875,
"preview": "\"\"\"Set of default text cleaners\"\"\"\n# TODO: pick the cleaner for languages dynamically\n\nimport re\nfrom .french_abbreviati"
},
{
"path": "melo/text/fr_phonemizer/en_symbols.json",
"chars": 847,
"preview": "{\"symbols\": [\n \"_\",\n \",\",\n \".\",\n \"!\",\n \"?\",\n \"-\",\n \"~\",\n \"\\u2026\",\n \"N\",\n \"Q\",\n \"a\",\n "
},
{
"path": "melo/text/fr_phonemizer/fr_symbols.json",
"chars": 1351,
"preview": "{\n \"symbols\": [\n \"_\",\n \",\",\n \".\",\n \"!\",\n \"?\",\n \"-\",\n \"~\",\n \"\\"
},
{
"path": "melo/text/fr_phonemizer/fr_to_ipa.py",
"chars": 742,
"preview": "from .cleaner import french_cleaners\nfrom .gruut_wrapper import Gruut\n\n\ndef remove_consecutive_t(input_str):\n result "
},
{
"path": "melo/text/fr_phonemizer/french_abbreviations.py",
"chars": 1360,
"preview": "import re\n\n# List of (regular expression, replacement) pairs for abbreviations in french:\nabbreviations_fr = [\n (re.c"
},
{
"path": "melo/text/fr_phonemizer/french_symbols.txt",
"chars": 84,
"preview": "_,.!?-~…NQabdefghijklmnopstuvwxyzɑæʃʑçɯɪɔɛɹðəɫɥɸʊɾʒθβŋɦ⁼ʰ`^#*=ˈˌ→↓↑ ɣɡrɲʝʎː̃œøʁɒʌ—ɜɐ"
},
{
"path": "melo/text/fr_phonemizer/gruut_wrapper.py",
"chars": 7168,
"preview": "import importlib\nfrom typing import List\n\nimport gruut\nfrom gruut_ipa import IPA # pip install gruut_ipa\n\nfrom .base imp"
},
{
"path": "melo/text/fr_phonemizer/punctuation.py",
"chars": 5442,
"preview": "import collections\nimport re\nfrom enum import Enum\n\nimport six\n\n_DEF_PUNCS = ';:,.!?¡¿—…\"«»“”'\n\n_PUNC_IDX = collections."
},
{
"path": "melo/text/french.py",
"chars": 2885,
"preview": "import pickle\nimport os\nimport re\n\nfrom . import symbols\nfrom .fr_phonemizer import cleaner as fr_cleaner\nfrom .fr_phone"
},
{
"path": "melo/text/french_bert.py",
"chars": 1215,
"preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\nmodel_id = 'dbmdz/bert-base-french"
},
{
"path": "melo/text/japanese.py",
"chars": 13440,
"preview": "# Convert Japanese text to phonemes which is\n# compatible with Julius https://github.com/julius-speech/segmentation-kit\n"
},
{
"path": "melo/text/japanese_bert.py",
"chars": 1510,
"preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\n\nmodels = {}\ntokenizers = {}\ndef g"
},
{
"path": "melo/text/ko_dictionary.py",
"chars": 756,
"preview": "# coding: utf-8\r\n# Add the word you want to the dictionary.\r\netc_dictionary = {\"1+1\": \"원플러스원\", \"2+1\": \"투플러스원\"}\r\n\r\n\r\nengl"
},
{
"path": "melo/text/korean.py",
"chars": 5970,
"preview": "# Convert Japanese text to phonemes which is\n# compatible with Julius https://github.com/julius-speech/segmentation-kit\n"
},
{
"path": "melo/text/opencpop-strict.txt",
"chars": 4084,
"preview": "a\tAA a\nai\tAA ai\nan\tAA an\nang\tAA ang\nao\tAA ao\nba\tb a\nbai\tb ai\nban\tb an\nbang\tb ang\nbao\tb ao\nbei\tb ei\nben\tb en\nbeng\tb eng\nb"
},
{
"path": "melo/text/spanish.py",
"chars": 3157,
"preview": "import pickle\nimport os\nimport re\n\nfrom . import symbols\nfrom .es_phonemizer import cleaner as es_cleaner\nfrom .es_phone"
},
{
"path": "melo/text/spanish_bert.py",
"chars": 1216,
"preview": "import torch\nfrom transformers import AutoTokenizer, AutoModelForMaskedLM\nimport sys\n\nmodel_id = 'dccuchile/bert-base-sp"
},
{
"path": "melo/text/symbols.py",
"chars": 4183,
"preview": "# punctuation = [\"!\", \"?\", \"…\", \",\", \".\", \"'\", \"-\"]\npunctuation = [\"!\", \"?\", \"…\", \",\", \".\", \"'\", \"-\", \"¿\", \"¡\"]\npu_symbo"
},
{
"path": "melo/text/tone_sandhi.py",
"chars": 21326,
"preview": "# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the "
},
{
"path": "melo/train.py",
"chars": 22501,
"preview": "# flake8: noqa: E402\n\nimport os\nimport torch\nfrom torch.nn import functional as F\nfrom torch.utils.data import DataLoade"
},
{
"path": "melo/train.sh",
"chars": 391,
"preview": "CONFIG=$1\nGPUS=$2\nMODEL_NAME=$(basename \"$(dirname $CONFIG)\")\n\nPORT=10902\n\nwhile : # auto-resume: the code sometimes cra"
},
{
"path": "melo/transforms.py",
"chars": 7253,
"preview": "import torch\nfrom torch.nn import functional as F\n\nimport numpy as np\n\n\nDEFAULT_MIN_BIN_WIDTH = 1e-3\nDEFAULT_MIN_BIN_HEI"
},
{
"path": "melo/utils.py",
"chars": 13223,
"preview": "import os\nimport glob\nimport argparse\nimport logging\nimport json\nimport subprocess\nimport numpy as np\nfrom scipy.io.wavf"
},
{
"path": "requirements.txt",
"chars": 424,
"preview": "txtsplit\ntorch\ntorchaudio\ncached_path\ntransformers==4.27.4\nnum2words==0.5.12\nunidic_lite==1.0.8\nunidic==1.1.0\nmecab-pyth"
},
{
"path": "setup.py",
"chars": 1010,
"preview": "import os \nfrom setuptools import setup, find_packages\nfrom setuptools.command.develop import develop\nfrom setuptools.co"
},
{
"path": "test/basetts_test_resources/en_egs_text.txt",
"chars": 4966,
"preview": "Did you ever hear a folk tale about a giant turtle?\nCan you name five cars that were popular in the 1970s?\nMay I ask wha"
},
{
"path": "test/basetts_test_resources/es_egs_text.txt",
"chars": 1869,
"preview": "El resplandor del sol acaricia las olas, pintando el cielo con una paleta deslumbrante.\nLas estrellas bailan en la noche"
},
{
"path": "test/basetts_test_resources/fr_egs_text.txt",
"chars": 1873,
"preview": "La lueur dorée du soleil caresse les vagues, peignant le ciel d'une palette éblouissante.\nLes étoiles dansent dans la nu"
},
{
"path": "test/basetts_test_resources/jp_egs_text.txt",
"chars": 286,
"preview": "彼は毎朝ジョギングをして体を健康に保っています。\n私たちは来年、友人たちと一緒にヨーロッパ旅行を計画しています。\n新しいレストランで美味しい料理を試すことが楽しみです。\n彼女の絵は情熱と芸術性が溢れていて、見る人を魅了します。\n最近、忙しさ"
},
{
"path": "test/basetts_test_resources/kr_egs_text.txt",
"chars": 183,
"preview": "안녕하세요! 오늘은 날씨가 정말 좋네요.\n한국 음식을 먹어보고 싶어요. 불고기랑 김치찌개가 제가 좋아하는 음식이에요.\n요즘에는 한국 드라마를 자주 보고 있어요. 정말 재미있어요.\n한글을 배우는 것이 재미있어요. 조금"
},
{
"path": "test/basetts_test_resources/zh_mix_en_egs_text.txt",
"chars": 342,
"preview": "人工智能是一种非常适合和促进自上而下集中控制的技术,而加密货币则是一种完全关注自下而上分散合作的技术。\nWeb 3的一个目标是支持艺术家。\n欢迎来到Web 3与A6Z,一个由团队打造的构建下一代互联网的节目。\n我最喜欢的fruit是苹果。\n"
},
{
"path": "test/test_base_model_tts_package.py",
"chars": 1577,
"preview": "from melo.api import TTS\nimport os\nimport glob\nimport sys\n\n\nlanguage = sys.argv[1]\nmodel = TTS(language=language)\n\nspeak"
},
{
"path": "test/test_base_model_tts_package_from_S3.py",
"chars": 1599,
"preview": "from melo.api import TTS\nimport os\nimport glob\nimport sys\n\n\nlanguage = sys.argv[1]\nmodel = TTS(language=language, use_hf"
}
]
// ... and 2 more files (download for full content)
About this extraction
This page contains the full source code of the myshell-ai/MeloTTS GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 90 files (15.4 MB), approximately 1.1M tokens, and a symbol index with 402 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.