Repository: gia-guar/JARVIS-ChatGPT
Branch: main
Commit: 9aca1c051631
Files: 339
Total size: 2.8 MB

Directory structure:
gitextract_p75g_f1y/

├── .gitignore
├── Assistant/
│   ├── Agents.py
│   ├── Chat.py
│   ├── VirtualAssistant.py
│   ├── __init__.py
│   ├── get_audio.py
│   ├── research_mode.py
│   ├── semantic_scholar/
│   │   ├── S2_tools.py
│   │   ├── __init__.py
│   │   ├── agent_tools.py
│   │   └── simple.py
│   ├── tools.py
│   ├── voice.py
│   └── webui.py
├── LICENSE
├── README.md
├── TTS/
│   ├── .models.json
│   ├── VERSION
│   ├── __init__.py
│   ├── api.py
│   ├── bin/
│   │   ├── __init__.py
│   │   ├── collect_env_info.py
│   │   ├── compute_attention_masks.py
│   │   ├── compute_embeddings.py
│   │   ├── compute_statistics.py
│   │   ├── eval_encoder.py
│   │   ├── extract_tts_spectrograms.py
│   │   ├── find_unique_chars.py
│   │   ├── find_unique_phonemes.py
│   │   ├── remove_silence_using_vad.py
│   │   ├── resample.py
│   │   ├── synthesize.py
│   │   ├── train_encoder.py
│   │   ├── train_tts.py
│   │   ├── train_vocoder.py
│   │   └── tune_wavegrad.py
│   ├── config/
│   │   ├── __init__.py
│   │   └── shared_configs.py
│   ├── encoder/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── configs/
│   │   │   ├── base_encoder_config.py
│   │   │   ├── emotion_encoder_config.py
│   │   │   └── speaker_encoder_config.py
│   │   ├── dataset.py
│   │   ├── losses.py
│   │   ├── models/
│   │   │   ├── base_encoder.py
│   │   │   ├── lstm.py
│   │   │   └── resnet.py
│   │   ├── requirements.txt
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── generic_utils.py
│   │       ├── io.py
│   │       ├── prepare_voxceleb.py
│   │       ├── training.py
│   │       └── visual.py
│   ├── model.py
│   ├── server/
│   │   ├── README.md
│   │   ├── __init__.py
│   │   ├── conf.json
│   │   ├── server.py
│   │   └── templates/
│   │       ├── details.html
│   │       └── index.html
│   ├── tts/
│   │   ├── __init__.py
│   │   ├── configs/
│   │   │   ├── __init__.py
│   │   │   ├── align_tts_config.py
│   │   │   ├── fast_pitch_config.py
│   │   │   ├── fast_speech_config.py
│   │   │   ├── fastspeech2_config.py
│   │   │   ├── glow_tts_config.py
│   │   │   ├── neuralhmm_tts_config.py
│   │   │   ├── overflow_config.py
│   │   │   ├── shared_configs.py
│   │   │   ├── speedy_speech_config.py
│   │   │   ├── tacotron2_config.py
│   │   │   ├── tacotron_config.py
│   │   │   └── vits_config.py
│   │   ├── datasets/
│   │   │   ├── __init__.py
│   │   │   ├── dataset.py
│   │   │   └── formatters.py
│   │   ├── layers/
│   │   │   ├── __init__.py
│   │   │   ├── align_tts/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── duration_predictor.py
│   │   │   │   └── mdn.py
│   │   │   ├── feed_forward/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── decoder.py
│   │   │   │   ├── duration_predictor.py
│   │   │   │   └── encoder.py
│   │   │   ├── generic/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── aligner.py
│   │   │   │   ├── gated_conv.py
│   │   │   │   ├── normalization.py
│   │   │   │   ├── pos_encoding.py
│   │   │   │   ├── res_conv_bn.py
│   │   │   │   ├── time_depth_sep_conv.py
│   │   │   │   ├── transformer.py
│   │   │   │   └── wavenet.py
│   │   │   ├── glow_tts/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── decoder.py
│   │   │   │   ├── duration_predictor.py
│   │   │   │   ├── encoder.py
│   │   │   │   ├── glow.py
│   │   │   │   └── transformer.py
│   │   │   ├── losses.py
│   │   │   ├── overflow/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── common_layers.py
│   │   │   │   ├── decoder.py
│   │   │   │   ├── neural_hmm.py
│   │   │   │   └── plotting_utils.py
│   │   │   ├── tacotron/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── attentions.py
│   │   │   │   ├── capacitron_layers.py
│   │   │   │   ├── common_layers.py
│   │   │   │   ├── gst_layers.py
│   │   │   │   ├── tacotron.py
│   │   │   │   └── tacotron2.py
│   │   │   └── vits/
│   │   │       ├── discriminator.py
│   │   │       ├── networks.py
│   │   │       ├── stochastic_duration_predictor.py
│   │   │       └── transforms.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── align_tts.py
│   │   │   ├── base_tacotron.py
│   │   │   ├── base_tts.py
│   │   │   ├── forward_tts.py
│   │   │   ├── glow_tts.py
│   │   │   ├── neuralhmm_tts.py
│   │   │   ├── overflow.py
│   │   │   ├── tacotron.py
│   │   │   ├── tacotron2.py
│   │   │   └── vits.py
│   │   └── utils/
│   │       ├── __init__.py
│   │       ├── data.py
│   │       ├── helpers.py
│   │       ├── languages.py
│   │       ├── managers.py
│   │       ├── measures.py
│   │       ├── monotonic_align/
│   │       │   ├── __init__.py
│   │       │   ├── core.c
│   │       │   ├── core.pyx
│   │       │   └── setup.py
│   │       ├── speakers.py
│   │       ├── ssim.py
│   │       ├── synthesis.py
│   │       ├── text/
│   │       │   ├── __init__.py
│   │       │   ├── characters.py
│   │       │   ├── chinese_mandarin/
│   │       │   │   ├── __init__.py
│   │       │   │   ├── numbers.py
│   │       │   │   ├── phonemizer.py
│   │       │   │   └── pinyinToPhonemes.py
│   │       │   ├── cleaners.py
│   │       │   ├── cmudict.py
│   │       │   ├── english/
│   │       │   │   ├── __init__.py
│   │       │   │   ├── abbreviations.py
│   │       │   │   ├── number_norm.py
│   │       │   │   └── time_norm.py
│   │       │   ├── french/
│   │       │   │   ├── __init__.py
│   │       │   │   └── abbreviations.py
│   │       │   ├── japanese/
│   │       │   │   ├── __init__.py
│   │       │   │   └── phonemizer.py
│   │       │   ├── korean/
│   │       │   │   ├── __init__.py
│   │       │   │   ├── ko_dictionary.py
│   │       │   │   ├── korean.py
│   │       │   │   └── phonemizer.py
│   │       │   ├── phonemizers/
│   │       │   │   ├── __init__.py
│   │       │   │   ├── base.py
│   │       │   │   ├── espeak_wrapper.py
│   │       │   │   ├── gruut_wrapper.py
│   │       │   │   ├── ja_jp_phonemizer.py
│   │       │   │   ├── ko_kr_phonemizer.py
│   │       │   │   ├── multi_phonemizer.py
│   │       │   │   └── zh_cn_phonemizer.py
│   │       │   ├── punctuation.py
│   │       │   └── tokenizer.py
│   │       └── visual.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── audio/
│   │   │   ├── __init__.py
│   │   │   ├── numpy_transforms.py
│   │   │   ├── processor.py
│   │   │   └── torch_transforms.py
│   │   ├── callbacks.py
│   │   ├── capacitron_optimizer.py
│   │   ├── distribute.py
│   │   ├── download.py
│   │   ├── downloaders.py
│   │   ├── generic_utils.py
│   │   ├── io.py
│   │   ├── manage.py
│   │   ├── radam.py
│   │   ├── samplers.py
│   │   ├── synthesizer.py
│   │   ├── training.py
│   │   └── vad.py
│   ├── vc/
│   │   ├── configs/
│   │   │   ├── __init__.py
│   │   │   ├── freevc_config.py
│   │   │   └── shared_configs.py
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── base_vc.py
│   │   │   └── freevc.py
│   │   └── modules/
│   │       ├── __init__.py
│   │       └── freevc/
│   │           ├── __init__.py
│   │           ├── commons.py
│   │           ├── mel_processing.py
│   │           ├── modules.py
│   │           ├── speaker_encoder/
│   │           │   ├── __init__.py
│   │           │   ├── audio.py
│   │           │   ├── hparams.py
│   │           │   └── speaker_encoder.py
│   │           └── wavlm/
│   │               ├── __init__.py
│   │               ├── config.json
│   │               ├── modules.py
│   │               └── wavlm.py
│   └── vocoder/
│       ├── README.md
│       ├── __init__.py
│       ├── configs/
│       │   ├── __init__.py
│       │   ├── fullband_melgan_config.py
│       │   ├── hifigan_config.py
│       │   ├── melgan_config.py
│       │   ├── multiband_melgan_config.py
│       │   ├── parallel_wavegan_config.py
│       │   ├── shared_configs.py
│       │   ├── univnet_config.py
│       │   ├── wavegrad_config.py
│       │   └── wavernn_config.py
│       ├── datasets/
│       │   ├── __init__.py
│       │   ├── gan_dataset.py
│       │   ├── preprocess.py
│       │   ├── wavegrad_dataset.py
│       │   └── wavernn_dataset.py
│       ├── layers/
│       │   ├── __init__.py
│       │   ├── hifigan.py
│       │   ├── losses.py
│       │   ├── lvc_block.py
│       │   ├── melgan.py
│       │   ├── parallel_wavegan.py
│       │   ├── pqmf.py
│       │   ├── upsample.py
│       │   └── wavegrad.py
│       ├── models/
│       │   ├── __init__.py
│       │   ├── base_vocoder.py
│       │   ├── fullband_melgan_generator.py
│       │   ├── gan.py
│       │   ├── hifigan_discriminator.py
│       │   ├── hifigan_generator.py
│       │   ├── melgan_discriminator.py
│       │   ├── melgan_generator.py
│       │   ├── melgan_multiscale_discriminator.py
│       │   ├── multiband_melgan_generator.py
│       │   ├── parallel_wavegan_discriminator.py
│       │   ├── parallel_wavegan_generator.py
│       │   ├── random_window_discriminator.py
│       │   ├── univnet_discriminator.py
│       │   ├── univnet_generator.py
│       │   ├── wavegrad.py
│       │   └── wavernn.py
│       └── utils/
│           ├── __init__.py
│           ├── distribution.py
│           └── generic_utils.py
├── TTS_additional_material/
│   ├── .gitignore
│   ├── .pre-commit-config.yaml
│   ├── .pylintrc
│   ├── .readthedocs.yml
│   ├── CODE_OF_CONDUCT.md
│   ├── README.md
│   ├── hubconf.py
│   └── requirements.txt
├── UpdateHistory.md
├── Vicuna/
│   ├── README.md
│   ├── start-webui-vicuna-gpu.bat
│   └── vicuna.ps1
├── demos/
│   ├── chat_with_keyboard.py
│   ├── demo_da_vinci.py
│   ├── demo_elevenlabs.py
│   ├── demo_google_search.py
│   ├── demo_local_search_engine.py
│   ├── demo_pyaudio.py
│   ├── demo_research_mode.py
│   └── demo_tts.py
├── env.txt
├── openai_api_chatbot.py
├── run.bat
├── saved_chats/
│   ├── 2023-03-13_ExploringtheConceptofDeepLearningandItsApplicationsintheAviationIndustry.txt
│   ├── 2023-03-25_ExploringtheAmbitiousWorldofStarCitizenIsitRightforCasualGamers.txt
│   ├── 2023-03-26_AnIntroductiontoArtificialIntelligence.txt
│   ├── 2023-03-26_ArtificialIntelligence.txt
│   ├── 2023-03-26_ExploringthePossibilityofHumanExtinction.txt
│   ├── 2023-03-26_ExploringtheUseofTransformersinImageSegmentation.txt
│   ├── 2023-03-26_FlutterDevelopment.txt
│   ├── 2023-03-26_FlutterDevelopmentLearningMethodsandManagingAppState.txt
│   ├── 2023-03-26_HumanExtinction.txt
│   ├── 2023-03-26_NuclearandElectricPropulsioninSpaceExplorationAdvantagesandChallenges.txt
│   ├── 2023-03-26_RevolutionizingHealthcare.txt
│   ├── 2023-03-26_RevolutionizingHealthcarewithArtificialIntelligenceBenefitsandOpportunities.txt
│   ├── 2023-03-26_SpacePropulsion.txt
│   ├── 2023-03-26_TagImageSegmentation.txt
│   ├── 2023-03-26_TipsforRunningScriptsatSystemStartuponWindowsandLinux.txt
│   ├── 2023-03-26_TipsforRunningStartup.txt
│   ├── 2023-03-28_AsteroidImpact.txt
│   ├── 2023-03-28_ChoosingAppleComputer.txt
│   ├── 2023-03-28_ChoosingtheRightAppleComputerforDeepLearningandMachineLearningM1vsM2ChipandOptimalMemorySize.txt
│   ├── 2023-03-28_SizeandImpactHowBigDoesanAsteroidNeedtoBetoCauseSignificantDamage.txt
│   ├── 2023-03-29_InteractiveSystem.txt
│   ├── 2023-03-29_TraininganInteractiveSystemtoRespondtoCommandsandQuestions.txt
│   ├── 2023-03-30_ExploringtheColdestTemperaturesonEarthRecordBreakingLowTemperaturesandHistoricalEstimates.txt
│   ├── 2023-03-30_ExtremeColdTemperatures.txt
│   ├── 2023-03-30_TagBeerBasics.txt
│   ├── 2023-03-30_TagSkyColors.txt
│   ├── 2023-03-30_TheBasicsofBeerUnderstandingthePopularAlcoholicBeverage.txt
│   ├── 2023-03-30_TheColorsoftheSkyExploringtheHuesandVariations.txt
│   ├── 2023-04-01_IslandTravel.txt
│   ├── 2023-04-01_RegalieattivitàperunviaggioinIslanda.txt
│   ├── 2023-04-02_SoundCardComponentsandADC.txt
│   ├── 2023-04-02_UnderstandingtheComponentsofaSoundCardandtheFunctionofADC.txt
│   ├── 2023-04-03_ExploringMultiHeadSelfAttentionforKeywordIdentificationinNaturalLanguageProcessing.txt
│   ├── 2023-04-03_TagNaturalLanguageProcessing.txt
│   ├── 2023-04-04_ExploringFeasibleOptionsforPoweringanIronManSuitSolarCellsBatteriesandNuclearReactors.txt
│   ├── 2023-04-04_PoweringIronManSuit.txt
│   ├── 2023-04-05_DockerContainerization.txt
│   ├── 2023-04-07_ApproachesforTextto.txt
│   ├── 2023-04-09_FitbitDataandSleep.txt
│   ├── 2023-04-11_Fitness.txt
│   ├── 2023-04-19_ChineseLanguages.txt
│   ├── 2023-04-19_SyntheticMeat.txt
│   ├── 2023-04-20_LabGrownMeat.txt
│   ├── 2023-04-20_SyntheticMeatCont.txt
│   ├── 2023-04-20_andTextto.txt
│   ├── 2023-05-03_AIConversations.txt
│   ├── 2023-05-03_MeaninginLife.txt
│   ├── 2023-06-07_ImageProcessing.txt
│   └── DATAFRAME.csv
├── setup.bat
├── test_TTS.py
├── tests.py
├── venv_requirements.txt
├── whisper_edits/
│   ├── __init__.py
│   └── model.py
└── workspaces/
    └── Vision_09df18b156814c80a3e1c1ab544423fc/
        ├── refy_suggestions/
        │   ├── test.csv
        │   └── test.html
        └── results/
            ├── papers.bib
            └── papers.csv

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
jarvis/
chatgptwrapper/
output.wav
.env
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
vicuna/oobabooga-windows/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
#  and can be added to the global gitignore or merged into this file.  For a more nuclear
#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/


================================================
FILE: Assistant/Agents.py
================================================
from langchain import OpenAI, LLMChain
from langchain.llms import OpenAI
from langchain.agents import Tool, AgentExecutor, ZeroShotAgent
from langchain.memory import ConversationBufferMemory 
from langchain.agents import initialize_agent, load_tools
import datetime

from Assistant import VirtualAssistant
import os

# generate a Zero Shot React Agent with memory that looks K interactions behind
def generateReactAgent(VA:VirtualAssistant, k:int):
    # Local Search Engine ($)
    LocalSearchEngine = Tool(
        name= 'Key Search',
        func=VA.find_file,
        description="Useful when you don't know the name of a resource. Inputs should be keywords. Keywords are used to find resources. You ddon't know the name of the resources"
    )

    Save = Tool(name='Memorize', 
             func=VA.save_chat,
             description='save the current conversation. Useful for when the conversation will be needed in future')

    FileReader = Tool(
        name='Load File',
        func=VA.open_file,
        description='Useful when you have file names. Loads the content of a file given its filename'
    )
    
    Summarize = Tool(
        name='TLDR',
        func=VA.search_engine.tldr,
        description='Summarize large amounts of text'
    )
    tools = [LocalSearchEngine, Save, FileReader, Summarize]
    
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, human_prefix='user', ai_prefix='assistant')

    # need to work on a custom LangChain llm model 
    prefix = """You are an AI research assistant designed to assist users with their academic research. You are equipped with these tools:"""
    suffix = """Begin!"

    {chat_history}
    Question: {input}
    {agent_scratchpad}"""

    prompt = ZeroShotAgent.create_prompt(
        tools, 
        prefix=prefix, 
        suffix=suffix, 
        input_variables=["input", "chat_history", "agent_scratchpad"]
    )
    llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
    agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)

    # adding a window of memory:
    memory = build_memory(chat_history = VA.current_conversation(), k=k)
    
    return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory, early_stopping_method ='generate', max_iterations=2)


def build_memory(chat_history, k):
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, human_prefix='user', ai_prefix='assistant')
    k = min(k, len(chat_history)//2)

    if k==0 :return memory

    if chat_history[-k]["role"] != 'user': 
        print('refreshing memory warning - considering last interaction only')
        k=1
    try:
        for i in range(-k*2-1, -1, 2):
            input  = chat_history[i]['content']
            output = chat_history[i+1]['content']
            memory.save_context({"input":input}, {"output":output})
    except:
        memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, human_prefix='user', ai_prefix='assistant')
    return memory


def generateGoogleAgent(VA:VirtualAssistant, k:int):
    names = ['wikipedia', 'requests', 'open-meteo-api']
    llm = OpenAI(temperature=0)
    if len(os.getenv('SERPER_API_KEY'))>1:
        print('(using google-seper)')
        names.append('google-serper')
    elif len(os.getenv('GOOGLE_API_KEY'))>1:
        print('(using google-serch)')
        names.append('google-search')
    
    tools = load_tools(names, llm=llm)
    custom_tools = [
        Tool(
            name ='Locate me',
            func = locate_me, 
            description='useful to know the current geographical location'),
        Tool(
            name='News',
            func=news,
            description='Use this when you want to get information about the top headlines of current news stories. The input should be a keyword describing the topic'),
        Tool(
            name='Today',
            func=today,
            description='Useful to know the current day'),
        Tool(
            name='Delta days',
            func= time_between_dates,
            description='Use this you need to compute the time between two Dates. Input should be two dates in the ISO 8601 format: Year-Month-Day'
        )
        ]
    
    for item in custom_tools: tools.append(item)
        
    prefix = """Answer the question. You have also access to the following tools:"""
    suffix = """Begin!"

    {chat_history}
    Question: {input}
    {agent_scratchpad}"""

    prompt = ZeroShotAgent.create_prompt(
        tools, 
        prefix=prefix, 
        suffix=suffix, 
        input_variables=["input", "chat_history", "agent_scratchpad"]
    )
    llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
    agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)

    # adding a window of memory:
    memory = build_memory(VA.current_conversation(), k)
    
    return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory, early_stopping_method = 'generate', max_iterations=4)


## FUNCTIONS FOR TOOLS
import geocoder
from newsapi import NewsApiClient

def locate_me(p):
    g = geocoder.ip('me')
    return [g.city, g.state, g.country]

def today(p):
    return str(datetime.date.today())

def time_between_dates(date1, date2):
    try:
        date1 = datetime.date.fromisoformat(date1)
        date2 = datetime.date.fromisoformat(date2)
    except:
        return 'date format incorrect'
    if date2.toordinal()>date1.toordinal(): return str(date2-date1)
    else: return str(date1-date2) 

def news(keyword):
    newsapi = NewsApiClient(api_key=os.getenv('NEWS_API_KEY'))
    top_headlines = newsapi.get_top_headlines(q=keyword)
    if len(top_headlines['articles'])==0:
        top_headlines = newsapi.get_everything(q=keyword)
        top_headlines['articles'] = top_headlines['articles'][0:min(len(top_headlines['articles']),10)]

    res = ''

    for article in top_headlines['articles']:
        res += '\nsource: '+article['source']['name']+'\n'
        res += '\n'+article['title']+'\nurl: '+article['url']+'\n'
        res += article['description']

    print(res)
    return res

================================================
FILE: Assistant/Chat.py
================================================


================================================
FILE: Assistant/VirtualAssistant.py
================================================
# import for prompt routing
from langchain import OpenAI
from langchain.agents import Tool
from langchain.agents import initialize_agent

from Assistant.research_mode import ResearchAssistant
from .Agents import generateReactAgent, generateGoogleAgent
import tiktoken

# imports for chats
import  pygame
import os
import re

import pandas as pd
from datetime import datetime
import copy
import openai
import time
import langid
import torch

import Assistant.get_audio as myaudio
from .voice import *
from .tools import Translator, LocalSearchEngine, AssistantChat
from .tools import parse_conversation, count_tokens, take_last_k_interactions
from .webui import oobabooga_textgen

# imports for audio
import whisper
import wave
import pyaudio
import speech_recognition as sr
import time
import sys
from contextlib import contextmanager

#module used for speaking during recording
import webrtcvad
      
class VirtualAssistant:
    DEBUG = True
    DEFAULT_CHAT =  AssistantChat([{"role": "system", "content": "You are a helpful assistant. You can make question to make the conversation entertaining."}])
    RESPONSE_TIME = 1.5 #values that work well in my environment (ticks, not seconds)
    SLEEP_DELAY = 3 #seconds 
    MIN_RECORDING_TIME = .5 #seconds
    MAX_RECORDING_TIME = 60 #seconds
    VAD_AGGRESSIVENESS = 2  #1-3

    MAX_TOKENS = 4096

    DEVICE_INDEX = myaudio.detect_microphones()[0]
    CHUNK = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = myaudio.get_device_channels()[DEVICE_INDEX]
    RATE = int(myaudio.get_devices()[DEVICE_INDEX]['defaultSampleRate'])

    print('using input device: ', myaudio.get_devices()[DEVICE_INDEX]['name'])

    CONVERSATION_LONG_ENOUGH = 4 #interactions (2 questions)

    def __init__(self, 
                 whisper_model=None, 
                 awake_with_keywords = ['elephant'],
                 model = "gpt-3.5-turbo",
                 embed_model = "text-embedding-ada-002",
                 translator_model = 'argostranslator',
                 
                 **kwargs):
        try:
            openai.api_key = kwargs['openai_api']
        except:
            print('OpenAI API key not found')
        
        # HEAVY STUFF FIRTS
        # Filling the GPU with the model
        if whisper_model == None:
            if 'whisper_size' not in kwargs: raise Exception('whisper model needs to be specified')
            self.interpreter = whisper.load_model(kwargs['whisper_size'])
        else:
            self.interpreter = whisper_model

        # STATUS and PROMPT ANALYZER
        if 'mode' in kwargs: 
            if kwargs['mode'].upper() != 'CHAT' and  kwargs['mode'].upper() != 'RESEARCH': raise KeyError()
            self.MODE = kwargs['mode']
        self.DIRECTORIES={
            'CHAT_DIR': os.path.realpath(os.path.join(os.getcwd(), 'saved_chats')),
            'SOUND_DIR':os.path.realpath(os.path.join(os.getcwd(), 'Assistant', 'sounds')),
            'VOICE_DIR':os.path.realpath(os.path.join(os.getcwd(), 'Assistant', 'voices'))
        }

        self.func_descript={
            "CHAT":[
                "tools: the prompt requires an action like handling a file, saving a conversation, changing some specified parameters...",
                "respond: provide an answer to a question",
                "you don't know the answer or you can't satisfy the request."],
            "RESEARCH":[
                "tools: the prompt requires one or multiple actions like reading a file, downloading a known resource",
                "respond: provide an answer based on scientific information",
            ]
        }


        # TEXT and VOICE
        if 'voice_id' in kwargs.keys():
            for item in kwargs['voice_id']:
                print(kwargs['voice_id'][item])
                kwargs['voice_id'][item] = (os.path.join(self.DIRECTORIES["VOICE_DIR"], kwargs['voice_id'][item])) + '.wav'
        else:
            kwargs['voice_id'] = os.path.join(self.DIRECTORIES["VOICE_DIR"], 'default.wav')

        self.languages = {
            'en': "English",
            'it': "Italian",
            # add yours
        }


        self.voice = Voice(write_dir = self.DIRECTORIES['SOUND_DIR'], languages = self.languages, **kwargs)
        self.translator = Translator(model=translator_model, translator_languages = list(self.languages.keys()))
        self.answer_engine = model

        self.search_engine = LocalSearchEngine(
            embed_model = embed_model, 
            tldr_model = kwargs['search_engine_llm'] if 'search_engine_llm' in list(kwargs.keys()) else model,
            translator_model=translator_model,
            translator_languages = list(self.languages.keys()))
        
        self.is_awake = False
        self.current_conversation = self.DEFAULT_CHAT

        # AUDIO 
        # initialize the VAD module
        self.vad = webrtcvad.Vad()
        self.vad.set_mode(self.VAD_AGGRESSIVENESS) 
        
        if 'awake_with_keywords' in kwargs:
            self.Keywords = kwargs['awake_with_keywords']
        else:
            self.Keywords = awake_with_keywords
        self.Keywords = awake_with_keywords
        self.ears = sr.Recognizer()


        # init finished
        self.play('system_online_bleep.mp3')
    
    # STATUS ###############################################################################################
    def switch_mode(self):
        if self.MODE == 'CHAT': 
            self.say('Moving into research mode', VoiceIdx='en', elevenlabs=True)
            self.play('Sci-Fi-UI.mp3',loop=True)
            self.init_research_mode()
            pygame.mixer.stop()
            self.play('system_online_bleep.mp3', PlayAndWait=True)
            response =  self.translator.translate('research mode is ready', to_language=langid.classify(self.current_conversation[-1]['content']), from_language='en').lower()
            return response
        else:
            self.MODE = 'CHAT'
            response =  self.translator.translate('chat mode enabled', to_language=langid.classify(self.current_conversation[-1]['content'])[0],from_language='en').lower()
            return response

    def identify_explicit_command(self, prompt):
        prompt = self.translator.translate(prompt, to_language='en').lower()
        
        # if the prompt is long it's unlikely to be an explicit command 
        # (this condition prevents false positives)
        if len(prompt.split())>15: return
        
        INTERNET_COMMANDS = [
            "do an internet search",
            "look on the web",
            "do a web search",
            "control on the internet",
            "do a search",
            "make a search",
            "perform a search",
            "perform a web search"]
        
        if ("research mode" in prompt and self.MODE=='CHAT') or ("chat mode" in prompt and self.MODE=='RESEARCH'):
            print('found explicit command')
            return '-1'
        
        if self.MODE == 'CHAT':
            if any(command in prompt for command in INTERNET_COMMANDS):
                print('found explicit command')
                return '3'
        
        if self.MODE == 'RESEARCH':
            if "new workspace" in prompt or "new environment" in prompt:
                print('found explicit command')
                return '3'
        
    
    def use_tools(self, prompt, debug = DEBUG):
        if debug: print(' -use tools ')
        # tools for chat mode
        if self.MODE == "CAHT":
            ActionManager = generateReactAgent(self, k=1)
            return ActionManager.run(input = prompt)
        
        # research mode
        else:
            return self.ResearchAssistant.agent.run(input = prompt)
            
    
    def secondary_agent(self, prompt, debug = DEBUG): 
        if self.MODE == 'CHAT':
            if debug: print(' - web surfing ')   
            WebSurfingAgent = generateGoogleAgent(self, k=1)
            return WebSurfingAgent.run(prompt)
        if self.MODE == 'RESEARCH':
            if debug: print(' - assessing new workspace ') 
            return self.ResearchAssistant.PROTOCOL_begin_new_workspace(prompt)
        

    def set_directories(self, **kwargs):
        for item in kwargs:
            try:
                if not(os.path.isdir(kwargs[item])): raise Exception

                print(f'updating {item} from {self.DIRECTORIES[item.upper()]} == to => {kwargs[item]}')
                self.DIRECTORIES[item.upper()] = kwargs[item]
                
            except:
                self.play('error.mp3', PlayAndWait=True)
                print(f"{kwargs[item]}: not found")

    def go_to_sleep(self):
        print('[Assistant going to sleep]')
        self.is_awake = False
        if len(self.current_conversation()) > self.CONVERSATION_LONG_ENOUGH:
            self.save_chat()

        self.play('sleep.mp3', PlayAndWait=True)

    # [stable]
    def analyze_prompt(self, prompt, debug = DEBUG):
        if debug: print(f' - analyzing prompt in {self.MODE} mode')
        # Hard coded options: DO this, Look on INTERNET...
        flag = self.identify_explicit_command(prompt)
        if flag is not None: return flag

        # CHAT MODE 
        if self.MODE == 'CHAT':
            context ="""You are a prompt manager. A number must always be present in your answer. You can perform some actions and decide which associated number is required. Your actions:"""
            for i, function in enumerate(self.func_descript['CHAT']):
                context += f"\n{i+1}) {function};"
            context += "\nYou can answer only with numbers. A number must always be present in your answer."
            context += """\nHere are some example:
            \nPROMPT: 'find and summarize all the files about history'\n1
            \nPROMPT: 'find a past conversation about planes'\n1
            \nPROMPT: 'do you agree?'\n2
            \nPROMPT: 'Salva questa conversazione'\n1
            \nPROMPT: 'How is the weather?'\n3
            \nPROMPT: 'credo sia giusto.'\n2
            \nPROMPT: '¿Cuál es la noticia de hoy?'\n3
            \nPROMPT: 'Thank you'\n2"""

            CHAT = [{"role": "system", "content": context},
                    {"role": "user", "content":f"PROMPT: '{prompt}'"}]

            flag = self.identify_explicit_command(prompt)

        # RESEARCH MODE    
        else:
            context ="""You are a prompt manager. A number must always be present in your answer. You can perform some actions and decide which associated number is required. You are designed to assist users with their academic research. You are equipped with a range of tools. Your tools:"""
            for i, function in enumerate(self.func_descript['RESEARCH']):
                context += f"\n{i+1}) {function};"
            context += "\nYou can answer only with numbers. A number must always be present in your answer."
            context += """\nHere are some example:
            \nPROMPT: 'begin a new project'\n1
            \nPROMPT: 'download papers about ...'\n1
            \nPROMPT: 'what are the mechanichal properties of carbon fiber?'\n2
            \nPROMPT: 'Salva questa conversazione'\n1
            \nPROMPT: 'What are the authors of the paper XYZ?'\n2
            \nPROMPT: 'What studies mention Transformers architectures?'\n2
            \nPROMPT: 'Find new papers that are similar to paper XYZ'\n1
            \nPROMPT: 'Tell me more'\n2
            \nPROMPT: 'what is up?\n2"""

            CHAT = [{"role": "system", "content": context},
                    {"role": "user", "content":f"PROMPT: '{prompt}'"}]


        if debug: print(' - - submitting request')
        response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    temperature=0,
                    max_tokens=10,
                    messages=CHAT)
        flag = response['choices'][0]['message']['content']
        if debug: print(' - - got answer')
        
        return flag

    # CONVERSATION ################################################################################

    def start_new_conversation(self):
        if len(self.current_conversation)>2: 
            print('forgetting the last conversation')
        self.current_conversation = self.DEFAULT_CHAT

    def expand_conversation(self, role, content): self.current_conversation.append({"role":role, "content":content})

    def get_answer(self, question, optimize_cuda = False, debug=DEBUG):
        if debug: print(' - thinking')
        if self.MODE == "CHAT":
            temp = copy.deepcopy(self.current_conversation())
            temp.append({"role":"user", "content":question})

            self.play('thinking.mp3', loop=True)

            if self.answer_engine == 'gpt-3.5-turbo':
                if debug: print(' - - submitting request')
                API_response = openai.ChatCompletion.create(
                    model=self.answer_engine,
                    messages=temp)
                answer = API_response['choices'][0]['message']['content']
                if debug: print(' - - got answer')
                
            
            elif self.answer_engine == 'anon8231489123_vicuna-13b-GPTQ-4bit-128g':
                lang_id = langid.classify(question)[0]
                if optimize_cuda:
                    # free space on the GPU
                    self.deallocate_whisper()
                # use GPU to process the answer
                answer = oobabooga_textgen(prompt = temp)
                answer = self.translator.translate(answer, from_language=langid.classify(answer)[0], to_language=lang_id)
                if optimize_cuda:
                    # try to get the model back to GPU
                    self.allocate_whisper()

            elif self.answer_engine == 'eachadea_ggml-vicuna-13b-4bit':
                answer = oobabooga_textgen(prompt = question)
        
        # RESEARCH MODE
        else:
            if self.ResearchAssistant.query_engine == None:
                return 'error: no workspace loaded. I cannot provide precise information without a workspace loaded on research mode'
            res = self.ResearchAssistant.query_engine.query(question)
            answer = res.response
        pygame.mixer.stop()

        self.expand_conversation(role="assistant", content=answer)

        self.last_interaction = time.perf_counter()
        if debug: print(' - - finished')
        return answer

    def save_chat(self, debug = DEBUG):
        if debug: print(' - saving chat')
        if not os.path.isdir(self.DIRECTORIES['CHAT_DIR']): os.mkdir(self.DIRECTORIES['CHAT_DIR'])

        if not self.current_conversation.is_saved(): 
            if debug: print(' - - generating title')
            title = self.get_answer(question="generate a very short title for this conversation")
            self.say(f'I am saving this conversation with title: {title}', VoiceIdx='en', IBM=False, elevenlabs=True)

            self.play('data_writing.mp3', PlayAndWait=True)
            
            prompt =  [{"role": "system", "content": "You don't like redundancy and use as few words as possible"},
                {"role":"user", "content":f"Associate a tag to this title: {title} \nHere is an example: 'Exploring Text to Speech Popular Techniques and Deep Learning Approaches' is associated to 'Deep Learning'"}]
            
            if debug: print(' - - submitting request')
            API_response = openai.ChatCompletion.create(
                        model='gpt-3.5-turbo',
                        max_tokens=5,
                        temperature=0,
                        messages=prompt)
            if debug: print(' - - got answer')
            if debug: print(' - - processing response')
            answer = API_response['choices'][0]['message']['content']
            answer = re.sub(r'[^\w\s]', '',answer)
            answer = re.sub(' ', '',answer)
            fname = str( str(datetime.today().strftime('%Y-%m-%d')) + '_' + str(answer)+'.txt')
            self.current_conversation.filename = fname
        else:
            self.say(f'I am overwriting the conversation {fname}', VoiceIdx='en', IBM=False, elevenlabs=True)
            fname = self.current_conversation.filename

        with open(os.path.join(self.DIRECTORIES['CHAT_DIR'], fname), 'w') as f:
            for message in self.current_conversation():
                f.write(message["role"]+ ': ' + message["content"]+'\n')
            f.close()
        
        self.is_awake = False
        return f"file: {os.path.join(self.DIRECTORIES['CHAT_DIR'], fname)} saved successfully"


    # ACTIONS ##################################################################################
    def init_research_mode(self, workspace=None):
        
        if workspace is None:
            # get last created workspace
            if 'workspaces' in os.listdir(os.getcwd()):
                search_dir = os.path.join('workspaces')
                subdirs = os.listdir(search_dir)
                subdirs.sort(key=lambda fn: os.path.getmtime(os.path.join(search_dir, fn)))
                subdirs.reverse()
                for subd in subdirs:
                    folder_path = os.path.join('workspaces',subd)
                    if os.path.isdir(folder_path):
                        self.say('loading the last created workspace', VoiceIdx='en', elevenlabs=True)
                        workspace = os.path.abspath( folder_path )
                        break

        self.play('Sci-Fi-UI.mp3',loop=True)
        self.MODE = 'RESEARCH'
        self.ResearchAssistant = ResearchAssistant(
            current_conversation=self.current_conversation,
            index_name='paperquestioning',
            workspace=workspace)
        pygame.mixer.stop()
        
    def deallocate_whisper(self):
        model_name = self.interpreter.name
        model_current_device = self.interpreter.device

        self.interpreter = None
        torch.cuda.empty_cache()

        if model_current_device.type == 'cuda':
            print('loading Whisper model to cpu')
            self.interpreter = whisper.load_model(model_name, device='cpu')
        torch.cuda.empty_cache()

    def allocate_whisper(self):
        model_name = self.interpreter.name
        model_current_device = self.interpreter.device
        self.interpreter = None

        torch.cuda.empty_cache()
        if model_current_device.type == 'cpu':
            try:
                torch.cuda.empty_cache()
                print('loading Whisper model to CUDA')
                self.interpreter = whisper.load_model(model_name, device='cuda')
            except:
                self.interpreter = None
                print(f"cuda dedicated memory isufficient: {torch.cuda.memory_allocated()/1e6} GB already occupuied")
                print(f"keeping Whisper model to cpu")
                self.interpreter = whisper.load_model(model_name, device='cpu')
        torch.cuda.empty_cache()
                
    def switch_whisper_device(self):
        model_name = self.interpreter.name
        model_current_device = self.interpreter.device

        self.interpreter = None
        torch.cuda.empty_cache()

        if model_current_device.type == 'cuda':
            print('loading Whisper model to cpu')
            self.interpreter = whisper.load_model(model_name, device='cpu')
            torch.cuda.empty_cache()
        else:
            try:
                torch.cuda.empty_cache()
                print('loading Whisper model to CUDA')
                self.interpreter = whisper.load_model(model_name, device='cuda')
            except:
                print(f"cuda dedicated memory isufficient: {torch.cuda.memory_allocated()/1e6} GB already occupuied")
                print(f"keeping Whisper model to cpu")
                self.interpreter = whisper.load_model(model_name, device='cpu')
                torch.cuda.empty_cache()
    
    def open_file(self, filename, debug=DEBUG):
        if debug: print(' - opening file')
        # look for the file
        file = None
        for fname in os.listdir(self.DIRECTORIES['CHAT_DIR']):
            
            # look for sub-strings (in case extension is forgotten)
            if filename in fname:
                file = open(os.path.join(self.DIRECTORIES['CHAT_DIR'], filename), 'r')
                file = file.read()

        if file is None: return 'No such file'

        return file  
   

    def find_file(self, keywords, n=3, debug=DEBUG):
        if debug: print(' -finding file')
        #self.play('thinking.mp3', loop=True)
        summary = self.search_engine.accurate_search(key=keywords, from_csv=True, n=n)
        # self.play('wake.mp3')

        response = ''
        for i in range(n):
            response += f"\nFilename: {summary.file_names[i]} ; Topics discussed: {summary.tags[i]}" 
        return response


    # SPEAK ####################################################################################
    def play(self, fname, PlayAndWait=False, loop=False, debug = DEBUG):
        if loop: loop=-1
        else: loop = 0

        if pygame.mixer.get_init() is None: pygame.mixer.init()
        if debug: print(' - playing')
        try:
            pygame.mixer.music.load(os.path.join(self.DIRECTORIES["SOUND_DIR"], fname))
        except Exception as e:
            print(e)
            return
        pygame.mixer.music.set_volume(0.5)
        pygame.mixer.music.play(loops=loop)

        if PlayAndWait:
            while(pygame.mixer.music.get_busy()):pass
        if debug: print(' - -  finihed playing')

    def say(self, text, VoiceIdx='jarvis', elevenlabs=False, IBM=False):  
        langIdx = langid.classify(text)[0]

        print(f"[Assistant]: {text}")
        if elevenlabs and IBM: raise(Exception('IBM and ElevenLabs can t be both true'))
        
        if elevenlabs:
            try:
                try: 
                    self.voice.speak(text=text, VoiceIdx=langIdx, elevenlabs=True, IBM=False, mode='online')
                    return
                except Exception as e:
                    print(f"couldn t speak with: {e}")
                    self.voice.speak(text=text, VoiceIdx=langIdx, elevenlabs=False, IBM=True, mode='online')
                    return
            except:
                self.voice.speak(text=text, VoiceIdx=VoiceIdx, elevenlabs=False, IBM=False, mode='offline')
                return
        
        elif IBM:
            try:
                try: 
                    self.voice.speak(text=text, VoiceIdx=langIdx, elevenlabs=False, IBM=True, mode='online')
                    return
                except:
                    self.voice.speak(text=text, VoiceIdx=langIdx, elevenlabs=True, IBM=False, mode='online')
                    return
            except:
                self.voice.speak(text=text, VoiceIdx=VoiceIdx, elevenlabs=False, IBM=False, mode='offline')
                return
        
        try:
            self.voice.speak(text=text, VoiceIdx='jarvis',elevenlabs=False, IBM=False, mode='offline')
        except Exception as e:
            self.voice.speak(text=text, VoiceIdx=langIdx, elevenlabs=False, IBM=False, mode='offline')
            print(VoiceIdx, elevenlabs, IBM)
            print(e)
            raise Exception('No such specifications')


    # LISTEN #############################################################################################

    
    #function that blocks the code until the wakeword, or wakewords are encountered
    def block_until_wakeword(self, verbosity=False):        
        if verbosity: print("listening passively...", end="")

        from struct import unpack_from
        import pvporcupine

        #initialize values
        porcupine = None
        pa = None
        audio_stream = None

        try:
            porcupine = pvporcupine.create(access_key=os.environ["PORCUPINE_KEY"], 
                                        keywords=self.Keywords)

            pa = pyaudio.PyAudio()

            audio_stream = pa.open(
                rate=porcupine.sample_rate,
                channels=1,
                format=pyaudio.paInt16,
                input=True,
                frames_per_buffer=porcupine.frame_length)

            #not strictly necessary, but helps debug if something overwrote the keywords
            print(f"Listening for wake word '{self.Keywords[0]}'...")

            #loop to preform while waiting(does not noticeably use the CPU)
            while True:
                pcm = audio_stream.read(porcupine.frame_length)
                pcm = unpack_from("h" * porcupine.frame_length, pcm)

                keyword_index = porcupine.process(pcm)

                #same actions activated as previous function
                #NOTE: keyword_index is -1 unless wakeword encountered, then it's the index of the wakeword in the list
                #(different wakewords activate different profiles?)
                if keyword_index >= 0:
                    print("wakeword encountered")
                    self.start_new_conversation()
                    self.play('wake.mp3',PlayAndWait=False)
                    self.is_awake = True
                    return
        finally:
            #clean up
            if audio_stream is not None:
                audio_stream.close()
            if pa is not None:
                pa.terminate()
            if porcupine is not None:
                porcupine.delete()
     
    def listen_passively(self, verbosity=False):
        with sr.Microphone() as source:
            if verbosity: print("listenting passively...", end="")
            audio = self.ears.listen(source)
            query = ''

            try: 
                query = self.ears.recognize(audio)
                if verbosity: print(f"user said: {query}")
            except Exception as e:
                if verbosity: print(str(e))
        
        # if any keyword is present in the query return True (awake the assistant)
        if any(keyword in query.split() for keyword in self.Keywords):
            self.start_new_conversation()
            self.play('wake.mp3',PlayAndWait=False)
            self.is_awake = True
        
    def record_to_file(self, file_path):
        wf = wave.open(file_path, 'wb', )
        wf.setnchannels(self.CHANNELS)
        sample_width = pyaudio.PyAudio().get_sample_size(self.FORMAT)
        wf.setsampwidth(sample_width)
        frames = self.record()
        
        wf.setframerate(self.RATE)
        wf.writeframes(b''.join(frames))
        wf.close()
    
    def record(self):
        # Your current setup
        vad_rate = 32000
        frame_length_ms = 20
        vad_CHUNK = (vad_rate * frame_length_ms) // 1000


        p = pyaudio.PyAudio()
        vad_stream = p.open(format=self.FORMAT,
                        channels=1,
                        rate=vad_rate,
                        input=True,
                        frames_per_buffer=vad_CHUNK)
        
        rec_stream = p.open(format=self.FORMAT,
                        channels=self.CHANNELS,
                        rate=self.RATE,
                        input=True,
                        frames_per_buffer=self.CHUNK)

        frames = []
        try:
            silence_time = 0
            speaked = False
            is_voice = False
            print("listening...")
            
            start_time = time.perf_counter()
            while True:
                rec_data = rec_stream.read(self.CHUNK)
                frames.append(rec_data)

                # detect voice activity
                data = vad_stream.read(vad_CHUNK) 
                                       
                try:
                    is_voice = self.vad.is_speech(data, vad_rate) 
                except Exception as e:
                    print(f"Error during VAD: {e}")

                # Calculate time since the last voice activity
                if is_voice and (time.perf_counter()-start_time)>self.MIN_RECORDING_TIME:
                    speaked = True
                    silence_time = 0
                else:
                    silence_time += frame_length_ms / 1000

                # Print debugging information (useful for tuning sensitivity)

                # Stop recording if silence duration exceeds the threshold or if the time limit is reached
                if (silence_time > self.RESPONSE_TIME and speaked) or (time.perf_counter() - start_time > self.MAX_RECORDING_TIME):
                    break

                if silence_time > self.MAX_RECORDING_TIME:
                    self.go_to_sleep()
                    break
                
                time.sleep(frame_length_ms / 10000)

        except KeyboardInterrupt:
            print("Done recording")
        except Exception as e:
            print(str(e))
            print(silence_time,self.RESPONSE_TIME,self.SLEEP_DELAY)
            exit()

        vad_stream.stop_stream()
        vad_stream.close()
        rec_stream.stop_stream()
        rec_stream.close()
        p.terminate()
        return frames
    

@contextmanager
def suppress_stdout():
    with open(os.devnull, "w") as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:  
            yield
        finally:
            sys.stdout = old_stdout


================================================
FILE: Assistant/__init__.py
================================================


================================================
FILE: Assistant/get_audio.py
================================================
import whisper
import pyaudio

# CHUNK = 1024
# FORMAT = pyaudio.paInt16
# CHANNELS = 2
# RATE = 44100
# SILENCE_THRESHOLD = 1500

# convert audio content into text
def whisper_wav_to_text(audio_name, model=[], model_name=False, prior=None):
    if isinstance(model_name, str):
        print('loading model ', model_name)
        model = whisper.load_model(model_name)

    if model == []:
        raise Exception("model cannot be unspecified")

    print('listening to ',audio_name,'...')
    # load audio and pad/trim it to fit 30 seconds
    audio = whisper.load_audio(audio_name)
    audio = whisper.pad_or_trim(audio)

    # make log-Mel spectrogram and move to the same device as the model
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # detect the spoken language
    try:
        _, probs = model.detect_language(mel)
        if not(prior is None):
            filt_probs = {str(lan):probs.get(lan) for lan in prior}
            probs = filt_probs
        print(f"Detected language: {max(probs, key=probs.get)}")
        detected_lang = str(max(probs, key=probs.get))

        options = whisper.DecodingOptions(language=detected_lang)
    except:
        # model does not support multiple languages, default to English
        print('language: en')
        detected_lang = 'en'
        options = whisper.DecodingOptions(language='en')
    
    result = whisper.decode(model, mel, options)

    # print the recognized text
    print('\n[User]: '+ result.text)
    return result.text, detected_lang

def get_device_channels():
    p = pyaudio.PyAudio()
    DEVICES = {}
    for i in range(p.get_device_count()):
        dev = p.get_device_info_by_index(i)
        DEVICES[i] = dev['maxInputChannels']
    return DEVICES

def detect_microphones():
    p = pyaudio.PyAudio()
    MICS = []
    for i in range(p.get_device_count()):
        dev = p.get_device_info_by_index(i)
        if 'microphone' in dev['name'].lower():
            MICS.append(i)
    
    return MICS if len(MICS)>=1 else [0]

def get_devices():
    p = pyaudio.PyAudio()
    DEV = []
    for i in range(p.get_device_count()):
        DEV.append( p.get_device_info_by_index(i))
    return DEV
        

================================================
FILE: Assistant/research_mode.py
================================================
# AGENT
from langchain import OpenAI, LLMChain, PromptTemplate
from langchain.llms import OpenAI
from langchain.agents import Tool, AgentExecutor, ZeroShotAgent
from langchain.memory import ConversationBufferMemory 
from langchain.agents import initialize_agent, load_tools

from typing import Any
from Assistant.semantic_scholar.agent_tools import *
from Assistant.semantic_scholar.S2_tools import *
from langchain.agents import AgentType

class ResearchAssistant:
    def __init__(self, current_conversation, workspace = None, index_name = 'paperquestioning'):
        self.current_workspace = workspace
        self.index_name = index_name
        self.query_engine = None
        self.current_conversation = current_conversation

        self.ans = []
        self.docs = []

        if 'workspaces' not in os.listdir(os.getcwd()):
            os.mkdir('workspaces')
            print('\tinitializing Research Assistant but no workspace are available: begin a new search please')
        elif len(os.listdir('workspaces'))>0:          
            self.boot_workspace(workspace)

        print('\tResearch Assistant initialization done')      
        self.agent = generateResearchAgent(self, k=1)

    def boot_workspace(self, workspace):
        if os.path.isdir(workspace):
                print('\tinitializing Research Assistant with Workspace directory: ', workspace)

                # init vector store
                init_attempts = 0
                print(' ')
                self.docs = load_workspace(workspace)
                while True:
                    init_attempts +=1

                    try:
                        self.query_engine, self.Index = llama_query_engine(self.docs,pinecone_index_name=self.index_name)
                        break
                    except Exception as e:
                        print(f'initialization attempt {init_attempts} failed with exception {e}')
                        time.sleep(2)
                        if init_attempts<=3:continue
        else: return None

    # make wrappers to store results and info
    def wrapper_find_papers_from_query(self, query):  
        try:
            res = find_paper_from_query(query, 20)
            text = ''
            for result in res:
                if not result['isOpenAccess']:continue
                text += result['title']+'; paperId: '+result['paperId']
                self.ans.append( f"{result['title']}: {result['paperId']}")
            if len(text)==0: return "couldn't find any open access result"
            return text
        except Exception as e:
            return f'error: {e}'
        

    def load_pdf_to_pinecone(self, paths):
        # read the pdf
        if isinstance(paths, str): paths = [paths]
        if isinstance(paths, list): raise Exception

        for path in paths:
            if not path.endswith('.pdf'): continue
            content = readPDF(path)
            doc = Document(
                text = content,
                doc_id = uuid.uuid4().hex
            )
            self.docs.append(doc)

        # upload to Pinecone and synch index
        self.Index.insert(document=doc)
        # regresh the query engine
        self.query_engine = self.Index.as_query_engine()
        return

    def PROTOCOL_begin_new_workspace(self, query):
        # PRELIMINARY ASSESSMENT
        prompt_template = "Do you really need to search for something on internet?: {query} \n Answer Yes or No"
        llm = OpenAI(temperature=0)
        llm_chain = LLMChain(
            llm=llm,
            prompt=PromptTemplate.from_template(prompt_template)
        )
        assessment =  llm_chain.predict(query = query)
        if 'yes' not in assessment.lower():
            return 'what should be the topic of the workspace?'
        
        # GOING WITH IT
        # preprocessing
        prompt_template = "Extract a search key from the following query: {query}"
        llm = OpenAI(temperature=0)
        llm_chain = LLMChain(
            llm=llm,
            prompt=PromptTemplate.from_template(prompt_template)
        )
        # extraction of a query
        search_query = llm_chain.predict(query = query)

        # post processing of extracted search-query
        search_query = re.sub('[^0-9a-zA-Z]', ' ', search_query.lower())
        if "search key" in search_query: search_query=search_query.replace("search key","").strip()

        print('SEARCH QUERY: ', search_query)
        self.current_workspace = PaperSearchAndDownload(query=search_query)
        self.boot_workspace(self.current_workspace)
        return f'new workspace created at {self.current_workspace}'
    
    def wrapper_download_paper(self, id):
        if 'cache' not in os.listdir(self.current_workspace): os.mkdir(os.path.join(self.current_workspace,'cache'))
        ans = download_pdf_from_id(paperid= id, path= os.path.join(self.current_workspace, 'cache'))
        update_workspace_dataframe(self.current_workspace, verbose = False)
        pdf_paths = ans.split('\n')
        self.load_pdf_to_pinecone(pdf_paths)
        return ans
    
    def wrapper_find_reccomendation(self, paperId):
        return find_recommendations(paper=paperId, result_limit=5)
    
    def find_in_papers(self, query):
        attempt =0
        while True:
            try: 
                attempt +=1
                answer = self.query_engine.query(query)
            except Exception as e:
                if attempt<=3: continue
                return str(e)
            return answer
        

# generate a Zero Shot React Agent with memory that looks K interactions behind
def generateResearchAgent(RA:ResearchAssistant, k:int):

    findpapers = Tool(
        name='Find from query',
        description='find a paper from a query, title and/or other information.',
        func= RA.wrapper_find_papers_from_query
    )

    download_ID = Tool(
        name='Download ID',
        description='download a paper from paperId. Take as input a paperId',
        func=RA.wrapper_download_paper
    )


    peek = Tool(
        name='glimpse pdf',
        description="get paper information if available, Take as input a paper title",
        func=glimpse_pdf
    )

    reccomend = Tool(
        name='find reccomendations',
        description='find similar paper from paperId. Take as input a paperId',
        func=RA.wrapper_find_reccomendation
    )
    
    tools = [findpapers, download_ID, peek, reccomend]
    
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, human_prefix='user', ai_prefix='assistant')

    # need to work on a custom LangChain llm model 
    prefix = """You are an assistant designed to browse scientific libraries, you have the following tools to complete the user requests:"""
    suffix = """Begin!"

    {chat_history}
    Question: {input}
    {agent_scratchpad}"""

    prompt = ZeroShotAgent.create_prompt(
        tools, 
        prefix=prefix, 
        suffix=suffix, 
        input_variables=["input", "chat_history", "agent_scratchpad"]
    )
    llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
    agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)

    # adding a window of memory:
    memory = build_memory(chat_history = RA.current_conversation(), k=k)
    
    return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory, early_stopping_method ='generate', max_iterations=20)

def build_memory(chat_history, k):
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, human_prefix='user', ai_prefix='assistant')
    k = min(k, len(chat_history)//2)

    if k==0 :return memory

    if chat_history[-k]["role"] != 'user': 
        print('refreshing memory warning - considering last interaction only')
        k=1
    try:
        for i in range(-k*2-1, -1, 2):
            input  = chat_history[i]['content']
            output = chat_history[i+1]['content']
            memory.save_context({"input":input}, {"output":output})
    except:
        memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, human_prefix='user', ai_prefix='assistant')
    return memory


================================================
FILE: Assistant/semantic_scholar/S2_tools.py
================================================
import csv
import re
from time import time
import requests
import dotenv
import aspose.pdf as ap
dotenv.load_dotenv()

import argparse
import os
from requests import Session
from typing import Generator, Union
import subprocess
import urllib3
import json
urllib3.disable_warnings()

import refy

import pdftitle
from langchain.document_loaders import OnlinePDFLoader
import time
import arxiv
from pymed import PubMed

from .simple import Main

RESULT_LIMIT = 10
S2_API_KEY = os.environ['S2_API_KEY']

PAPER_FIELDS = 'paperId,externalIds,title,authors,year,abstract,isOpenAccess,openAccessPdf,influentialCitationCount,citationStyles,tldr,venue,journal'

def get_paper(session: Session, paper_id: str, fields: str = 'paperId,title', **kwargs) -> dict:
    params = {
        'fields': fields,
        **kwargs,
    }
    headers = {
        'x-api-key': S2_API_KEY,
    }

    with session.get(f'https://api.semanticscholar.org/graph/v1/paper/{paper_id}', params=params, headers=headers) as response:
        response.raise_for_status()
        return response.json()


def find_paper_from_query(query, result_limit=RESULT_LIMIT):
    papers = None
    while papers is None:
        try:
            while True:
                rsp = requests.get('https://api.semanticscholar.org/graph/v1/paper/search',
                                        params={'query': query, 'limit': result_limit, 'fields': PAPER_FIELDS})
                if rsp.status_code == 429: 
                    time.sleep(60)
                    continue
                break
            
            rsp.raise_for_status()
            results = rsp.json()
            total = results["total"]
            if not total:
                print('No matches found. Please try another query.')
                return 'No matches found. Please try another query.'

            
            papers = results['data']
            filtered = []
            for paper in (papers):
                if paper['isOpenAccess']: filtered.append(paper)
                if len(filtered)>=result_limit:break
                    
        except Exception as e:
            print('\n!!!!!!!!\n')
            print('ERROR OCCURRED: ',e)
            print(rsp)
            return rsp.status_code

    print(f'Found {total} results. OpenAccess: {len(filtered)}.')
    return papers

# Finds papers which are similar to an exisiting one
def find_recommendations(paper, result_limit = RESULT_LIMIT):

    print(f"Looking for up to {result_limit} recommendations based on: {paper['title']}")
    rsp = requests.get(f"https://api.semanticscholar.org/recommendations/v1/papers/forpaper/{paper['paperId']}",
                       params={'fields': 'title,url,isOpenAccess', 'limit': result_limit})
    
    rsp.raise_for_status()
    results = rsp.json()
    print_papers(results['recommendedPapers'])
    return results['recommendedPapers']

def extract_title(path):
    try:
        title = pdftitle.get_title_from_file(path)

        # remove non letter chars
        title = re.sub('[^0-9a-zA-Z]', ' ', title)

        # sometime the full text is returned instead of just the title. in that 
        # case use a coarse title detection
        if len(title.split())>30: raise Exception
        return title
    except:
        # ROUGH TITLE DETECTION
        loader = OnlinePDFLoader(path)
        data  = loader.load()
        text_content = ''
        formatted_content = data[0].page_content.replace('\n\n', ' ')
        text_content+=formatted_content
        
        # CONSIDER THE FIRST 100 WORDS
        # exclude single chars and remove puncts
        title = ' '.join([word for word in text_content.split()[0:min(100,len(text_content.split()))] if (len(word)>1)])
        title = re.sub('[^0-9a-zA-Z]', ' ', title)

        # option 1: take the capital words contained in the first 100 words
        title = ' '.join([word for word in title.split()[0:min(100,len(title.split()))] if (word.isupper() and len(word)>1)])
        if len(title)>20: title = " ".join(title.split()[0:10])

        # option 2: take the first 10 words 
        else:
            title = [word for word in text_content.split()[0:100] if (len(word)>1)]
            title = ' '.join( title[0:min(10,len(title))])

        print(' > generated title: ', title)
        if title=='': return []
        return title

def find_paper_online(path):
    # if every word of the original title is present in the result, return it
    def same_title(title1, title2):
        return (word in title2.lower().split() for word in title1.lower().split())
    
    # OPEN AND EXTRACT PAPER TITLE
    title = extract_title(path)

    # LOOK IN OTHER PAPER DATABASES
    # 1) scholar attempt
    while True:
        res = find_paper_from_query(title, result_limit=5)
        if isinstance(res, int):
            if res == 400: raise Exception
            if res == 429: 
                time.sleep(60)
            continue
        break
    if isinstance(res, list):
        for article in res:
            if same_title(title, article['title']): return article

    # 2) arxiv attempt
    search = arxiv.Search(
        query=title,
        id_list= [],
        max_results=5,
    )
    res = search.results()
    for article in res:
        if same_title(article.title, title): 
            return  article._raw
        
    # 3) pubmed attempt
    pubmed = PubMed(tool="MyTool", email="my@email.address")
    res = pubmed.query(title, max_results=5)
    
    for article in res:
        art = json.loads(article.toJSON())
        if same_title(title, art['title']):return art

    # noting found :(
    return 
    
def print_papers(papers):
    results = ''
    for idx, paper in enumerate(papers):
        results+= f"{idx}  {paper['title']} {paper['url']}"
    return results


def chunks(items, chunk_size):
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def fetch_paper_batch(paperid: list):
    req = {'ids': [f'{id}' for id in paperid]}
    # https://api.semanticscholar.org/api-docs/graph#tag/Paper-Data/operation/post_graph_get_papers
    
    rsp = requests.post('https://api.semanticscholar.org/graph/v1/paper/batch',
                        params={'fields': PAPER_FIELDS},
                        json=req)
    
    if rsp.status_code != 200:
        return f'Problem fetching {req}: ' + rsp.text
    return rsp.json()


def download_pdf_from_id(paperid, path=os.getcwd()):
    try:
        res = Main(paper_ids=paperid, dir=path)
        print(res)
        return res
    except:
        return f'error with {paperid}'

# add to PAPER.CSV semantic scolar entries from ID
def update_dataframe(incomplete, dest):
    results = fetch_paper_batch(paperid= [item['paperId'] for item in incomplete])
    if isinstance(results, str): 
        print(results) 
        print(f" input: {incomplete}")
        return
    pdf_des= dest
    pdf_des = pdf_des[:-4] + '.pdf'
    text = ''

    with open(pdf_des, 'a+',encoding='utf-8') as f:
        for paper in results:
            try:
                text += paper['title'].upper()+'\n'
                if 'tldr' in paper.keys():
                    if paper['tldr'] is not None:text += paper['tldr']['text']+'\n'
                text += paper['abstract']+'\n'
                if 'summary' in paper.keys(): text += paper['summary']+'\n'
                text += '\n\n'
            except:
                pass

    write_to_pdf(text, pdf_des)

    count = 0

    # Read existing entries from the CSV file
    existing_entries = set()

    isFile =  os.path.isfile(dest)
    if not isFile:
        with open(dest, 'w',encoding='utf-8') as fp:
            csvfile = csv.DictWriter(fp, ['paperId', 'title', 'first_author', 'year', 'abstract','tldr','bibtex','influentialCitationCount','venue','journal','pages'])
            csvfile.writeheader()
    if isFile:
        with open(dest, 'r',encoding='utf-8') as fp:
            csvfile = csv.DictReader(fp)
            for row in csvfile:
                existing_entries.add(row['paperId'])

    # Append new entries to the CSV file
    with open(dest, 'a', encoding='utf-8') as fp:
        csvfile = csv.DictWriter(fp, ['paperId', 'title', 'first_author', 'year', 'abstract','tldr','bibtex','influentialCitationCount','venue','journal','pages'])
        
        for paper in results:
            paperId = paper['paperId']
            if paperId in existing_entries:
                continue  # Skip if the entry already exists
            
            paper_authors = paper.get('authors', [])
            
            journal_data = {}
            if 'journal' in paper:
                journal_data = paper.get('journal',[])
            if journal_data is not None:
                if 'name' not in journal_data: journal_data['name'] = ''
                if 'pages' not in journal_data: journal_data['pages'] = ''

            if paper.get('tldt',[]) != []:
                tldr = paper['tldr']['text']
            elif paper.get('summary',[]) != []:
                tldr = paper['summary']
            else:
                tldr = paper['abstract']

            csvfile.writerow({
                'title': paper['title'],
                'first_author': paper_authors[0]['name'] if paper_authors else '',
                'year': paper['year'],
                'abstract': paper['abstract'],
                'paperId': paperId,
                'tldr':tldr,
                'bibtex':paper['citationStyles']['bibtex'] if paper['citationStyles']['bibtex'] else '',
                'influentialCitationCount':paper['influentialCitationCount'],
                'venue':paper['venue'],
                'journal':journal_data['name'] if journal_data is not None else '',
                'pages':journal_data['pages'] if journal_data is not None else '',
            })
            # except Exception as e:
            #     print('error adding paper: ',e, '\n',paper)
            #     paper['title']
            #     paper['year']
            #     paper['abstract']
            #     paper['citationStyles']['bibtex']
                
            #     if paper['tldr']: paper['tldr']
            #     if paper_authors: paper_authors[0]['name']

            #     quit()
            count += 1

    return f'Added {count} new results to {dest}'


def write_bib_file(csv_file, bib_file=None):
    if bib_file is None:
        bib_file = csv_file[:-4]+'.bib'
    with open(csv_file, 'r', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        with open(bib_file, 'w', encoding='utf-8') as output:
            print(f'writing bibtex file at {bib_file}')
            for row in reader:
                bib_entry = create_bib_entry(row)
                output.write(bib_entry + '\n\n')

def create_bib_entry(row):
    paper_id = row['paperId']
    title = row['title']
    author = row['first_author']
    year = row['year'].split('-')[0] # assuming format like 2023-03-24T15:46:10Z (arxiv use this)

    journal_match = re.search(r"journal\s*=\s*{([^}]*)}", row['bibtex'])
    if journal_match:
        journal = journal_match.group(1)
    else: journal = ''
    pages_match = re.search(r"pages\s*=\s*{([^}]*)}", row['bibtex'])
    if pages_match:
        pages = pages_match.group(1)
    else: pages = ''

    abstract = replace_non_alphanumeric(row['abstract'])

    # Generate the BibTeX entry
    bib_entry = f"@ARTICLE{{{paper_id},\n"
    bib_entry += f"  title     = \"{title}\",\n"
    bib_entry += f"  author    = \"{author}\",\n"
    bib_entry += f"  abstract  = \"{abstract}\",\n"
    bib_entry += f"  year      = {year},\n"
    bib_entry += f"  journal   = \"{journal}\",\n"
    bib_entry += f"  pages     = \"{pages}\"\n"
    bib_entry += "}"

    return bib_entry


def replace_non_alphanumeric(string, replacement=' '):
    pattern = r'[^a-zA-Z0-9]'
    replaced_string = re.sub(pattern, replacement, string)
    return replaced_string


def refy_reccomend(bib_path, number=20):
    d = refy.Recomender(
        bib_path,            # path to your .bib file
        n_days=30,               # fetch preprints from the last N days
        html_path=os.path.join(os.path.join(bib_path.replace('\\results\\papers.bib',''),'refy_suggestions'),"test.html"),   # save results to a .csv (Optional)
        N=number                 # number of recomended papers 
        )


def write_to_pdf(text, dest):
    # Initialize document object
    document = ap.Document()

    # Add page
    page = document.pages.add()

    # Initialize textfragment object
    text_fragment = ap.text.TextFragment(text)

    # Add text fragment to new page
    page.paragraphs.add(text_fragment)

    # Save updated PDF
    document.save(dest)

================================================
FILE: Assistant/semantic_scholar/__init__.py
================================================
#

================================================
FILE: Assistant/semantic_scholar/agent_tools.py
================================================
from contextlib import contextmanager
import uuid
import os
import tiktoken

from . import S2_tools as scholar

import csv
import sys
import requests

# pdf loader
from langchain.document_loaders import OnlinePDFLoader

## paper questioning tools
from llama_index import Document
from llama_index.vector_stores import PineconeVectorStore
from llama_index import GPTVectorStoreIndex, StorageContext, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding
 

def PaperSearchAndDownload(query):
    # make new workspace 
    if not os.path.exists( os.path.join(os.getcwd(),'workspaces') ): os.mkdir(os.path.join(os.getcwd(),'workspaces'))
    workspace_dir_name = os.path.join(os.getcwd(),'workspaces',query.split()[0] + '_'+ str(uuid.uuid4().hex))
    os.mkdir(workspace_dir_name)
    os.mkdir(os.path.join(workspace_dir_name,'results'))
    os.mkdir(os.path.join(workspace_dir_name,'refy_suggestions'))
    os.environ['workspace'] = workspace_dir_name

    # 1) search papers
    print('  1) Searching base papers') 
    papers = scholar.find_paper_from_query(query, result_limit=10)
    if len(papers == 0):
        papers = scholar.find_paper_from_query(query, result_limit=50)
    scholar.update_dataframe(incomplete=papers, dest=os.path.join(workspace_dir_name, 'results','papers.csv'))
    delete_duplicates_from_csv(csv_file=os.path.join(workspace_dir_name, 'results','papers.csv'))

    # 2) Cross-reference reccomendation system:
    # a paper is reccomended if and only if it's related to more than one paper
    print('\n\n 2) Expanding with Scholar reccomendations')
    counts = {}
    candidates = {}
    for paper in papers:
        guesses = scholar.find_recommendations(paper)

        for guess in guesses:
            if not guess['isOpenAccess']: continue

            candidates[guess['title']] = guess
            if guess['title'] not in counts.keys(): counts[guess['title']] = 1
            else: counts[guess['title']] += 1
    
    # reccomend only papers that appeared more than once 
    reccomends = []
    for key in counts:
        if counts[key]>1: reccomends.append(candidates[key])

    print(f'found {len(reccomends)} additional papers')
    # update the csv
    scholar.update_dataframe(incomplete= reccomends, dest=os.path.join(workspace_dir_name, 'results','papers.csv'))
    delete_duplicates_from_csv(csv_file=os.path.join(workspace_dir_name, 'results','papers.csv'))

    # download the papers (1/2)
    print('downloading papers (1/2)')
    with open(os.path.join(workspace_dir_name,'results','papers.csv'), 'r',encoding='utf-8') as fp:
        csvfile = csv.DictReader(fp)  
        scholar.download_pdf_from_id(" ".join( row['paperId'] for row in csvfile), workspace_dir_name)
    
    scholar.write_bib_file(csv_file=os.path.join(workspace_dir_name,'results','papers.csv'), bib_file=os.path.join(workspace_dir_name,'results','papers.bib'))

    # expand further with refy reccomendendation system
    print('\n\n 3) Expanding with Refy reccomendendation system')
    print('this might take a while...')
    scholar.refy_reccomend(bib_path=os.path.join(workspace_dir_name,'results','papers.bib'))

    with open(os.path.join(workspace_dir_name, 'refy_suggestions', 'test.csv'), 'r',encoding='utf-8') as fp:
        csvfile = csv.DictReader(fp) 
        for row in csvfile:
            title = scholar.replace_non_alphanumeric(row['title'])
            title = title.replace(" ","_")

            save_path = os.path.join(workspace_dir_name,'refy_suggestions',(title+'.pdf'))
            try:
                download_paper(url=row['url'], save_path=save_path)
            except:
                print(f'couldn t download {row}')

    return f'{os.path.join(os.getcwd(), workspace_dir_name)}'


import urllib
def download_paper(url, save_path=f"{uuid.uuid4().hex}.pdf"):
    success_string = f"paper saved successfully at {os.path.join(os.path.abspath(save_path))}"
    if url.endswith('.pdf'):
        urllib.request.urlretrieve(url, save_path)
        return success_string
    if 'doi' in url:
        doi = paper_id = "/".join(url.split("/")[-2:])
        # Construct the Crossref API URL
        print(doi)
        doi_url = f"https://doi.org/{doi}"

        # Send a GET request to the doi.org URL
        response = requests.get(doi_url, allow_redirects=True)

        # Check if the request was successful
        if response.status_code == 200:
            # Extract the final URL after redirection
            url = response.url

    if 'arxiv' in url:
        # URL del paper su arXiv

        # Ottieni l'ID del paper dall'URL
        paper_id = url.split("/")[-1]

        # Costruisci l'URL di download del paper
        pdf_url = f"http://arxiv.org/pdf/{paper_id}.pdf"

        # Scarica il paper in formato PDF
        urllib.request.urlretrieve(pdf_url, save_path)
        return success_string

    else:
        if '/full' in url:
            urllib.request.urlretrieve(url.replace('/full','/pdf'))
            return success_string
        if 'plos.org' in url:
            final_url = url.replace('article?', 'article/file?')
            urllib.request.urlretrieve(final_url, save_path)
            return success_string
    
    return f'\nfailed to download {url}'
        

def download_bibtex_library(csv_path):
    with open(csv_path, 'r',encoding='utf-8') as fp:
        csvfile = csv.DictReader(fp) 
        for row in csvfile:
            title = scholar.replace_non_alphanumeric(row['title'])
            title = title.replace(" ","-")

            save_path = os.path.join(os.path.join(csv_path, '..', title+'.pdf'))
            try:
                download_paper(url=row['url'], save_path=save_path)
            except:
                try:
                    download_paper(url=row['url']+'.pdf', save_path=save_path)
                except:
                    print(f'couldn t download {row}')

def generate_chunks(text, CHUNK_LENGTH = 4000):
    enc = tiktoken.encoding_for_model("gpt-4")
    tokens = enc.encode(text)
    token_chunks = [tokens[i:i + CHUNK_LENGTH] for i in range(0, len(tokens), CHUNK_LENGTH)]

    word_chunks = [enc.decode(chunk) for chunk in token_chunks]
    return word_chunks


from langchain.vectorstores import Chroma, Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
import pinecone

import langid
import time

# def process_pdf_folder(folder_path):
#     if not os.path.exists(folder_path):
#         return 'the folder does not exist, check your spelling'

#     for item in os.listdir(folder_path):
#         if not item.endswith('.pdf'):continue
        
#         with open(os.path.join(folder_path,'SUMMARY.txt'), 'a', encoding='UTF-8') as write_file:
#             write_file.write(item)
#             write_file.write("\n\n\n")
#             txt = summarize_pdf(item, model='Vicuna')
#             try:
#                 write_file.write(txt)
#             except:
#                 print(txt)
    
#     with open(os.path.join(folder_path,'SUMMARY.txt'), 'r', encoding='UTF-8') as read_file:
#         return read_file.read()


# # def summarize_pdf(pdf_path, model= None):
#     text = readPDF(pdf_path)

#     # according to the TLDR Model, consider smaller chunks
#     text_chunks = generate_chunks(text, 700)

#     if model is not None:
#         summarizer = LocalSearchEngine(tldr_model=model)
    
#     summary=''
#     for chunk in text_chunks:
#         summary += summarizer.tldr(chunk)

#     return summary

def get_result_path(path, exclude = []):
    for item in os.listdir(path):
        if item == 'papers.csv':
            return os.path.join(path, item)
        if os.path.isdir(os.path.join(path, item)) and item not in exclude: 
            res = get_result_path(os.path.join(path, item))
            if res: return res
    return 

def get_workspace_titles(workspace_name):
    csv_file_path = get_result_path(workspace_name)
    papers_available = []
    with open(csv_file_path, 'r', encoding='utf-8') as file:
        csv_file = csv.DictReader(file)
        for row in csv_file:
            papers_available.append(row['title'])
    return papers_available

import re
def same_title(title1, title2):
    try:
        title1 = re.sub(r'[^a-zA-Z]', ' ', title1)
        title2 = re.sub(r'[^a-zA-Z]', ' ', title2)
    except:
        return False
    words1 = set(title1.lower().split())
    words2 = set(title2.lower().split())
    return words1 == words2 or words1 <= words2 or words1 >= words2


def glimpse_pdf(title):
    # find papers.csv in workspace
    
    for workspace_name in os.listdir('workspaces'):
        csv_file_path = get_result_path(workspace_name)
        if csv_file_path is None: return 'no paper found'
        
        with open(csv_file_path, 'r', encoding='utf-8') as file:
            csv_file = csv.DictReader(file)
            for row in csv_file:
                if same_title(row['title'], title): return f"{row['title']}, paperId: {row['paperId']}, summary: {row['abstract']}"
    
    return f'\nno paper found with title {title}'
    
def count_tokens(text):
    enc = tiktoken.encoding_for_model("gpt-4")
    tokens = enc.encode(text)
    return len(tokens)

def readPDF(pdf_path):
    loader = OnlinePDFLoader(pdf_path)
    data  = loader.load()
    text_content = ''

    for page in data:
        formatted_content = page.page_content.replace('\n\n', ' ')
        text_content+=formatted_content
    
    return text_content

def get_pdf_path(dir, exclude=[]):
    paths = []
    for item in os.listdir(dir):
        itempath = os.path.join(dir,item)
        if item.endswith('.pdf'): paths.append(itempath)
        if os.path.isdir(itempath)and item not in exclude: 
            subpaths = get_pdf_path(itempath)
            for i in subpaths: paths.append(i)
    return paths

def delete_duplicates_from_csv(csv_file):
    print('verifying duplicates...') 
    to_delete = []
    def delete_csv_row_by_title(csv_file, title):
        # Read the CSV file and store rows in a list
        with open(csv_file, 'r',encoding='UTF-8') as file:
            reader = csv.DictReader(file)
            rows = list(reader)

        # Find the row index with the matching title
        row_index = None
        for index, row in enumerate(rows):
            if row['title'] == title:
                row_index = index
                break

        # If no matching title is found, return
        if row_index is None:
            print(f"No row with title '{title}' found.")
            return

        # Remove the row from the list
        del rows[row_index]

        # Write the updated rows back to the CSV file
        with open(csv_file, 'w', newline='',encoding='UTF-8') as file:
            fieldnames = reader.fieldnames
            writer = csv.DictWriter(file, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
    
    
    with open(csv_file, 'r', encoding='UTF-8') as file:
        DELETED = 0
        reader = csv.DictReader(file)
        rows = list(reader)
        entries = set()
        for row in rows:
            if row['title']=='' or row['title'] is None: continue
            if row['title'] not in entries:entries.add(row['title'])
            else:
                DELETED+=1
                to_delete.append(row['title'])
                
        for title in to_delete: delete_csv_row_by_title(csv_file, title=title)
    print(f"Deleted {DELETED} duplicates")
    return
            
        
def update_workspace_dataframe(workspace, verbose = True):
    ADDED = 0
    # find results.csv 
    csv_path = get_result_path(workspace)

    # get titles in csv
    titles = get_workspace_titles(workspace)

    # get local papers path
    paths = get_pdf_path(workspace, exclude='refy_suggestions')

    # adding new to csv:
    for path in paths:
        
        exists = False

        # extract the title from the local paper
        title = scholar.extract_title(path)

        for t in titles:
            if same_title(t,title): exists = True
            
        
        # add it to dataframe if it was not found on the DF
        if not exists:
            if verbose: print(f"\nnew paper detected: {title}")
            # find it with online
            paper = scholar.find_paper_online(path)
            if paper : 
                if verbose: print(f"\t---> best match found online: {paper['title']} " )
                for t in titles:
                    if same_title(paper['title'], title): 
                        if verbose: print(f"\t    this paper is already present in the dataframe. skipping")
            else: 
                if verbose: print(path,   '-x-> no match found')
                continue
            with open(csv_path, 'a', encoding='utf-8') as fp:
                areYouSure = True
                for t in titles:
                    if same_title(t,paper['title']): areYouSure =False
                if not areYouSure:
                    if verbose: print(f"double check revealed that the paper is already in the dataframe. Skipping") 
                    continue
                if verbose: print(f"\t---> adding {paper['title']}")
                ADDED +=1
                paper_authors = paper.get('authors', [])
                journal_data = {}
                if 'journal' in paper:
                    journal_data = paper.get('journal',[])
                if journal_data is not None:
                    if 'name' not in journal_data: journal_data['name'] = ''
                    if 'pages' not in journal_data: journal_data['pages'] = ''

                if paper.get('tldr',[]) != []:tldr = paper['tldr']['text']
                elif paper.get('summary',[]) != []:tldr = paper['summary']
                elif 'abstract' in paper:tldr = paper['abstract']
                else: tldr = 'No summary available'

                if 'year' in paper:
                    year = paper['year']
                elif 'updated' in paper:year = paper['updated']
                else: year = ''

                if 'citationStyles' in paper:
                    if 'bibtex' in paper['citationStyles']: citStyle = paper['citationStyles']['bibtex'] 
                    else: citStyle = paper['citationStyles'][0]
                else: citStyle = ''

                csvfile = csv.DictWriter(fp, ['paperId', 'title', 'first_author', 'year', 'abstract','tldr','bibtex','influentialCitationCount','venue','journal','pages'])
                try:
                    csvfile.writerow({
                        'title': paper['title'],
                        'first_author': paper_authors[0]['name'] if paper_authors else '',
                        'year': year,
                        'abstract': paper['abstract'] if 'abstract' in paper else '',
                        'paperId': paper['paperId'] if 'paperId' in paper else '',
                        'tldr':tldr,
                        'bibtex':citStyle,
                        'influentialCitationCount': paper['influentialCitationCount'] if 'influentialCitationCount' in paper else '0',
                        'venue':paper['venue'] if 'venue' in paper else '',
                        'journal':journal_data['name'] if journal_data is not None else '',
                        'pages':journal_data['pages'] if journal_data is not None else '',
                    })      
                except Exception as e: 
                    if verbose: print('could not add ', title, '\n',e)
                # delete dupes if present

    if verbose: print(f"\n\nCSV UPDATE: Added {ADDED} new papers")
    
    # clean form dupes
    delete_duplicates_from_csv(csv_path)
    
    # update bib
    scholar.write_bib_file(csv_path)
    return
    

def load_workspace(folderdir):
    docs =[]
    
    for item in os.listdir(folderdir):
        if item.endswith('.pdf'):
            print(f'   > loading {item}')
            with suppress_stdout():
                content = readPDF(os.path.join(folderdir, item))
                docs.append(Document(
                    text = content,
                    doc_id = uuid.uuid4().hex
                ))
        
        if item =='.'or item =='..':continue
        if os.path.isdir( os.path.join(folderdir,item) ):
            sub_docs = load_workspace(os.path.join(folderdir,item))
            for doc in sub_docs:
                docs.append(doc)
        
    return docs

# List paths of all pdf files in a folder
def list_workspace_elements(folderdir):
    docs =[] 
    for item in os.listdir(folderdir):
        if item.endswith('.pdf'):
            docs.append(rf"{os.path.join(folderdir,item)}")
        
        if item =='.'or item =='..':continue
        if os.path.isdir( os.path.join(folderdir,item) ):
            sub_docs = list_workspace_elements(os.path.join(folderdir,item))
            for doc in sub_docs:
                docs.append(doc)
    return docs

def llama_query_engine(docs:list, pinecone_index_name:str):
    pinecone.init(
        api_key= os.environ['PINECONE_API_KEY'],
        environment= os.environ['PINECONE_API_ENV']
    )
    
    # Find the pinecone index
    if pinecone_index_name not in pinecone.list_indexes():
        # we create a new index
        pinecone.create_index(
            name=pinecone_index_name,
            metric='dotproduct',
            dimension=1536  # 1536 dim of text-embedding-ada-002
        )

    index = pinecone.Index(pinecone_index_name)
    
    # init it
    vector_store = PineconeVectorStore(pinecone_index=index)
    time.sleep(1)

    # setup our storage (vector db)
    storage_context = StorageContext.from_defaults(
        vector_store=vector_store
    )

    embed_model = OpenAIEmbedding(model='text-embedding-ada-002', embed_batch_size=100)
    service_context = ServiceContext.from_defaults(embed_model=embed_model)

    
    # populate the vector store
    LamaIndex = GPTVectorStoreIndex.from_documents(
        docs, storage_context=storage_context,
        service_context=service_context
    )

    print('PINECONE Vector Index initialized:\n',index.describe_index_stats())

    # init the query engine
    query_engine = LamaIndex.as_query_engine()
    
    return query_engine, LamaIndex


@contextmanager
def suppress_stdout():
    with open(os.devnull, "w") as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:  
            yield
        finally:
            sys.stdout = old_stdout

================================================
FILE: Assistant/semantic_scholar/simple.py
================================================
#!/usr/bin/env python3
import dotenv
dotenv.load_dotenv()
import re
import argparse
import os
from requests import Session
from typing import Generator, Union

import urllib3
urllib3.disable_warnings()

S2_API_KEY = os.environ['S2_API_KEY']


def get_paper(session: Session, paper_id: str, fields: str = 'paperId,title', **kwargs) -> dict:
    params = {
        'fields': fields,
        **kwargs,
    }
    headers = {
        'x-api-key': S2_API_KEY,
    }

    with session.get(f'https://api.semanticscholar.org/graph/v1/paper/{paper_id}', params=params, headers=headers) as response:
        response.raise_for_status()
        return response.json()


def download_pdf(session: Session, url: str, path: str, user_agent: str = 'requests/2.0.0'):
    # send a user-agent to avoid server error
    headers = {
        'user-agent': user_agent,
    }

    # stream the response to avoid downloading the entire file into memory
    with session.get(url, headers=headers, stream=True, verify=False) as response:
        # check if the request was successful
        response.raise_for_status()

        if response.headers['content-type'] != 'application/pdf':
            raise Exception('The response is not a pdf')

        with open(path, 'wb') as f:
            # write the response to the file, chunk_size bytes at a time
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)


def download_paper(session: Session, paper_id: str, directory: str = 'papers', user_agent: str = 'requests/2.0.0') -> Union[str, None]:
    try:
        directory = os.environ['workspace']
    except:
        pass
    
    paper = get_paper(session, paper_id, fields='paperId,title,isOpenAccess,openAccessPdf')

    # check if the paper is open access
    if not paper['isOpenAccess']:
        return None

    paperId: str =re.sub(r'\W+', '', paper['title']).encode("utf-8").decode("utf-8")
    pdf_url: str = paper['openAccessPdf']['url']
    pdf_path = os.path.join(directory, f'{paperId}.pdf')
    if os.path.isfile(pdf_path):
        return None

    # create the directory if it doesn't exist
    os.makedirs(directory, exist_ok=True)

    # check if the pdf has already been downloaded
    if not os.path.exists(pdf_path):
        download_pdf(session, pdf_url, pdf_path, user_agent=user_agent)

    return pdf_path


def download_papers(paper_ids: list[str], directory: str = 'papers', user_agent: str = 'requests/2.0.0') -> Generator[tuple[str, Union[str, None, Exception]], None, None]:
    # use a session to reuse the same TCP connection
    with Session() as session:
        for paper_id in paper_ids:
            try:
                yield paper_id, download_paper(session, paper_id, directory=directory, user_agent=user_agent)
            except Exception as e:
                yield paper_id, e


def main(args: argparse.Namespace) -> None:
    for paper_id, result in download_papers(args.paper_ids, directory=args.directory, user_agent=args.user_agent):
        if isinstance(result, Exception):
            return f"Failed to download '{paper_id}': {type(result).__name__}: {result}"
        elif result is None:
            return f"'{paper_id}' is not open access"
        else:
            return f"Downloaded '{paper_id}' to '{result}'"


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--directory', '-d', default='papers')
    parser.add_argument('--user-agent', '-u', default='requests/2.0.0')
    parser.add_argument('paper_ids', nargs='+', default=[])
    args = parser.parse_args()
    main(args)

def Main(paper_ids=[], dir='papers', user_agent = 'requests/2.0.0', ):
    outcome = ''
    if isinstance(paper_ids, str):
        if ',' in paper_ids: 
            paper_ids = paper_ids.split(',')
        else:
            paper_ids = paper_ids.split()
            
        paper_ids = (id.strip() for id in paper_ids)
    for paper_id, result in download_papers(paper_ids, directory=dir, user_agent=user_agent):
        if isinstance(result, Exception):
            outcome += f"Failed to download '{paper_id}': {type(result).__name__}: {result}\n"
        elif result is None:
            outcome += f"couldn't download '{paper_id} because it is not open access\n"
        else:
            outcome += f"{result}\n"
    return outcome

================================================
FILE: Assistant/tools.py
================================================
# imports for Local Search Engine
import openai
import os
import pandas as pd
import numpy as np
from openai.embeddings_utils import distances_from_embeddings, cosine_similarity
from tqdm import tqdm
import ast

from . import webui

# import for Translator
import regex as re
import langid
from textblob import TextBlob
try: import translators as ts
except: print('could not import translators package')
import argostranslate.package
import argostranslate.translate

import math
import time 
import collections

"""
AssistantChat: dictionary on steroids.
"""
class AssistantChat(collections.MutableSequence):
    def __init__(self, begin:list, *args):
        self.body = begin
        self.filename= None
        self.extend(list(args))
    
    def is_saved(self):
        return True if self.filename != None else False
    
    def insert(self, i, v):
        self.body.insert(i, v)

    def append(self, item):
        self.body.append(item)

    def __call__(self):
        return self.body

    def __len__(self): return len(self.body)

    def __getitem__(self, i): return self.body[i]

    def __delitem__(self, i): del self.body[i]

    def __setitem__(self, i, v):
        self.body[i] = v

    def __str__(self):
        return str(self.body)


"""
Translator: 
performs basic translation opration using ChatGPT. 
Setting temperature to 0 allows better raw results
"""

"""
options:
 - gpt-3.5-turbo: reasonably fast, online, requires openai credit usage 
 - translators 5.6.3 lib: online, excellent, long lags might occcur
 - [default] argostranslator: fast,  offline 
"""

class Translator:
    def __init__(self, model="argostranslator", **kwargs):
        POSSIBLE_MODELS = ['argostranslator','gpt-3.5-turbo', 'translators']
        if model not in POSSIBLE_MODELS:
            raise Exception('this Translation model is not available')
        
        self.DEFAULT_CHAT = [{"role": "system", 
                    "content": "You are a translator. You recieve text and target language as inputs and translate the text to the target language"}]
        self.body = None
        self.model = model
        langs = kwargs['translator_languages']
        self.languages = langs

        # Download and install Argos Translate packages
        argostranslate.package.update_package_index()
        available_packages = argostranslate.package.get_available_packages()
        
        langid.set_languages(langs)

        for i in range(len(langs)):
            for j in range(len(langs)):
                if langs[i]==langs[j]: continue
                
                try:
                    package_to_install = next(
                        filter(
                            lambda x: x.from_code == langs[i] and x.to_code == langs[j], available_packages
                        )
                    )
                except:
                    print(f'failed to add {langs[i]} => {langs[j]}')
                print(f'downloading Argos Translate Language packages...')
                try:
                    argostranslate.package.install_from_path(package_to_install.download())
                except:
                    pass
        

    def translate(self, input, to_language, from_language=None):
        if from_language == to_language: return input

        if from_language == None:
            from_language = langid.classify(input)[0]
            
        if self.model=="gpt-3.5-turbo":
            self.body = self.DEFAULT_CHAT
            self.body.append({"role":"user", "content":f"translate in {to_language}:'{input}'"})
            try:
                API_response = openai.ChatCompletion.create(
                        model=self.model,
                        temperature=0,  
                        messages=self.body)
                
            except Exception as e:
                print(f"couldn't translate {self.body[-1]}")
                print(e)
                return input
            return API_response['choices'][0]['message']['content']
        
        if self.model=='translators':
            try:
                res = ts.translate_text(input, translator='google', to_language=to_language, from_language=from_language)
            except:
                res = input
                self.model = 'argostranslator'
                print('translation using translators switching to argostranslate')
                
            return res
        
        if self.model == 'argostranslator':
            try:
                res = argostranslate.translate.translate(input, from_code=from_language, to_code=to_language)
            except:
                print(f"translation using argostranslate from: {from_language} - to -> {to_language} Failed")
                print(input)
                res= input
            return res

    
"""
LocalSearchEngine:
 - Looks for files in a foder;
 - extracts information;
 - create high value contents that allow for accurate search;

to be implemented:
 - extend reserarch to .pdf and .jpeg (w/ ChatGPT4)
    - extends also to videos;
    - extends also to scientific papers;
"""

class LocalSearchEngine:
    def __init__(self, 
                 embed_model = "text-embedding-ada-002", 
                 tldr_model = "gpt-3.5-turbo",
                 translator_model = "argostranslator",
                 translator_languages = ['en','it','es'],
                 default_dir = os.path.realpath(os.path.join(os.getcwd(),'saved_chats')),
                 irrelevancy_th=0.8):
        
        self.translate_engine = Translator(model=translator_model, translator_languages=translator_languages)
        self.tldr_model  = tldr_model
        self.embed_model = embed_model
        self.default_dir = default_dir
        self.irrelevancy_threshold = irrelevancy_th
    
    def compute_similarity(self, key, text):
        if type(key)==str:  key_embedding = self.compute_embeds(key)
        else: key_embedding = key
        
        if type(text)==str: query_embedding =self.compute_embeds(text)
        else: query_embedding = text

        similarity = cosine_similarity(key_embedding, query_embedding)
        return similarity
    

    def accurate_search(self, key, path=None, n=-1, from_csv=False):
        if path is None:
            path = self.default_dir

        print('\n')
        if 'DATAFRAME.csv' not in os.listdir(path):
            print('> > DATAFRAME.csv not detected building a new one')
            pd.DataFrame({'file_names':['DATAFRAME.csv'], 'similarity':[0],"tags":[None]}).to_csv(os.path.join(path, 'DATAFRAME.csv'))

        if isinstance(key, list) or isinstance(key, tuple):
            key = " ".join(key)
        
        # USE EXISTING DATAFRAME TO MAKE SEARCH FASTER (skip tag generation)
        if from_csv:
            DataFrame = pd.read_csv(os.path.join(path,'DATAFRAME.csv'))
            fnames = DataFrame["file_names"]
            tags = DataFrame["tags"]
            embeds = DataFrame["embeddings"]

            if len(fnames)!=len(os.listdir(path)):
                print('> dataset not updated. Updating it now...')
                
                self.produce_folder_tags() ### I should add a parameter to specify HugginFaceHub (free) embeddings or OpenAI ones ($)
        
        print('> Analyzing DataFrame:')

        results = []
        topics = []
        
        key_embed = {}
        for lang in self.translate_engine.languages:
            transl_key = self.translate_engine.translate(input=key, to_language=langid.classify(lang)[0], from_language=langid.classify(key)[0])
            print(f'> > computing key embedding in {lang} language')
            key_embed[lang]= self.compute_embeds(transl_key)

        for i in tqdm(range(len(fnames))):
            if not(fnames[i].endswith('.txt')):
                results.append(0)
                topics.append('None')
                continue
            
            # extract tags associated to the file
            file_tags = tags[i]
            topics.append(file_tags)

            # extract and parse the saved embeddings
            file_embeds = ast.literal_eval( embeds[i] ) # from "[a, b, c,]" to [a, b, c]

            # take the key embedding from the same language (more accurate) 
            key_embedding = key_embed[langid.classify(file_tags)[0]]
            
            done=False
            while not(done):
                try:
                    relevance = self.compute_similarity(file_embeds, key_embedding)
                    done=True

                except Exception as e:
                    print(e)

            results.append(relevance)

        if n==-1: n=len(fnames)
        df = pd.DataFrame({'file_names':fnames, 'similarity':results,"tags":topics})
        df = df.sort_values(by='similarity', ascending=False)
        df = df.reset_index(drop=True)

        return df.head(n)


    def produce_folder_tags(self, path=None):
        if path is None:
            path = self.default_dir
        
        if ('DATAFRAME.csv' in os.listdir(path)):
            print('> > DataFrame existing')
        else:
            print('> > Creating empty DataFrame')
            pd.DataFrame(columns=['file_names', 'tags', 'embeddings']).to_csv(os.path.join(path,'DATAFRAME.csv'))

        existing_df = pd.read_csv(os.path.join(path, 'DATAFRAME.csv'))

        fnames = os.listdir(path)
        embeds = []
        topics = []
        n_updates = 0

        for filename in fnames:
            # process text files only
            if not(filename.endswith('.txt')):
                embeds.append(math.nan)
                topics.append('NaN')
                continue

            # don't repeat calculation if the file has already been processed 
            has_tags = len(existing_df['tags'][existing_df["file_names"]==filename])>=1
            
            try:
                has_embeds = len(existing_df['embeddings'][existing_df["file_names"]==filename].to_list()[0]) >5 
            except:
                has_embeds = False
        
            f = open(os.path.join(path,filename), 'r')
            text = f.read()
            if count_tokens(text)>4096:
                # keep 2000 words only
                text = " ".join(text.split()[0:2000])

            if has_tags:
                tags= existing_df['tags'][existing_df["file_names"]==filename].to_list()[0]
                topics.append(tags)
            else:
                n_updates +=1
                print(f'> > {filename}: extracting topics')
                done= False
                while not(done):
                    try:    
                        tags = self.extract_tags(text)
                        done= True
                    except:
                        print('> > system overloaded, waiting 5 sec')
                        time.sleep(5)

                topics.append(tags)

            if has_embeds:
                embeds.append(existing_df['embeddings'][existing_df["file_names"]==filename].to_list()[0])
            else:
                n_updates +=1
                print(f'> > {filename}: processing embeddings')
                done = False
                while not(done):
                    try:            
                        embedding = self.compute_embeds(tags)
                        done= True
                    except:
                        print('> > system overloaded, waiting 5 sec')
                        time.sleep(5)
                embeds.append(embedding)

        df = pd.DataFrame({'file_names':fnames, 'tags':topics, 'embeddings':embeds})
        df.to_csv(os.path.join(path,'DATAFRAME.csv'), index=False)
        df = df.reset_index(drop=True)
        print(f"> > # UPDATES applied:{n_updates}")
        return df

    def extract_tags(self, text):
        text = text.split('user:')
        text = "".join(text[1:])

        chat = [{"role": "system", 
                    "content": "You recieve text and extract up to 10 different topic covered in the text. You output the topics separated by a comma (,)"}]
        chat.append({"role": "user", "content":f"extract tags:{text}"})
        API_response = openai.ChatCompletion.create(
                model=self.tldr_model,
                temperature=0,
                messages=chat)
        
        output = API_response['choices'][0]['message']['content']
        if ':' in output:
            output = output.split(':')
            output = "".join(output[1:])
        return output

    # ADD Free alternative (Huggingface Embeds)
    def compute_embeds(self, words):
        return openai.Embedding.create(input=words, engine=self.embed_model)['data'][0]['embedding']

    def DaVinci_tldr(self, text):
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=f"{text}\n\nTl;dr",
            temperature=0,
            max_tokens=200,
            top_p=1.0,
            frequency_penalty=0.0,
            presence_penalty=0.0
        )
        return response['choices'][0]["text"]
    
    def tldr(self, text, to_language=None, with_model = ''):
        if self.tldr_model == 'gpt-3.5-turbo'or with_model=='gpt-3.5-turbo':
            text = text.replace('\n',' ')
            if to_language != None:
                context =f'tldr in {to_language}:'
                CHAT = [{"role": "system", "content":context},
                        {"role": "user", "content":f"'{text}'"}]
                
                response = openai.ChatCompletion.create(
                            model="gpt-3.5-turbo",
                            temperature=0,
                            max_tokens=200,
                            messages=CHAT)
                
                try:
                    return response['choices'][0]['message']['content']
                except:
                    pass

            else: 
                return self.DaVinci_tldr(text)
            
        
        if self.tldr_model == 'Vicuna' or with_model=='Vicuna':
            try:
                webui.set_text_gen_params(temperature=0.1)
                result = webui.oobabooga_textgen(prompt=f'Text Summarizer [Question]: summarize the following text: {text}\n[Answer]:')
                postprocessed = webui.post_process(result)
                return postprocessed
            except IndexError as e:
                return result
            except Exception as e:
                print(e)
                return ''
    
"""
OnlineSearchEngine:
to be implemented:
 - allows to extract content from the internet with http requests;
 - provide context to the VirtualAssistant
 - find a way to trigger online search
"""

class OnlineSearchEngine:
    # work in progress
    pass

"""
MISCELLANEOUS FUNCTIONS
"""
def count_tokens(vCountTokenStr):
    # Tokenize the input string
    blob = TextBlob(vCountTokenStr)
    tokens = blob.words

    # Count the number of tokens
    num_tokens = len(tokens)
    return num_tokens


def parse_conversation(string_chat):
    split1_chat = string_chat.split('user:')

    rebuilt = []

    for item in split1_chat:
        if 'system:' in item:
            rebuilt.append({"role":"system", "content":f"{item.split('ststem:')[-1]}"})
        if 'assistant:' in item:
            spl_item = item.split("assistant:")
            rebuilt.append({"role":"user", "content":f"{spl_item.pop(0)}"})
            
            while len(spl_item)>=1:
                rebuilt.append({"role":"assistant", "content":f"{spl_item.pop(0)}"})
    
    return rebuilt

def take_last_k_interactions(chat, max_tokens=4000):
    n_tokens = 0
    interactions = []

    for item in chat:
        n_tokens += count_tokens(item['content'])
        if n_tokens>= max_tokens:
            return interactions
        interactions.append(item)

================================================
FILE: Assistant/voice.py
================================================
# imports
import pyttsx3
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from TTS.api import TTS
import os
import elevenlabslib
from contextlib import contextmanager
import pygame
from pydub import AudioSegment
import io
import sys
import langid

class Voice:
    def __init__(self, languages, **kwargs):   
        # IBM CLOUD
        try:
            print('Authorizing IBM Cloud:')
            url = kwargs['ibm_url']
            apikey = kwargs['ibm_api']
            # Setup Service
            print('  1/3: Setting up cloud authenticator...')
            authenticator = IAMAuthenticator(apikey)
            # New tts service
            print('  2/3: Setting up text-to-speech...')
            tts = TextToSpeechV1(authenticator=authenticator)
            # set serive url
            print('  3/3: Setting up cloud service ...')
            tts.set_service_url(url)
            print('    ✓ service established\n')
            self.tts_service = tts
        except:
            print('IBM authentication failed')

        if 'elevenlabs_api' in kwargs:
            try:
                eleven_labs_user = elevenlabslib.ElevenLabsUser(kwargs['elevenlabs_api'])
                
                if 'elevenlabs_voice' in list(kwargs.keys()):
                    if kwargs['elevenlabs_voice'] in (voice.initialName for voice in eleven_labs_user.get_all_voices()):             
                        self.elevenlabs_voice = eleven_labs_user.get_voices_by_name(kwargs['elevenlabs_voice'])[0]

            except:
                print('Couldn t connect with Elevenlabs')
            # <to do: initiate Jarvis cloned voice if available and disable TTS>

        # PYTTSX3 for backup plan
        engine = pyttsx3.init()

        # SYNTHETIC VOICES
        # CoquiAI -  coqui-ai/TTS (https://github.com/coqui-ai/tts)
        synth = TTS(model_name=os.path.join("tts_models/multilingual/multi-dataset/your_tts"), progress_bar=False, gpu=True)  
        
        self.languages = languages
        self.write_dir = kwargs['write_dir']
        self.path = kwargs['voice_id']
        print('cloning voice form:',self.path)
        self.synthetic_voice = synth
        self.offline = engine


    def speak(self, text, VoiceIdx, mode, elevenlabs=False, IBM=False):
        ## delete old last_aswer.wav to avoid conflicts
        if os.path.exists((self.write_dir, "last_answer.wav")): os.remove((self.write_dir, "last_answer.wav"))

        ## generate the speech: last_answer.wav
        if mode == 'online':
            if  elevenlabs==True:
                if  VoiceIdx == 'en':
                    try:
                        audio = self.elevenlabs_voice.generate_audio_bytes(text)
                        audio = AudioSegment.from_file(io.BytesIO(audio), format="mp3")
                        audio.export(os.path.join(self.write_dir, "last_answer.wav"), format="wav")
                    except Exception as e:
                        print(f'Elevenlabs credit might have ended. {e}')
                        raise Exception

                if VoiceIdx == 'jarvis':
                    # to do: use voice duplication from elevenlabs
                    print('(ElevenLabs Jarvis voice not yet available)')
                    raise Exception()

            elif IBM==True:
                with open(os.path.join(self.write_dir, "last_answer.wav"),'wb') as audio_file:
                    try:
                        if VoiceIdx=='jarvis':VoiceIdx='en'
                        res = self.tts_service.synthesize(text, accept='audio/wav', voice=get_ibm_voice_id(VoiceIdx)).get_result()
                        audio_file.write(res.content)
                    except:
                        print('(IBM credit might have ended)')
                        raise Exception

        if mode == 'offline': 
            if VoiceIdx == 'jarvis' and langid.classify(text)[0]=='en':
                LangIdx = 'en'
                print(self.path, LangIdx)
                self.synthetic_voice.tts_to_file(text=text, speaker_wav=self.path[LangIdx], language=LangIdx, file_path=os.path.join(self.write_dir, 'last_answer.wav'))
                
                
                """ Idea for multiple language Text-To-Speech: dictionaries
                if VoiceIdx == 'other-language':
                    self.synthetic_voice['other-language'].tts_to_file(text=text, speaker_wav=self.path, language="en", file_path=os.path.join(self.DIRECTORIES['SOUND_DIR'], 'last_answer.wav'))
                """
            else:
                LangIdx = langid.classify(text)[0]
                self.offline = self.change_offline_lang(lang_id=LangIdx)
                self.offline.say(text)
                self.offline.runAndWait()
                return
        
        # play the generated speech:
        if pygame.mixer.get_init() is None:pygame.mixer.init()
        pygame.mixer.music.load(os.path.join(self.write_dir, 'last_answer.wav'))
        pygame.mixer.music.set_volume(0.5)
        pygame.mixer.music.play()
        while(pygame.mixer.music.get_busy()): pass
        return


    def change_offline_lang(self, lang_id):
        engine = pyttsx3.init()
        try:
            for voice in self.offline.getProperty('voices'):
                if self.languages[lang_id] in voice.name:
                    engine.setProperty('voice', voice.id)
                    return engine
            return engine
        except Exception as e:    
            print('error while switching to lang: ',lang_id,e)
            return engine 

# know more at: https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices
def get_ibm_voice_id(VoiceIdx):
    voices={
        'ar':'ar-MS_OmarVoice',

        'zh':'zh-CN_LiNaVoice',
        'zh':'zh-CN_WangWeiVoice',
        'zh':'zh-CN_ZhangJingVoice',

        'cz':'cs-CZ_AlenaVoice',

        'nl':'nl-BE_AdeleVoice',
        'nl':'nl-BE_BramVoice',
        'nl':'nl-NL_EmmaVoice',
        'nl':'nl-NL_LiamVoice',
        'nl':'nl-NL_MerelV3Voice',

        'en':'en-GB_CharlotteV3Voice',
        'en':'en-GB_JamesV3Voice',
        'en':'en-GB_KateV3Voice',
        'en':'en-US_AllisonV3Voice',
        'en':'en-US_EmilyV3Voice',
        'en':'en-US_HenryV3Voice',
        'en':'en-US_KevinV3Voice',
        'en':'en-US_LisaV3Voice',
        'en':'en-US_MichaelV3Voice',
        'en':'en-US_OliviaV3Voice',

        'fr':'fr-CA_LouiseV3Voice',
        'fr':'fr-FR_NicolasV3Voice',
        'fr':'fr-FR_ReneeV3Voice',

        'de':'de-DE_BirgitV3Voice',
        'de':'de-DE_DieterV3Voice',
        'de':'de-DE_ErikaV3Voice',

        'it':'it-IT_FrancescaV3Voice',
        'ja':'ja-JP_EmiV3Voice',
        
        'ko':'ko-KR_HyunjunVoice',
        'ko':'ko-KR_SiWooVoice',
        'ko':'ko-KR_YoungmiVoice',
        'ko':'ko-KR_YunaVoice',
        'ko':'ko-KR_JinV3Voice',

        'pt':'pt-BR_IsabelaV3Voice',

        'es':'es-ES_EnriqueV3Voice',
        'es':'es-ES_LauraV3Voice',
        'es':'es-LA_SofiaV3Voice',
        'es':'es-US_SofiaV3Voice',

        'sv':'sv-SE_IngridVoice'
        }
    return voices[VoiceIdx]


@contextmanager
def suppress_stdout():
    with open(os.devnull, "w") as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:  
            yield
        finally:
            sys.stdout = old_stdout

================================================
FILE: Assistant/webui.py
================================================
import json 
import requests
import re
import langid

SERVER = 'localhost'
TEXT_GEN_PARAMS = {
    'max_new_tokens': 200,
    'do_sample': True,
    'temperature': 0.72,
    'top_p': 0.73,
    'typical_p': 1,
    'repetition_penalty': 1.1,
    'encoder_repetition_penalty': 1.0,
    'top_k': 0,
    'min_length': 0,
    'no_repeat_ngram_size': 0,
    'num_beams': 1,
    'penalty_alpha': 0,
    'length_penalty': 1,
    'early_stopping': False,
    'seed': -1,
    'add_bos_token': True,
    'custom_stopping_strings': [],
    'truncation_length': 2048,
    'ban_eos_token': False,
}

def set_text_gen_params(**kwargs):
    for key in kwargs:
        if key not in list(TEXT_GEN_PARAMS.keys()): raise Exception('no such parameter in oogabooga text generation')
        TEXT_GEN_PARAMS[key]=kwargs[key]


def oobabooga_textgen(prompt, params=TEXT_GEN_PARAMS, server=SERVER):
    ChatMode = True if type(prompt) == list else False
    
    if ChatMode: 
        nMessages = len(prompt)
        prompt = parse_conversation(prompt) 

    payload = json.dumps([prompt, params])
    APIresponse = requests.post(f"http://{server}:7860/run/textgen", json={
        "data": [
            payload
        ]
    }).json()
    reply = APIresponse["data"][0]
    
    # hallucination filter:
    if ChatMode:
        reply = reply.replace("[assistant]:","###")
        reply = reply.replace("[user]:","###")
        reply = reply.replace("[system]:","###")
        reply = reply.split('###')
        reply = " ".join(reply[(nMessages+1):(nMessages+2)])
    
    return reply
    
def post_process(answer):
    allowed = ['Answer','Outcome','Discussion','Conclusion']
    answer = answer.split('[Question]')[-1]

    relevant =''
    for a in allowed:
        if a in answer:
            temp = re.split(r'\[|\]', answer)
            try:
                relevant += temp[temp.index(a)+1].strip(':')
            except:
                print('Failure processing answer')
                pass
        
    print(len(relevant.split()))
    return relevant
    
def parse_conversation(chat):
    linkDetectionRegexStr = "[a-zA-Z0-9]((?i) dot |(?i) dotcom|(?i)dotcom|(?i)dotcom |\.|\. | \.| \. |\,)[a-zA-Z]*((?i) slash |(?i) slash|(?i)slash |(?i)slash|\/|\/ | \/| \/ ).+[a-zA-Z0-9]"
    oobaboogaChatHistory = ""
    for message in chat:
        oobaboogaChatHistory += f"[{str(message['role'])}]:{message['content']}\n"
    oobaboogaChatHistory = re.sub(linkDetectionRegexStr, "<url>", oobaboogaChatHistory)
    return oobaboogaChatHistory

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 Gianmarco Guarnier

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# JARVIS-ChatGPT: A conversational assistant equipped with J.A.R.V.I.S's voice
**A voice-based interactive assistant equipped with a variety of synthetic voices (including J.A.R.V.I.S's voice from IronMan)**

![GitHub last commit](https://img.shields.io/github/last-commit/gianmarcoguarnier/JARVIS-ChatGPT?style=for-the-badge)
<p align="center">
  <img src="https://user-images.githubusercontent.com/49094051/227788148-a8ff8e06-86a4-41a6-aa53-8b7d6855360c.png"/>
  <span style=color:grey> <i>image by MidJourney AI </i> </span>
</p>

Ever dreamed to ask hyper-intelligent system tips to improve your armor? Now you can! Well, maybe not the armor part... This project exploits OpenAI Whisper, OpenAI ChatGPT and IBM Watson.
<p align="center"> <strong> PROJECT MOTIVATION:  </strong> </p> 

*Many times ideas come in the worst moment and they fade away before you have the time to explore them better. The objective of this project is to develop a system capable of giving tips and opinions in quasi-real-time about anything you ask. The ultimate assistant will be able to be accessed from any authorized microphone inside your house or your phone, it should run constantly in the background and when summoned should be able to generate meaningful answers (with a badass voice) as well as interface with the pc or a server and save/read/write files that can be accessed later. It should be able to run research, gather material from the internet (extract content from HTML pages, transcribe Youtube videos, find scientific papers...) and provide summaries that can be used as context to make informed decisions. In addition, it might interface with some external gadgets (IoT) but that's extra.*
<br>
<br>
<br>

<p align="center"> <strong> DEMO: </strong> </p> 

https://user-images.githubusercontent.com/49094051/231303323-9859e028-33e1-490d-9967-44852fd0efc5.mp4

<br>

---
## JULY 14th 2023 UPDATE: Research Mode
I can finnaly share the first draft of the Research Mode. This modality was thought for people often dealing with research papers. 
- Switch to research mode by saying *'Switch to Research Mode'*
- :star: Initialize a new workspace like this: *'Initialize a new workspace about Carbon Fiber Applications in the Spacecraft industry'*. A workspace is a folder that collects and organize the results of the research. This protocol is subdivided into 3 sub-routines:
   1. Core Paper identification: Use the **Semantic Scholar API** to identify some strongly relevant papers;
   2. Core Expansion: for each paper, finds some suggestions, then keep only the suggestions that appear to be similar to at least 2 paper;
   3. Refy Expansion: use the refy suggestion package to enlarge the results;
- Find suggestions like: *'find suggestions that are sililar to the paper with title ...'*
- Download: *'download the paper with title ...'*
- :star: Query your database like: *'what is the author of the paper with title ...?'*  *'what are the experimental conditions set for the paper with title ...?'*

PS: This mode is not super stable and needs to be worked on<br>

*PPS: This project will be discontinued for some time since I'll be working on my thesis until 2024. However there are already so many things that can be improved so I'll be back!*
## What you'll need:
<p align="center"><i>DISCLAIMER:<br> The project might consume your OpenAI credit resulting in undesired billing;<br> I don't take responsibility for any unwanted charges;<br>Consider setting limitations on credit consumption at your OpenAI account; </i> </p> 

 - An [OpenAI](https://openai.com) account and API key; (check FAQs below for the alternatives)
 - <i>[PicoVoice](https://picovoice.ai/platform/porcupine/) account and a free AccessKey; (optional) </i>
 - <i>[ElevenLabs](https://beta.elevenlabs.io/) account and free Api Key (optional)</i>;
 - [langChain API keys](https://github.com/hwchase17/langchain/blob/master/docs/modules/agents/tools/getting_started.md) for web surfing (news, weather, serpapi, google-serp, google-search... they are all free)
 - [ffmpeg](https://ffmpeg.org/) ;
 - Python virtual environment (Python>=3.9 and <3.10);
 - <i> Some credit to spend on ChatGPT (you can get three months of free usage by signing up to OpenAI) (suggested)</i>;
 - CUDA version >= 11.2;
 - <i> An IBM Cloud account to exploit their cloud-based text-to-speech models ([tutorial](https://www.youtube.com/watch?v=A9_0OgW1LZU))(optional)</i>;
 - A (reasonably) fast internet connection (most of the code relies on API so a slower connection might result in a longer time to respond);
 - mic and speaker;
 - CUDA capable graphic engine (my Torch Version: 2.0 and CUDA v11.7 ```pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117```);
 - Patience :sweat_smile:

> you can rely on the new ```setup.bat``` that will do most of the things for you.


## GitHub overview
**MAIN** script you should run: `openai_api_chatbot.py` if you want to use the latest version of the OpenAI API Inside the demos folder you'll find some guidance for the packages used in the project, if you have errors you might check these files first to target the problem. Mostly is stored in the Assistant folder: `get_audio.py` stores all the functions to handle mic interactions, `tools.py` implements some basic aspects of the Virtual Assistant, `voice.py` describes a (very) rough Voice class. ```Agents.py``` handle the LangChain part of the system (here you can add or remove tools from the toolkits of the agents)<br> The remaining scripts are supplementary to the voice generation and should not be edited. 

# INSTALLATION TUTORIAL
## Automatic installation
You can run ```setup.bat``` if you are running on Windows/Linux. The script will perform every step of the manual installation in sequence. Refer to those in case the procedure should fail.<br>
The automatic installation will also run the Vicuna installation ([Vicuna Installation Guide](https://hub.tcno.co/ai/text-ai/vicuna/))
## Manual Installation
## Step 1: installation, accounts, APIs... 
### Environment
1. Make a new, empty virtual environment with Python 3.8 and activate it (.\venv_name\Scripts\activate );
2. ```pip install -r venv_requirements.txt```; This might take some time; if you encounter conflicts on specific packages, install them manually without the ```==<version>```;
3. install manually PyTorch according to your CUDA VERSION;
4. Copy and paste the files you'll find in the folder ```whisper_edits``` to the ```whisper``` folder of your environment (.\venv\lib\site-packages\whisper\ ) <span style="color:grey"> these edits will add just an attribute to the whisper model to access its dimension more easily; </span> 
5. install [TTS](https://github.com/coqui-ai/tts);
6. Run [their script](https://github.com/coqui-ai/TTS/blob/dev/README.md#-python-api) and check everything is working (it should download some models) (you can alternatively run ```demos/tts_demo.py```);
7. Rename or delete the TTS folder and download the Assistant and other scripts from this repo 
9. Install Vicuna following the instructions on the Vicuna folder or by running:<br><p align='center'>
```cd Vicuna```<br>
```call vicuna.ps1```<br></p>
<span style="color:grey"> Manual instructions will instruct you to follow the [Vicuna Installation Guide](https://hub.tcno.co/ai/text-ai/vicuna/) </span> 
10. paste all your keys in the ```env.txt``` file and rename it to ```.env``` (yes, remove the txt extension)
11. Check everything works *(following)*
<br>

### Checks
- Verify your graphic engine and CUDA version are compatible with PyTorch by running `torch.cuda.is_available()` and `torch.cuda.get_device_name(0)` inside Pyhton; . 
- run ```tests.py```. This file attempt to perform basic operations that might raise errors;
- [WARNING] Check the FAQs below if you have errors;
- You can check the sources of error by running demos in the demos folder;


## Step 2: Language support
- To have answers spoken in your language you should first check if your language is supported by the speech generator at __https://cloud.ibm.com/docs/text-to-speech?topic=text-to-speech-voices__; 
- If it's supported, add or change the languages inside ```VirtualAssistant.__init__()``` ;<br>

<p align="center">
  <img src="https://user-images.githubusercontent.com/49094051/230505516-4dba0f29-f45a-4311-aa54-1d93fca25de5.PNG"/>
</p>

- Remember: The loaded Whisper is the medium one. If it performs badly in your language, upgrade to the larger one in the ```__main__()``` at `whisper_model = whisper.load_model("large")`; but I hope your GPU memory is large likewise.

## Step 3: Running (`openai_api_chatbot.py`):
When running, you'll see much information being displayed. I'm constantly striving to improve the readability of the execution, the whole project is a huge beta, forgive slight variations from the screens below. Anyway, this is what happens in general terms when you hit 'run':
- Preliminary initializations take place, you should hear a chime when the Assistant is ready;
- When *awaiting for triggering words* is displayed you'll need to say `Jarvis` to summon the assistant. At this point, a conversation will begin and you can speak in whatever language you want (if you followed step 2). The conversation will terminate when you 1) say a [stop word](https://github.com/gianmarcoguarnier/JARVIS-ChatGPT/tree/main#key-words) 2) say something with one word (like 'ok') 3) when you stop making questions for more than 30 seconds <br>
<p align="center">
  <img src="https://user-images.githubusercontent.com/49094051/230505896-c8a2ff80-4265-41e4-a6d5-e9f56d156afa.PNG" /><br>
  <img src="https://user-images.githubusercontent.com/49094051/230506756-287a1d6b-9652-4c66-bea8-cd75380ab45b.PNG" /><br>
</p>

- After the magic word is said, the word *listening...* should then appear. At this point, you can make your question. When you are done just wait (3 seconds) for the answer to be submitted;
- The script will convert the recorded audio to text using Whisper;
- The text will be analyzed and a decision will be made. If the Assistant believes it needs to take some action to respond (like looking for a past conversation) the langchain agents will make a plan and use their tool to answer.
- Elsewise, the script will then expand the `chat_history` with your question, it will send a request with the API and it will update the history as soon as it receives a full answer from ChatGPT (this may take up to 5-10 seconds, consider explicitly asking for a short answer if you are in a hurry);
- The `say()` function will perform voice duplication to speak with Jarvis/Someone's voice; if the argument is not in English, IBM Watson will send the response from one of their nice text-to-speech models. If everything fails, the functions will rely on pyttsx3 which is a fast yet not as cool alternative;
<p align="center">

</p>

- When any of the stop keywords are said, the script will ask ChatGPT to give a title to the conversation and will save the chat in a .txt file with the format 'CurrentDate_Title.txt';
- The assistant will then go back to sleep;
<p align="center">
 <img src='https://user-images.githubusercontent.com/49094051/227788180-b9da0957-a58b-4c1c-bc34-4a4c8a0e0957.PNG'/><br>
  <i><span style="color:grey">I made some prompts and closed the conversation</span> </i>
</p>


# Keywords:
- to stop or save the chat, just say 'THANKS' at some point;
- To summon JARVIS voice just say 'JARVIS' at some point;

<span style="color:grey">*not ideal I know but works for now*</span>


# History:
- [x] [11 - 2022] Deliver chat-like prompts from Python from a keyboard
- [x] [12 - 2022] Deliver chat-like prompts from Python with voice
- [x] [2  - 2023] International language support for prompt and answers
- [x] [3  - 2023] Jarvis voice set up
- [x] [3  - 2023] Save conversation
- [x] [3  - 2023] Background execution & Voice Summoning
- [x] [3  - 2023] Improve output displayed info
- [x] [3  - 2023] Improve JARVIS's voice performances through prompt preprocessing
- [x] [4  - 2023] Introducing: *Project memory* store chats, events, timelines and other relevant information for a given project to be accessed later by the user or the assistant itself 
- [x] [4  - 2023] Create a full stack ```VirtualAssistant``` class with memory and local storage access
- [x] [4  - 2023] Add sound feedback at different stages (chimes, beeps...)
- [x] [4  - 2023] International language support for voice commands (beta)
- [x] [4  - 2023] Making a step-by-step tutorial 
- [x] [4  - 2023] Move some processing locally to reduce credit consumption: [Vicuna: A new, powerful model based on LLaMa, and trained with GPT-4](https://www.youtube.com/watch?v=ByV5w1ES38A&ab_channel=TroubleChute);
- [x] [4  - 2023] Integrate with Eleven Labs Voices for super expressive voices and outstanding voice cloning;
- [x] [4  - 2023] Extending voice commands and *Actions* (make a better active assistant)
- [x] [4  - 2023] Connect the system to the internet
- [x] [6  - 2023] Connect with paper database

currently working on:
- [ ] Extend doc processing tools
- [ ] Find a free alternative for LangChain Agents

following:
- [ ] fixing chat length bug (when the chat is too long it can't be processed by ChatGPT 3.5 Turbo)
- [ ] expanding *Memory* 
- [ ] crash reports   
- [ ] Refine capabilities
<br>
<br>

### waiting for ChatGPT4 to:
- [ ] add multimodal input (i.e. "Do you think 'this' [holding a paper plane] could fly" -> camera -> ChatGPT4 -> "you should improve the tip of the wings" )
- [ ] Extend *project memory* to images, pdfs, papers...

<span style="color:grey">*Check the [UpdateHistory.md](https://github.com/gianmarcoguarnier/JARVIS-ChatGPT/blob/main/UpdateHistory.md) of the project for more insights.*</span>

Have fun!

# ERRORS and FAQs
categories: Install, General, Runtime
### INSTALL: I have conflicting packages while installing *venv_requirements.txt*, what should I do? <br>
1. Make sure you have the right Python version (3.7) on the .venv (>python --version with the virtual environment activated). 
2. Try to edit the _venv_requirements.txt_ and remove the version requirements of the incriminated dependencies. 
3. Straight remove the package from the txt file and install them manually afterward.<br>

### INSTALL: I meet an error when running openai_api_chatbot.py saying: TypeError: LoadLibrary( ) argument 1 must be str, not None what's wrong? <br>
The problem is concerning Whisper. You should re-install it manually  with ```pip install whisper-openai``` <br>

### INSTALL: I can't import 'openai.embeddings_utils'<br>
1. Try to ```pip install --upgrade openai```. 
2. This happens because openai elevated their minimum requirements. I had this problem and solved by manually downloading [embeddings_utils.py](https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py) inside ./<your_venv>/Lib/site-packages/openai/ 
<br>
3. If the problem persists with ```datalib``` raise an issue and I'll provide you the missing file
4. upgrade to Python 3.8 (create new env and re-install TTS, requirements)

### INSTALL: I encounter the error ModuleNotFoundError: No module named '\<some module\>' <br>
Requirements are not updated every commit. While this might generate errors you can quickly install the missing modules, at the same time it keeps the environment clean from conflicts when I try new packages (and I try LOTS of them) <br>

### RUN TIME: I encounter some OOM memory when loading the Whisper model, what does it mean?<br>
It means the model you selected is too big for your CUDA device memory. Unfortunately, there is not much you can do about it except load a smaller model. If the smaller model does not satisfy you, you might want to speak 'clearer' or make longer prompts to let the model predict more accurately what you are saying. This sounds inconvenient but, in my case, greatly improved my English-speaking :) <br>

### RUN TIME: Max length tokens for ChatGPT-3.5-Turbo is 4096 but received... tokens.<br>
This is a bug still present, don't expect to have ever long conversations with your assistant as it will simply have enough memory to remember the whole conversation at some point. A fix is in development, it might consist of adopting a 'sliding windows' approach even if it might cause repetition of some concepts. <br>

### GENERAL: I finished my OPENAI credit/demo, what can I do? <br>
1. Go online only. The price is not that bad and you might end up paying a few dollars a month since pricing depends on usage (with heavy testing I ended up consuming the equivalent of ~4 dollars a month during my free trial). You can set limits on your monthly tokens consumption. 
2. Use a Hybrid mode where the most credit-intensive tasks are executed locally for free and the rest is done online. 
3. Install Vicuna and run OFFLINE mode only with limited performance. 

### GENERAL: For how long will this project be updated? 
Right now (April 2023) I'm working almost non-stop on this. I will likely take a break in the summer because I'll be working on my thesis. 

If you have questions you can contact me by raising an Issue and I'll do my best to help as soon as possible.
<p align="right"><i>Gianmarco Guarnier<i></p>


================================================
FILE: TTS/.models.json
================================================
{
    "tts_models": {
        "multilingual":{
            "multi-dataset":{
                "your_tts":{
                    "description": "Your TTS model accompanying the paper https://arxiv.org/abs/2112.02418",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.10.1_models/tts_models--multilingual--multi-dataset--your_tts.zip",
                    "default_vocoder": null,
                    "commit": "e9a1953e",
                    "license": "CC BY-NC-ND 4.0",
                    "contact": "egolge@coqui.ai"
                }
            }
        },
        "bg": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--bg--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "cs": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--cs--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "da": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--da--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "et": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--et--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "ga": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--ga--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "en": {
            "ek1": {
                "tacotron2": {
                    "description": "EK1 en-rp tacotron2 by NMStoker",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ek1--tacotron2.zip",
                    "default_vocoder": "vocoder_models/en/ek1/wavegrad",
                    "commit": "c802255",
                    "license": "apache 2.0"
                }
            },
            "ljspeech": {
                "tacotron2-DDC": {
                    "description": "Tacotron2 with Double Decoder Consistency.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ljspeech--tacotron2-DDC.zip",
                    "default_vocoder": "vocoder_models/en/ljspeech/hifigan_v2",
                    "commit": "bae2ad0f",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.com"
                },
                "tacotron2-DDC_ph": {
                    "description": "Tacotron2 with Double Decoder Consistency with phonemes.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ljspeech--tacotron2-DDC_ph.zip",
                    "default_vocoder": "vocoder_models/en/ljspeech/univnet",
                    "commit": "3900448",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.com"
                },
                "glow-tts": {
                    "description": "",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ljspeech--glow-tts.zip",
                    "stats_file": null,
                    "default_vocoder": "vocoder_models/en/ljspeech/multiband-melgan",
                    "commit": "",
                    "author": "Eren Gölge @erogol",
                    "license": "MPL",
                    "contact": "egolge@coqui.com"
                },
                "speedy-speech": {
                    "description": "Speedy Speech model trained on LJSpeech dataset using the Alignment Network for learning the durations.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ljspeech--speedy-speech.zip",
                    "stats_file": null,
                    "default_vocoder": "vocoder_models/en/ljspeech/hifigan_v2",
                    "commit": "4581e3d",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.com"
                },
                "tacotron2-DCA": {
                    "description": "",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ljspeech--tacotron2-DCA.zip",
                    "default_vocoder": "vocoder_models/en/ljspeech/multiband-melgan",
                    "commit": "",
                    "author": "Eren Gölge @erogol",
                    "license": "MPL",
                    "contact": "egolge@coqui.com"
                },
                "vits": {
                    "description": "VITS is an End2End TTS model trained on LJSpeech dataset with phonemes.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ljspeech--vits.zip",
                    "default_vocoder": null,
                    "commit": "3900448",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.com"
                },
                "vits--neon": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--en--ljspeech--vits.zip",
                    "default_vocoder": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause",
                    "contact": null,
                    "commit": null
                },
                "fast_pitch": {
                    "description": "FastPitch model trained on LJSpeech using the Aligner Network",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--ljspeech--fast_pitch.zip",
                    "default_vocoder": "vocoder_models/en/ljspeech/hifigan_v2",
                    "commit": "b27b3ba",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.com"
                },
                "overflow": {
                    "description": "Overflow model trained on LJSpeech",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.10.0_models/tts_models--en--ljspeech--overflow.zip",
                    "default_vocoder": "vocoder_models/en/ljspeech/hifigan_v2",
                    "commit": "3b1a28f",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.ai"
                },
                "neural_hmm": {
                    "description": "Neural HMM model trained on LJSpeech",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.11.0_models/tts_models--en--ljspeech--neural_hmm.zip",
                    "default_vocoder": "vocoder_models/en/ljspeech/hifigan_v2",
                    "commit": "3b1a28f",
                    "author": "Shivam Metha @shivammehta25",
                    "license": "apache 2.0",
                    "contact": "d83ee8fe45e3c0d776d4a865aca21d7c2ac324c4"
                }
            },
            "vctk": {
                "vits": {
                    "description": "VITS End2End TTS model trained on VCTK dataset with 109 different speakers with EN accent.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--vctk--vits.zip",
                    "default_vocoder": null,
                    "commit": "3900448",
                    "author": "Eren @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.ai"
                },
                "fast_pitch":{
                    "description": "FastPitch model trained on VCTK dataseset.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--vctk--fast_pitch.zip",
                    "default_vocoder": null,
                    "commit": "bdab788d",
                    "author": "Eren @erogol",
                    "license": "CC BY-NC-ND 4.0",
                    "contact": "egolge@coqui.ai"
                }
            },
            "sam": {
                "tacotron-DDC": {
                    "description": "Tacotron2 with Double Decoder Consistency trained with Aceenture's Sam dataset.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--en--sam--tacotron-DDC.zip",
                    "default_vocoder": "vocoder_models/en/sam/hifigan_v2",
                    "commit": "bae2ad0f",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.com"
                }
            },
            "blizzard2013": {
                "capacitron-t2-c50": {
                    "description": "Capacitron additions to Tacotron 2 with Capacity at 50 as in https://arxiv.org/pdf/1906.03402.pdf",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.7.0_models/tts_models--en--blizzard2013--capacitron-t2-c50.zip",
                    "commit": "d6284e7",
                    "default_vocoder": "vocoder_models/en/blizzard2013/hifigan_v2",
                    "author": "Adam Froghyar @a-froghyar",
                    "license": "apache 2.0",
                    "contact": "adamfroghyar@gmail.com"
                },
                "capacitron-t2-c150_v2": {
                    "description": "Capacitron additions to Tacotron 2 with Capacity at 150 as in https://arxiv.org/pdf/1906.03402.pdf",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.7.1_models/tts_models--en--blizzard2013--capacitron-t2-c150_v2.zip",
                    "commit": "a67039d",
                    "default_vocoder": "vocoder_models/en/blizzard2013/hifigan_v2",
                    "author": "Adam Froghyar @a-froghyar",
                    "license": "apache 2.0",
                    "contact": "adamfroghyar@gmail.com"
                }
            }
        },
        "es": {
            "mai": {
                "tacotron2-DDC": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--es--mai--tacotron2-DDC.zip",
                    "default_vocoder": "vocoder_models/universal/libri-tts/fullband-melgan",
                    "commit": "",
                    "author": "Eren Gölge @erogol",
                    "license": "MPL",
                    "contact": "egolge@coqui.com"
                }
            },
            "css10":{
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--es--css10--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "fr": {
            "mai": {
                "tacotron2-DDC": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--fr--mai--tacotron2-DDC.zip",
                    "default_vocoder": "vocoder_models/universal/libri-tts/fullband-melgan",
                    "commit": null,
                    "author": "Eren Gölge @erogol",
                    "license": "MPL",
                    "contact": "egolge@coqui.com"
                }
            },
            "css10":{
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--fr--css10--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "uk":{
            "mai": {
                "glow-tts": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--uk--mai--glow-tts.zip",
                    "author":"@robinhad",
                    "commit": "bdab788d",
                    "license": "MIT",
                    "contact": "",
                    "default_vocoder": "vocoder_models/uk/mai/multiband-melgan"
                },
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--uk--mai--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "zh-CN": {
            "baker": {
                "tacotron2-DDC-GST": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--zh-CN--baker--tacotron2-DDC-GST.zip",
                    "commit": "unknown",
                    "author": "@kirianguiller",
                    "license": "apache 2.0",
                    "default_vocoder": null
                }
            }
        },
        "nl": {
            "mai": {
                "tacotron2-DDC": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--nl--mai--tacotron2-DDC.zip",
                    "author": "@r-dh",
                    "license": "apache 2.0",
                    "default_vocoder": "vocoder_models/nl/mai/parallel-wavegan",
                    "stats_file": null,
                    "commit": "540d811"
                }
            },
            "css10":{
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--nl--css10--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "de": {
            "thorsten": {
                "tacotron2-DCA": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--de--thorsten--tacotron2-DCA.zip",
                    "default_vocoder": "vocoder_models/de/thorsten/fullband-melgan",
                    "author": "@thorstenMueller",
                    "license": "apache 2.0",
                    "commit": "unknown"
                },
                "vits": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.7.0_models/tts_models--de--thorsten--vits.zip",
                    "default_vocoder": null,
                    "author": "@thorstenMueller",
                    "license": "apache 2.0",
                    "commit": "unknown"
                },
                "tacotron2-DDC": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--de--thorsten--tacotron2-DDC.zip",
                    "default_vocoder": "vocoder_models/de/thorsten/hifigan_v1",
                    "description": "Thorsten-Dec2021-22k-DDC",
                    "author": "@thorstenMueller",
                    "license": "apache 2.0",
                    "commit": "unknown"
                }
            },
            "css10": {
                "vits-neon":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--de--css10--vits.zip",
                    "default_vocoder": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause",
                    "commit": null
                }
            }
        },
        "ja": {
            "kokoro": {
                "tacotron2-DDC": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--ja--kokoro--tacotron2-DDC.zip",
                    "default_vocoder": "vocoder_models/ja/kokoro/hifigan_v1",
                    "description": "Tacotron2 with Double Decoder Consistency trained with Kokoro Speech Dataset.",
                    "author": "@kaiidams",
                    "license": "apache 2.0",
                    "commit": "401fbd89"
                }
            }
        },
        "tr":{
            "common-voice": {
                "glow-tts":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--tr--common-voice--glow-tts.zip",
                    "default_vocoder": "vocoder_models/tr/common-voice/hifigan",
                    "license": "MIT",
                    "description": "Turkish GlowTTS model using an unknown speaker from the Common-Voice dataset.",
                    "author": "Fatih Akademi",
                    "commit": null
                }
            }
        },
        "it": {
            "mai_female": {
                "glow-tts":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--it--mai_female--glow-tts.zip",
                    "default_vocoder": null,
                    "description": "GlowTTS model as explained on https://github.com/coqui-ai/TTS/issues/1148.",
                    "author": "@nicolalandro",
                    "license": "apache 2.0",
                    "commit": null
                },
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--it--mai_female--vits.zip",
                    "default_vocoder": null,
                    "description": "GlowTTS model as explained on https://github.com/coqui-ai/TTS/issues/1148.",
                    "author": "@nicolalandro",
                    "license": "apache 2.0",
                    "commit": null
                }
            },
            "mai_male": {
                "glow-tts":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--it--mai_male--glow-tts.zip",
                    "default_vocoder": null,
                    "description": "GlowTTS model as explained on https://github.com/coqui-ai/TTS/issues/1148.",
                    "author": "@nicolalandro",
                    "license": "apache 2.0",
                    "commit": null
                },
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/tts_models--it--mai_male--vits.zip",
                    "default_vocoder": null,
                    "description": "GlowTTS model as explained on https://github.com/coqui-ai/TTS/issues/1148.",
                    "author": "@nicolalandro",
                    "license": "apache 2.0",
                    "commit": null
                }
            }
        },
        "ewe": {
            "openbible": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.2_models/tts_models--ewe--openbible--vits.zip",
                    "default_vocoder": null,
                    "license": "CC-BY-SA 4.0",
                    "description": "Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible.",
                    "author": "@coqui_ai",
                    "commit": "1b22f03"
                }
            }
        },
        "hau": {
            "openbible": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.2_models/tts_models--hau--openbible--vits.zip",
                    "default_vocoder": null,
                    "license": "CC-BY-SA 4.0",
                    "description": "Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible.",
                    "author": "@coqui_ai",
                    "commit": "1b22f03"
                }
            }
        },
        "lin": {
            "openbible": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.2_models/tts_models--lin--openbible--vits.zip",
                    "default_vocoder": null,
                    "license": "CC-BY-SA 4.0",
                    "description": "Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible.",
                    "author": "@coqui_ai",
                    "commit": "1b22f03"
                }
            }
        },
        "tw_akuapem": {
            "openbible": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.2_models/tts_models--tw_akuapem--openbible--vits.zip",
                    "default_vocoder": null,
                    "license": "CC-BY-SA 4.0",
                    "description": "Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible.",
                    "author": "@coqui_ai",
                    "commit": "1b22f03"
                }
            }
        },
        "tw_asante": {
            "openbible": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.2_models/tts_models--tw_asante--openbible--vits.zip",
                    "default_vocoder": null,
                    "license": "CC-BY-SA 4.0",
                    "description": "Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible.",
                    "author": "@coqui_ai",
                    "commit": "1b22f03"
                }
            }
        },
        "yor": {
            "openbible": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.2_models/tts_models--yor--openbible--vits.zip",
                    "default_vocoder": null,
                    "license": "CC-BY-SA 4.0",
                    "description": "Original work (audio and text) by Biblica available for free at www.biblica.com and open.bible.",
                    "author": "@coqui_ai",
                    "commit": "1b22f03"
                }
            }
        },
        "hu": {
            "css10": {
                "vits": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--hu--css10--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "el": {
            "cv": {
                "vits": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--el--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "fi": {
            "css10": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--fi--css10--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "hr": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--hr--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "lt": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--lt--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "lv": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--lv--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "mt": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--mt--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "pl": {
            "mai_female": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--pl--mai_female--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "pt": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--pt--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "ro": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--ro--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "sk": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--sk--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "sl": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--sl--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "sv": {
            "cv": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/tts_models--sv--cv--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "author": "@NeonGeckoCom",
                    "license": "bsd-3-clause"
                }
            }
        },
        "ca": {
            "custom": {
                "vits":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.10.1_models/tts_models--ca--custom--vits.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "description": " It is trained from zero with 101460 utterances consisting of 257 speakers, approx 138 hours of speech. We used three datasets;\nFestcat and Google Catalan TTS (both TTS datasets) and also a part of Common Voice 8. It is trained with TTS v0.8.0.\nhttps://github.com/coqui-ai/TTS/discussions/930#discussioncomment-4466345",
                    "author": "@gullabi",
                    "license": "CC-BY-4.0"
                }
            }
        },
        "fa":{
            "custom":{
                "glow-tts": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.10.1_models/tts_models--fa--custom--glow-tts.zip",
                    "default_vocoder": null,
                    "commit": null,
                    "description": "persian-tts-female-glow_tts model for text to speech purposes. Single-speaker female voice Trained on persian-tts-dataset-famale. \nThis model has no compatible vocoder thus the output quality is not very good. \nDataset: https://www.kaggle.com/datasets/magnoliasis/persian-tts-dataset-famale.",
                    "author": "@karim23657",
                    "license": "CC-BY-4.0"
                }
            }
        }
    },
    "vocoder_models": {
        "universal": {
            "libri-tts": {
                "wavegrad": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--universal--libri-tts--wavegrad.zip",
                    "commit": "ea976b0",
                    "author": "Eren Gölge @erogol",
                    "license": "MPL",
                    "contact": "egolge@coqui.com"
                },
                "fullband-melgan": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--universal--libri-tts--fullband-melgan.zip",
                    "commit": "4132240",
                    "author": "Eren Gölge @erogol",
                    "license": "MPL",
                    "contact": "egolge@coqui.com"
                }
            }
        },
        "en": {
            "ek1": {
                "wavegrad": {
                    "description": "EK1 en-rp wavegrad by NMStoker",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--en--ek1--wavegrad.zip",
                    "commit": "c802255",
                    "license": "apache 2.0"
                }
            },
            "ljspeech": {
                "multiband-melgan": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--en--ljspeech--multiband-melgan.zip",
                    "commit": "ea976b0",
                    "author": "Eren Gölge @erogol",
                    "license": "MPL",
                    "contact": "egolge@coqui.com"
                },
                "hifigan_v2": {
                    "description": "HiFiGAN_v2 LJSpeech vocoder from https://arxiv.org/abs/2010.05646.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--en--ljspeech--hifigan_v2.zip",
                    "commit": "bae2ad0f",
                    "author": "@erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.ai"
                },
                "univnet": {
                    "description": "UnivNet model finetuned on TacotronDDC_ph spectrograms for better compatibility.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--en--ljspeech--univnet_v2.zip",
                    "commit": "4581e3d",
                    "author": "Eren @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.ai"
                }
            },
            "blizzard2013": {
                "hifigan_v2": {
                    "description": "HiFiGAN_v2 LJSpeech vocoder from https://arxiv.org/abs/2010.05646.",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.7.0_models/vocoder_models--en--blizzard2013--hifigan_v2.zip",
                    "commit": "d6284e7",
                    "author": "Adam Froghyar @a-froghyar",
                    "license": "apache 2.0",
                    "contact": "adamfroghyar@gmail.com"
                }
            },
            "vctk": {
                "hifigan_v2": {
                    "description": "Finetuned and intended to be used with tts_models/en/vctk/sc-glow-tts",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--en--vctk--hifigan_v2.zip",
                    "commit": "2f07160",
                    "author": "Edresson Casanova",
                    "license": "apache 2.0",
                    "contact": ""
                }
            },
            "sam": {
                "hifigan_v2": {
                    "description": "Finetuned and intended to be used with tts_models/en/sam/tacotron_DDC",
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--en--sam--hifigan_v2.zip",
                    "commit": "2f07160",
                    "author": "Eren Gölge @erogol",
                    "license": "apache 2.0",
                    "contact": "egolge@coqui.ai"
                }
            }
        },
        "nl": {
            "mai": {
                "parallel-wavegan": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--nl--mai--parallel-wavegan.zip",
                    "author": "@r-dh",
                    "license": "apache 2.0",
                    "commit": "unknown"
                }
            }
        },
        "de": {
            "thorsten": {
                "wavegrad": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--de--thorsten--wavegrad.zip",
                    "author": "@thorstenMueller",
                    "license": "apache 2.0",
                    "commit": "unknown"
                },
                "fullband-melgan": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--de--thorsten--fullband-melgan.zip",
                    "author": "@thorstenMueller",
                    "license": "apache 2.0",
                    "commit": "unknown"
                },
                "hifigan_v1": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.8.0_models/vocoder_models--de--thorsten--hifigan_v1.zip",
                    "description": "HifiGAN vocoder model for Thorsten Neutral Dec2021 22k Samplerate Tacotron2 DDC model",
                    "author": "@thorstenMueller",
                    "license": "apache 2.0",
                    "commit": "unknown"
                }
            }
        },
        "ja": {
            "kokoro": {
                "hifigan_v1": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--ja--kokoro--hifigan_v1.zip",
                    "description": "HifiGAN model trained for kokoro dataset by @kaiidams",
                    "author": "@kaiidams",
                    "license": "apache 2.0",
                    "commit": "3900448"
                }
            }
        },
        "uk": {
            "mai": {
                "multiband-melgan": {
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--uk--mai--multiband-melgan.zip",
                    "author":"@robinhad",
                    "commit": "bdab788d",
                    "license": "MIT",
                    "contact": ""
                }
            }
        },
        "tr":{
            "common-voice": {
                "hifigan":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.6.1_models/vocoder_models--tr--common-voice--hifigan.zip",
                    "description": "HifiGAN model using an unknown speaker from the Common-Voice dataset.",
                    "author": "Fatih Akademi",
                    "license": "MIT",
                    "commit": null
                }
            }
        }
    },
    "voice_conversion_models":{
        "multilingual":{
            "vctk":{
                "freevc24":{
                    "github_rls_url": "https://coqui.gateway.scarf.sh/v0.13.0_models/voice_conversion_models--multilingual--vctk--freevc24.zip",
                    "description": "FreeVC model trained on VCTK dataset from https://github.com/OlaWod/FreeVC",
                    "author": "Jing-Yi Li @OlaWod",
                    "license": "MIT",
                    "commit": null
                }
            }
        }
    }
}


================================================
FILE: TTS/VERSION
================================================
0.12.0


================================================
FILE: TTS/__init__.py
================================================
import os

with open(os.path.join(os.path.dirname(__file__), "VERSION"), "r", encoding="utf-8") as f:
    version = f.read().strip()

__version__ = version


================================================
FILE: TTS/api.py
================================================
import tempfile
from pathlib import Path

from TTS.utils.audio.numpy_transforms import save_wav
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer


class TTS:
    """TODO: Add voice conversion and Capacitron support."""

    def __init__(
        self,
        model_name: str = None,
        model_path: str = None,
        config_path: str = None,
        vocoder_path: str = None,
        vocoder_config_path: str = None,
        progress_bar: bool = True,
        gpu=False,
    ):
        """🐸TTS python interface that allows to load and use the released models.

        Example with a multi-speaker model:
            >>> from TTS.api import TTS
            >>> tts = TTS(TTS.list_models()[0])
            >>> wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
            >>> tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")

        Example with a single-speaker model:
            >>> tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False, gpu=False)
            >>> tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")

        Example loading a model from a path:
            >>> tts = TTS(model_path="/path/to/checkpoint_100000.pth", config_path="/path/to/config.json", progress_bar=False, gpu=False)
            >>> tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")

        Example voice cloning with YourTTS in English, French and Portuguese:
            >>> tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False, gpu=True)
            >>> tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="thisisit.wav")
            >>> tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr", file_path="thisisit.wav")
            >>> tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt", file_path="thisisit.wav")

        Args:
            model_name (str, optional): Model name to load. You can list models by ```tts.models```. Defaults to None.
            model_path (str, optional): Path to the model checkpoint. Defaults to None.
            config_path (str, optional): Path to the model config. Defaults to None.
            vocoder_path (str, optional): Path to the vocoder checkpoint. Defaults to None.
            vocoder_config_path (str, optional): Path to the vocoder config. Defaults to None.
            progress_bar (bool, optional): Whether to pring a progress bar while downloading a model. Defaults to True.
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.
        """
        self.manager = ModelManager(models_file=self.get_models_file_path(), progress_bar=progress_bar, verbose=False)

        self.synthesizer = None
        self.voice_converter = None

        if model_name:
            self.load_tts_model_by_name(model_name, gpu)
        if model_path:
            self.load_tts_model_by_path(
                model_path, config_path, vocoder_path=vocoder_path, vocoder_config=vocoder_config_path, gpu=gpu
            )

    @property
    def models(self):
        return self.manager.list_tts_models()

    @property
    def is_multi_speaker(self):
        if hasattr(self.synthesizer.tts_model, "speaker_manager") and self.synthesizer.tts_model.speaker_manager:
            return self.synthesizer.tts_model.speaker_manager.num_speakers > 1
        return False

    @property
    def is_multi_lingual(self):
        if hasattr(self.synthesizer.tts_model, "language_manager") and self.synthesizer.tts_model.language_manager:
            return self.synthesizer.tts_model.language_manager.num_languages > 1
        return False

    @property
    def speakers(self):
        if not self.is_multi_speaker:
            return None
        return self.synthesizer.tts_model.speaker_manager.speaker_names

    @property
    def languages(self):
        if not self.is_multi_lingual:
            return None
        return self.synthesizer.tts_model.language_manager.language_names

    @staticmethod
    def get_models_file_path():
        return Path(__file__).parent / ".models.json"

    @staticmethod
    def list_models():
        manager = ModelManager(models_file=TTS.get_models_file_path(), progress_bar=False, verbose=False)
        return manager.list_tts_models()

    def download_model_by_name(self, model_name: str):
        model_path, config_path, model_item = self.manager.download_model(model_name)
        if model_item.get("default_vocoder") is None:
            return model_path, config_path, None, None
        vocoder_path, vocoder_config_path, _ = self.manager.download_model(model_item["default_vocoder"])
        return model_path, config_path, vocoder_path, vocoder_config_path

    def load_vc_model_by_name(self, model_name: str, gpu: bool = False):
        """Load one of the voice conversion models by name.

        Args:
            model_name (str): Model name to load. You can list models by ```tts.models```.
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.
        """
        model_path, config_path, _, _ = self.download_model_by_name(model_name)
        self.voice_converter = Synthesizer(vc_checkpoint=model_path, vc_config=config_path, use_cuda=gpu)

    def load_tts_model_by_name(self, model_name: str, gpu: bool = False):
        """Load one of 🐸TTS models by name.

        Args:
            model_name (str): Model name to load. You can list models by ```tts.models```.
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.

        TODO: Add tests
        """

        model_path, config_path, vocoder_path, vocoder_config_path = self.download_model_by_name(model_name)

        # init synthesizer
        # None values are fetch from the model
        self.synthesizer = Synthesizer(
            tts_checkpoint=model_path,
            tts_config_path=config_path,
            tts_speakers_file=None,
            tts_languages_file=None,
            vocoder_checkpoint=vocoder_path,
            vocoder_config=vocoder_config_path,
            encoder_checkpoint=None,
            encoder_config=None,
            use_cuda=gpu,
        )

    def load_tts_model_by_path(
        self, model_path: str, config_path: str, vocoder_path: str = None, vocoder_config: str = None, gpu: bool = False
    ):
        """Load a model from a path.

        Args:
            model_path (str): Path to the model checkpoint.
            config_path (str): Path to the model config.
            vocoder_path (str, optional): Path to the vocoder checkpoint. Defaults to None.
            vocoder_config (str, optional): Path to the vocoder config. Defaults to None.
            gpu (bool, optional): Enable/disable GPU. Some models might be too slow on CPU. Defaults to False.
        """

        self.synthesizer = Synthesizer(
            tts_checkpoint=model_path,
            tts_config_path=config_path,
            tts_speakers_file=None,
            tts_languages_file=None,
            vocoder_checkpoint=vocoder_path,
            vocoder_config=vocoder_config,
            encoder_checkpoint=None,
            encoder_config=None,
            use_cuda=gpu,
        )

    def _check_arguments(self, speaker: str = None, language: str = None, speaker_wav: str = None):
        if self.is_multi_speaker and (speaker is None and speaker_wav is None):
            raise ValueError("Model is multi-speaker but no speaker is provided.")
        if self.is_multi_lingual and language is None:
            raise ValueError("Model is multi-lingual but no language is provided.")
        if not self.is_multi_speaker and speaker is not None:
            raise ValueError("Model is not multi-speaker but speaker is provided.")
        if not self.is_multi_lingual and language is not None:
            raise ValueError("Model is not multi-lingual but language is provided.")

    def tts(self, text: str, speaker: str = None, language: str = None, speaker_wav: str = None):
        """Convert text to speech.

        Args:
            text (str):
                Input text to synthesize.
            speaker (str, optional):
                Speaker name for multi-speaker. You can check whether loaded model is multi-speaker by
                `tts.is_multi_speaker` and list speakers by `tts.speakers`. Defaults to None.
            language (str, optional):
                Language code for multi-lingual models. You can check whether loaded model is multi-lingual
                `tts.is_multi_lingual` and list available languages by `tts.languages`. Defaults to None.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
        """
        self._check_arguments(speaker=speaker, language=language, speaker_wav=speaker_wav)

        wav = self.synthesizer.tts(
            text=text,
            speaker_name=speaker,
            language_name=language,
            speaker_wav=speaker_wav,
            reference_wav=None,
            style_wav=None,
            style_text=None,
            reference_speaker_name=None,
        )
        return wav

    def tts_to_file(
        self,
        text: str,
        speaker: str = None,
        language: str = None,
        speaker_wav: str = None,
        file_path: str = "output.wav",
    ):
        """Convert text to speech.

        Args:
            text (str):
                Input text to synthesize.
            speaker (str, optional):
                Speaker name for multi-speaker. You can check whether loaded model is multi-speaker by
                `tts.is_multi_speaker` and list speakers by `tts.speakers`. Defaults to None.
            language (str, optional):
                Language code for multi-lingual models. You can check whether loaded model is multi-lingual
                `tts.is_multi_lingual` and list available languages by `tts.languages`. Defaults to None.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
            file_path (str, optional):
                Output file path. Defaults to "output.wav".
        """
        wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav)
        self.synthesizer.save_wav(wav=wav, path=file_path)

    def voice_conversion(
        self,
        sourve_wav: str,
        target_wav: str,
    ):
        """Voice conversion with FreeVC. Convert source wav to target speaker.

        Args:
            source_wav (str):
                Path to the source wav file.
            target_wav (str):
                Path to the target wav file.
        """
        wav = self.synthesizer.voice_conversion(source_wav=sourve_wav, target_wav=target_wav)
        return wav

    def tts_with_vc(self, text: str, language: str = None, speaker_wav: str = None):
        """Convert text to speech with voice conversion.

        It combines tts with voice conversion to fake voice cloning.

        - Convert text to speech with tts.
        - Convert the output wav to target speaker with voice conversion.

        Args:
            text (str):
                Input text to synthesize.
            language (str, optional):
                Language code for multi-lingual models. You can check whether loaded model is multi-lingual
                `tts.is_multi_lingual` and list available languages by `tts.languages`. Defaults to None.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
        """
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as fp:
            # Lazy code... save it to a temp file to resample it while reading it for VC
            self.tts_to_file(text=text, speaker=None, language=language, file_path=fp.name)
        if self.voice_converter is None:
            self.load_vc_model_by_name("voice_conversion_models/multilingual/vctk/freevc24")
        wav = self.voice_converter.voice_conversion(source_wav=fp.name, target_wav=speaker_wav)
        return wav

    def tts_with_vc_to_file(
        self, text: str, language: str = None, speaker_wav: str = None, file_path: str = "output.wav"
    ):
        """Convert text to speech with voice conversion and save to file.

        Check `tts_with_vc` for more details.

        Args:
            text (str):
                Input text to synthesize.
            language (str, optional):
                Language code for multi-lingual models. You can check whether loaded model is multi-lingual
                `tts.is_multi_lingual` and list available languages by `tts.languages`. Defaults to None.
            speaker_wav (str, optional):
                Path to a reference wav file to use for voice cloning with supporting models like YourTTS.
                Defaults to None.
            file_path (str, optional):
                Output file path. Defaults to "output.wav".
        """
        wav = self.tts_with_vc(text=text, language=language, speaker_wav=speaker_wav)
        save_wav(wav=wav, path=file_path, sample_rate=self.voice_converter.vc_config.audio.output_sample_rate)


================================================
FILE: TTS/bin/__init__.py
================================================


================================================
FILE: TTS/bin/collect_env_info.py
================================================
"""Get detailed info about the working environment."""
import os
import platform
import sys

import numpy
import torch

sys.path += [os.path.abspath(".."), os.path.abspath(".")]
import json

import TTS


def system_info():
    return {
        "OS": platform.system(),
        "architecture": platform.architecture(),
        "version": platform.version(),
        "processor": platform.processor(),
        "python": platform.python_version(),
    }


def cuda_info():
    return {
        "GPU": [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())],
        "available": torch.cuda.is_available(),
        "version": torch.version.cuda,
    }


def package_info():
    return {
        "numpy": numpy.__version__,
        "PyTorch_version": torch.__version__,
        "PyTorch_debug": torch.version.debug,
        "TTS": TTS.__version__,
    }


def main():
    details = {"System": system_info(), "CUDA": cuda_info(), "Packages": package_info()}
    print(json.dumps(details, indent=4, sort_keys=True))


if __name__ == "__main__":
    main()


================================================
FILE: TTS/bin/compute_attention_masks.py
================================================
import argparse
import importlib
import os
from argparse import RawTextHelpFormatter

import numpy as np
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

from TTS.config import load_config
from TTS.tts.datasets.TTSDataset import TTSDataset
from TTS.tts.models import setup_model
from TTS.tts.utils.text.characters import make_symbols, phonemes, symbols
from TTS.utils.audio import AudioProcessor
from TTS.utils.io import load_checkpoint

if __name__ == "__main__":
    # pylint: disable=bad-option-value
    parser = argparse.ArgumentParser(
        description="""Extract attention masks from trained Tacotron/Tacotron2 models.
These masks can be used for different purposes including training a TTS model with a Duration Predictor.\n\n"""
        """Each attention mask is written to the same path as the input wav file with ".npy" file extension.
(e.g. path/bla.wav (wav file) --> path/bla.npy (attention mask))\n"""
        """
Example run:
    CUDA_VISIBLE_DEVICE="0" python TTS/bin/compute_attention_masks.py
        --model_path /data/rw/home/Models/ljspeech-dcattn-December-14-2020_11+10AM-9d0e8c7/checkpoint_200000.pth
        --config_path /data/rw/home/Models/ljspeech-dcattn-December-14-2020_11+10AM-9d0e8c7/config.json
        --dataset_metafile metadata.csv
        --data_path /root/LJSpeech-1.1/
        --batch_size 32
        --dataset ljspeech
        --use_cuda True
""",
        formatter_class=RawTextHelpFormatter,
    )
    parser.add_argument("--model_path", type=str, required=True, help="Path to Tacotron/Tacotron2 model file ")
    parser.add_argument(
        "--config_path",
        type=str,
        required=True,
        help="Path to Tacotron/Tacotron2 config file.",
    )
    parser.add_argument(
        "--dataset",
        type=str,
        default="",
        required=True,
        help="Target dataset processor name from TTS.tts.dataset.preprocess.",
    )

    parser.add_argument(
        "--dataset_metafile",
        type=str,
        default="",
        required=True,
        help="Dataset metafile inclusing file paths with transcripts.",
    )
    parser.add_argument("--data_path", type=str, default="", help="Defines the data path. It overwrites config.json.")
    parser.add_argument("--use_cuda", type=bool, default=False, help="enable/disable cuda.")

    parser.add_argument(
        "--batch_size", default=16, type=int, help="Batch size for the model. Use batch_size=1 if you have no CUDA."
    )
    args = parser.parse_args()

    C = load_config(args.config_path)
    ap = AudioProcessor(**C.audio)

    # if the vocabulary was passed, replace the default
    if "characters" in C.keys():
        symbols, phonemes = make_symbols(**C.characters)

    # load the model
    num_chars = len(phonemes) if C.use_phonemes else len(symbols)
    # TODO: handle multi-speaker
    model = setup_model(C)
    model, _ = load_checkpoint(model, args.model_path, args.use_cuda, True)

    # data loader
    preprocessor = importlib.import_module("TTS.tts.datasets.formatters")
    preprocessor = getattr(preprocessor, args.dataset)
    meta_data = preprocessor(args.data_path, args.dataset_metafile)
    dataset = TTSDataset(
        model.decoder.r,
        C.text_cleaner,
        compute_linear_spec=False,
        ap=ap,
        meta_data=meta_data,
        characters=C.characters if "characters" in C.keys() else None,
        add_blank=C["add_blank"] if "add_blank" in C.keys() else False,
        use_phonemes=C.use_phonemes,
        phoneme_cache_path=C.phoneme_cache_path,
        phoneme_language=C.phoneme_language,
        enable_eos_bos=C.enable_eos_bos_chars,
    )

    dataset.sort_and_filter_items(C.get("sort_by_audio_len", default=False))
    loader = DataLoader(
        dataset,
        batch_size=args.batch_size,
        num_workers=4,
        collate_fn=dataset.collate_fn,
        shuffle=False,
        drop_last=False,
    )

    # compute attentions
    file_paths = []
    with torch.no_grad():
        for data in tqdm(loader):
            # setup input data
            text_input = data[0]
            text_lengths = data[1]
            linear_input = data[3]
            mel_input = data[4]
            mel_lengths = data[5]
            stop_targets = data[6]
            item_idxs = data[7]

            # dispatch data to GPU
            if args.use_cuda:
                text_input = text_input.cuda()
                text_lengths = text_lengths.cuda()
                mel_input = mel_input.cuda()
                mel_lengths = mel_lengths.cuda()

            model_outputs = model.forward(text_input, text_lengths, mel_input)

            alignments = model_outputs["alignments"].detach()
            for idx, alignment in enumerate(alignments):
                item_idx = item_idxs[idx]
                # interpolate if r > 1
                alignment = (
                    torch.nn.functional.interpolate(
                        alignment.transpose(0, 1).unsqueeze(0),
                        size=None,
                        scale_factor=model.decoder.r,
                        mode="nearest",
                        align_corners=None,
                        recompute_scale_factor=None,
                    )
                    .squeeze(0)
                    .transpose(0, 1)
                )
                # remove paddings
                alignment = alignment[: mel_lengths[idx], : text_lengths[idx]].cpu().numpy()
                # set file paths
                wav_file_name = os.path.basename(item_idx)
                align_file_name = os.path.splitext(wav_file_name)[0] + "_attn.npy"
                file_path = item_idx.replace(wav_file_name, align_file_name)
                # save output
                wav_file_abs_path = os.path.abspath(item_idx)
                file_abs_path = os.path.abspath(file_path)
                file_paths.append([wav_file_abs_path, file_abs_path])
                np.save(file_path, alignment)

        # ourput metafile
        metafile = os.path.join(args.data_path, "metadata_attn_mask.txt")

        with open(metafile, "w", encoding="utf-8") as f:
            for p in file_paths:
                f.write(f"{p[0]}|{p[1]}\n")
        print(f" >> Metafile created: {metafile}")


================================================
FILE: TTS/bin/compute_embeddings.py
================================================
import argparse
import os
from argparse import RawTextHelpFormatter

import torch
from tqdm import tqdm

from TTS.config import load_config
from TTS.config.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.utils.managers import save_file
from TTS.tts.utils.speakers import SpeakerManager


def compute_embeddings(
    model_path,
    config_path,
    output_path,
    old_spakers_file=None,
    config_dataset_path=None,
    formatter_name=None,
    dataset_name=None,
    dataset_path=None,
    meta_file_train=None,
    meta_file_val=None,
    disable_cuda=False,
    no_eval=False,
):
    use_cuda = torch.cuda.is_available() and not disable_cuda

    if config_dataset_path is not None:
        c_dataset = load_config(config_dataset_path)
        meta_data_train, meta_data_eval = load_tts_samples(c_dataset.datasets, eval_split=not no_eval)
    else:
        c_dataset = BaseDatasetConfig()
        c_dataset.formatter = formatter_name
        c_dataset.dataset_name = dataset_name
        c_dataset.path = dataset_path
        if meta_file_train is not None:
            c_dataset.meta_file_train = meta_file_train
        if meta_file_val is not None:
            c_dataset.meta_file_val = meta_file_val
        meta_data_train, meta_data_eval = load_tts_samples(c_dataset, eval_split=not no_eval)

    if meta_data_eval is None:
        samples = meta_data_train
    else:
        samples = meta_data_train + meta_data_eval

    encoder_manager = SpeakerManager(
        encoder_model_path=model_path,
        encoder_config_path=config_path,
        d_vectors_file_path=old_spakers_file,
        use_cuda=use_cuda,
    )

    class_name_key = encoder_manager.encoder_config.class_name_key

    # compute speaker embeddings
    speaker_mapping = {}
    for fields in tqdm(samples):
        class_name = fields[class_name_key]
        audio_file = fields["audio_file"]
        embedding_key = fields["audio_unique_name"]

        if old_spakers_file is not None and embedding_key in encoder_manager.clip_ids:
            # get the embedding from the old file
            embedd = encoder_manager.get_embedding_by_clip(embedding_key)
        else:
            # extract the embedding
            embedd = encoder_manager.compute_embedding_from_clip(audio_file)

        # create speaker_mapping if target dataset is defined
        speaker_mapping[embedding_key] = {}
        speaker_mapping[embedding_key]["name"] = class_name
        speaker_mapping[embedding_key]["embedding"] = embedd

    if speaker_mapping:
        # save speaker_mapping if target dataset is defined
        if os.path.isdir(output_path):
            mapping_file_path = os.path.join(output_path, "speakers.pth")
        else:
            mapping_file_path = output_path

        if os.path.dirname(mapping_file_path) != "":
            os.makedirs(os.path.dirname(mapping_file_path), exist_ok=True)

        save_file(speaker_mapping, mapping_file_path)
        print("Speaker embeddings saved at:", mapping_file_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="""Compute embedding vectors for each audio file in a dataset and store them keyed by `{dataset_name}#{file_path}` in a .pth file\n\n"""
        """
        Example runs:
        python TTS/bin/compute_embeddings.py --model_path speaker_encoder_model.pth --config_path speaker_encoder_config.json  --config_dataset_path dataset_config.json

        python TTS/bin/compute_embeddings.py --model_path speaker_encoder_model.pth --config_path speaker_encoder_config.json  --formatter_name coqui --dataset_path /path/to/vctk/dataset --dataset_name my_vctk --meta_file_train /path/to/vctk/metafile_train.csv --meta_file_val /path/to/vctk/metafile_eval.csv
        """,
        formatter_class=RawTextHelpFormatter,
    )
    parser.add_argument(
        "--model_path",
        type=str,
        help="Path to model checkpoint file. It defaults to the released speaker encoder.",
        default="https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/model_se.pth.tar",
    )
    parser.add_argument(
        "--config_path",
        type=str,
        help="Path to model config file. It defaults to the released speaker encoder config.",
        default="https://github.com/coqui-ai/TTS/releases/download/speaker_encoder_model/config_se.json",
    )
    parser.add_argument(
        "--config_dataset_path",
        type=str,
        help="Path to dataset config file. You either need to provide this or `formatter_name`, `dataset_name` and `dataset_path` arguments.",
        default=None,
    )
    parser.add_argument("--output_path", type=str, help="Path for output `pth` or `json` file.", default="speakers.pth")
    parser.add_argument(
        "--old_file", type=str, help="Previous embedding file to only compute new audios.", default=None
    )
    parser.add_argument("--disable_cuda", type=bool, help="Flag to disable cuda.", default=False)
    parser.add_argument("--no_eval", type=bool, help="Do not compute eval?. Default False", default=False)
    parser.add_argument(
        "--formatter_name",
        type=str,
        help="Name of the formatter to use. You either need to provide this or `config_dataset_path`",
        default=None,
    )
    parser.add_argument(
        "--dataset_name",
        type=str,
        help="Name of the dataset to use. You either need to provide this or `config_dataset_path`",
        default=None,
    )
    parser.add_argument(
        "--dataset_path",
        type=str,
        help="Path to the dataset. You either need to provide this or `config_dataset_path`",
        default=None,
    )
    parser.add_argument(
        "--meta_file_train",
        type=str,
        help="Path to the train meta file. If not set, dataset formatter uses the default metafile if it is defined in the formatter. You either need to provide this or `config_dataset_path`",
        default=None,
    )
    parser.add_argument(
        "--meta_file_val",
        type=str,
        help="Path to the evaluation meta file. If not set, dataset formatter uses the default metafile if it is defined in the formatter. You either need to provide this or `config_dataset_path`",
        default=None,
    )
    args = parser.parse_args()

    compute_embeddings(
        args.model_path,
        args.config_path,
        args.output_path,
        old_spakers_file=args.old_file,
        config_dataset_path=args.config_dataset_path,
        formatter_name=args.formatter_name,
        dataset_name=args.dataset_name,
        dataset_path=args.dataset_path,
        meta_file_train=args.meta_file_train,
        meta_file_val=args.meta_file_val,
        disable_cuda=args.disable_cuda,
        no_eval=args.no_eval,
    )


================================================
FILE: TTS/bin/compute_statistics.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import argparse
import glob
import os

import numpy as np
from tqdm import tqdm

# from TTS.utils.io import load_config
from TTS.config import load_config
from TTS.tts.datasets import load_tts_samples
from TTS.utils.audio import AudioProcessor


def main():
    """Run preprocessing process."""
    parser = argparse.ArgumentParser(description="Compute mean and variance of spectrogtram features.")
    parser.add_argument("config_path", type=str, help="TTS config file path to define audio processin parameters.")
    parser.add_argument("out_path", type=str, help="save path (directory and filename).")
    parser.add_argument(
        "--data_path",
        type=str,
        required=False,
        help="folder including the target set of wavs overriding dataset config.",
    )
    args, overrides = parser.parse_known_args()

    CONFIG = load_config(args.config_path)
    CONFIG.parse_known_args(overrides, relaxed_parser=True)

    # load config
    CONFIG.audio.signal_norm = False  # do not apply earlier normalization
    CONFIG.audio.stats_path = None  # discard pre-defined stats

    # load audio processor
    ap = AudioProcessor(**CONFIG.audio.to_dict())

    # load the meta data of target dataset
    if args.data_path:
        dataset_items = glob.glob(os.path.join(args.data_path, "**", "*.wav"), recursive=True)
    else:
        dataset_items = load_tts_samples(CONFIG.datasets)[0]  # take only train data
    print(f" > There are {len(dataset_items)} files.")

    mel_sum = 0
    mel_square_sum = 0
    linear_sum = 0
    linear_square_sum = 0
    N = 0
    for item in tqdm(dataset_items):
        # compute features
        wav = ap.load_wav(item if isinstance(item, str) else item["audio_file"])
        linear = ap.spectrogram(wav)
        mel = ap.melspectrogram(wav)

        # compute stats
        N += mel.shape[1]
        mel_sum += mel.sum(1)
        linear_sum += linear.sum(1)
        mel_square_sum += (mel**2).sum(axis=1)
        linear_square_sum += (linear**2).sum(axis=1)

    mel_mean = mel_sum / N
    mel_scale = np.sqrt(mel_square_sum / N - mel_mean**2)
    linear_mean = linear_sum / N
    linear_scale = np.sqrt(linear_square_sum / N - linear_mean**2)

    output_file_path = args.out_path
    stats = {}
    stats["mel_mean"] = mel_mean
    stats["mel_std"] = mel_scale
    stats["linear_mean"] = linear_mean
    stats["linear_std"] = linear_scale

    print(f" > Avg mel spec mean: {mel_mean.mean()}")
    print(f" > Avg mel spec scale: {mel_scale.mean()}")
    print(f" > Avg linear spec mean: {linear_mean.mean()}")
    print(f" > Avg linear spec scale: {linear_scale.mean()}")

    # set default config values for mean-var scaling
    CONFIG.audio.stats_path = output_file_path
    CONFIG.audio.signal_norm = True
    # remove redundant values
    del CONFIG.audio.max_norm
    del CONFIG.audio.min_level_db
    del CONFIG.audio.symmetric_norm
    del CONFIG.audio.clip_norm
    stats["audio_config"] = CONFIG.audio.to_dict()
    np.save(output_file_path, stats, allow_pickle=True)
    print(f" > stats saved to {output_file_path}")


if __name__ == "__main__":
    main()


================================================
FILE: TTS/bin/eval_encoder.py
================================================
import argparse
from argparse import RawTextHelpFormatter

import torch
from tqdm import tqdm

from TTS.config import load_config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.utils.speakers import SpeakerManager


def compute_encoder_accuracy(dataset_items, encoder_manager):
    class_name_key = encoder_manager.encoder_config.class_name_key
    map_classid_to_classname = getattr(encoder_manager.encoder_config, "map_classid_to_classname", None)

    class_acc_dict = {}

    # compute embeddings for all wav_files
    for item in tqdm(dataset_items):
        class_name = item[class_name_key]
        wav_file = item["audio_file"]

        # extract the embedding
        embedd = encoder_manager.compute_embedding_from_clip(wav_file)
        if encoder_manager.encoder_criterion is not None and map_classid_to_classname is not None:
            embedding = torch.FloatTensor(embedd).unsqueeze(0)
            if encoder_manager.use_cuda:
                embedding = embedding.cuda()

            class_id = encoder_manager.encoder_criterion.softmax.inference(embedding).item()
            predicted_label = map_classid_to_classname[str(class_id)]
        else:
            predicted_label = None

        if class_name is not None and predicted_label is not None:
            is_equal = int(class_name == predicted_label)
            if class_name not in class_acc_dict:
                class_acc_dict[class_name] = [is_equal]
            else:
                class_acc_dict[class_name].append(is_equal)
        else:
            raise RuntimeError("Error: class_name or/and predicted_label are None")

    acc_avg = 0
    for key, values in class_acc_dict.items():
        acc = sum(values) / len(values)
        print("Class", key, "Accuracy:", acc)
        acc_avg += acc

    print("Average Accuracy:", acc_avg / len(class_acc_dict))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="""Compute the accuracy of the encoder.\n\n"""
        """
        Example runs:
        python TTS/bin/eval_encoder.py emotion_encoder_model.pth emotion_encoder_config.json  dataset_config.json
        """,
        formatter_class=RawTextHelpFormatter,
    )
    parser.add_argument("model_path", type=str, help="Path to model checkpoint file.")
    parser.add_argument(
        "config_path",
        type=str,
        help="Path to model config file.",
    )

    parser.add_argument(
        "config_dataset_path",
        type=str,
        help="Path to dataset config file.",
    )
    parser.add_argument("--use_cuda", type=bool, help="flag to set cuda.", default=True)
    parser.add_argument("--eval", type=bool, help="compute eval.", default=True)

    args = parser.parse_args()

    c_dataset = load_config(args.config_dataset_path)

    meta_data_train, meta_data_eval = load_tts_samples(c_dataset.datasets, eval_split=args.eval)
    items = meta_data_train + meta_data_eval

    enc_manager = SpeakerManager(
        encoder_model_path=args.model_path, encoder_config_path=args.config_path, use_cuda=args.use_cuda
    )

    compute_encoder_accuracy(items, enc_manager)


================================================
FILE: TTS/bin/extract_tts_spectrograms.py
================================================
#!/usr/bin/env python3
"""Extract Mel spectrograms with teacher forcing."""

import argparse
import os

import numpy as np
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

from TTS.config import load_config
from TTS.tts.datasets import TTSDataset, load_tts_samples
from TTS.tts.models import setup_model
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
from TTS.utils.generic_utils import count_parameters

use_cuda = torch.cuda.is_available()


def setup_loader(ap, r, verbose=False):
    tokenizer, _ = TTSTokenizer.init_from_config(c)
    dataset = TTSDataset(
        outputs_per_step=r,
        compute_linear_spec=False,
        samples=meta_data,
        tokenizer=tokenizer,
        ap=ap,
        batch_group_size=0,
        min_text_len=c.min_text_len,
        max_text_len=c.max_text_len,
        min_audio_len=c.min_audio_len,
        max_audio_len=c.max_audio_len,
        phoneme_cache_path=c.phoneme_cache_path,
        precompute_num_workers=0,
        use_noise_augment=False,
        verbose=verbose,
        speaker_id_mapping=speaker_manager.name_to_id if c.use_speaker_embedding else None,
        d_vector_mapping=speaker_manager.embeddings if c.use_d_vector_file else None,
    )

    if c.use_phonemes and c.compute_input_seq_cache:
        # precompute phonemes to have a better estimate of sequence lengths.
        dataset.compute_input_seq(c.num_loader_workers)
    dataset.preprocess_samples()

    loader = DataLoader(
        dataset,
        batch_size=c.batch_size,
        shuffle=False,
        collate_fn=dataset.collate_fn,
        drop_last=False,
        sampler=None,
        num_workers=c.num_loader_workers,
        pin_memory=False,
    )
    return loader


def set_filename(wav_path, out_path):
    wav_file = os.path.basename(wav_path)
    file_name = wav_file.split(".")[0]
    os.makedirs(os.path.join(out_path, "quant"), exist_ok=True)
    os.makedirs(os.path.join(out_path, "mel"), exist_ok=True)
    os.makedirs(os.path.join(out_path, "wav_gl"), exist_ok=True)
    os.makedirs(os.path.join(out_path, "wav"), exist_ok=True)
    wavq_path = os.path.join(out_path, "quant", file_name)
    mel_path = os.path.join(out_path, "mel", file_name)
    wav_gl_path = os.path.join(out_path, "wav_gl", file_name + ".wav")
    wav_path = os.path.join(out_path, "wav", file_name + ".wav")
    return file_name, wavq_path, mel_path, wav_gl_path, wav_path


def format_data(data):
    # setup input data
    text_input = data["token_id"]
    text_lengths = data["token_id_lengths"]
    mel_input = data["mel"]
    mel_lengths = data["mel_lengths"]
    item_idx = data["item_idxs"]
    d_vectors = data["d_vectors"]
    speaker_ids = data["speaker_ids"]
    attn_mask = data["attns"]
    avg_text_length = torch.mean(text_lengths.float())
    avg_spec_length = torch.mean(mel_lengths.float())

    # dispatch data to GPU
    if use_cuda:
        text_input = text_input.cuda(non_blocking=True)
        text_lengths = text_lengths.cuda(non_blocking=True)
        mel_input = mel_input.cuda(non_blocking=True)
        mel_lengths = mel_lengths.cuda(non_blocking=True)
        if speaker_ids is not None:
            speaker_ids = speaker_ids.cuda(non_blocking=True)
        if d_vectors is not None:
            d_vectors = d_vectors.cuda(non_blocking=True)
        if attn_mask is not None:
            attn_mask = attn_mask.cuda(non_blocking=True)
    return (
        text_input,
        text_lengths,
        mel_input,
        mel_lengths,
        speaker_ids,
        d_vectors,
        avg_text_length,
        avg_spec_length,
        attn_mask,
        item_idx,
    )


@torch.no_grad()
def inference(
    model_name,
    model,
    ap,
    text_input,
    text_lengths,
    mel_input,
    mel_lengths,
    speaker_ids=None,
    d_vectors=None,
):
    if model_name == "glow_tts":
        speaker_c = None
        if speaker_ids is not None:
            speaker_c = speaker_ids
        elif d_vectors is not None:
            speaker_c = d_vectors
        outputs = model.inference_with_MAS(
            text_input,
            text_lengths,
            mel_input,
            mel_lengths,
            aux_input={"d_vectors": speaker_c, "speaker_ids": speaker_ids},
        )
        model_output = outputs["model_outputs"]
        model_output = model_output.detach().cpu().numpy()

    elif "tacotron" in model_name:
        aux_input = {"speaker_ids": speaker_ids, "d_vectors": d_vectors}
        outputs = model(text_input, text_lengths, mel_input, mel_lengths, aux_input)
        postnet_outputs = outputs["model_outputs"]
        # normalize tacotron output
        if model_name == "tacotron":
            mel_specs = []
            postnet_outputs = postnet_outputs.data.cpu().numpy()
            for b in range(postnet_outputs.shape[0]):
                postnet_output = postnet_outputs[b]
                mel_specs.append(torch.FloatTensor(ap.out_linear_to_mel(postnet_output.T).T))
            model_output = torch.stack(mel_specs).cpu().numpy()

        elif model_name == "tacotron2":
            model_output = postnet_outputs.detach().cpu().numpy()
    return model_output


def extract_spectrograms(
    data_loader, model, ap, output_path, quantized_wav=False, save_audio=False, debug=False, metada_name="metada.txt"
):
    model.eval()
    export_metadata = []
    for _, data in tqdm(enumerate(data_loader), total=len(data_loader)):
        # format data
        (
            text_input,
            text_lengths,
            mel_input,
            mel_lengths,
            speaker_ids,
            d_vectors,
            _,
            _,
            _,
            item_idx,
        ) = format_data(data)

        model_output = inference(
            c.model.lower(),
            model,
            ap,
            text_input,
            text_lengths,
            mel_input,
            mel_lengths,
            speaker_ids,
            d_vectors,
        )

        for idx in range(text_input.shape[0]):
            wav_file_path = item_idx[idx]
            wav = ap.load_wav(wav_file_path)
            _, wavq_path, mel_path, wav_gl_path, wav_path = set_filename(wav_file_path, output_path)

            # quantize and save wav
            if quantized_wav:
                wavq = ap.quantize(wav)
                np.save(wavq_path, wavq)

            # save TTS mel
            mel = model_output[idx]
            mel_length = mel_lengths[idx]
            mel = mel[:mel_length, :].T
            np.save(mel_path, mel)

            export_metadata.append([wav_file_path, mel_path])
            if save_audio:
                ap.save_wav(wav, wav_path)

            if debug:
                print("Audio for debug saved at:", wav_gl_path)
                wav = ap.inv_melspectrogram(mel)
                ap.save_wav(wav, wav_gl_path)

    with open(os.path.join(output_path, metada_name), "w", encoding="utf-8") as f:
        for data in export_metadata:
            f.write(f"{data[0]}|{data[1]+'.npy'}\n")


def main(args):  # pylint: disable=redefined-outer-name
    # pylint: disable=global-variable-undefined
    global meta_data, speaker_manager

    # Audio processor
    ap = AudioProcessor(**c.audio)

    # load data instances
    meta_data_train, meta_data_eval = load_tts_samples(
        c.datasets, eval_split=args.eval, eval_split_max_size=c.eval_split_max_size, eval_split_size=c.eval_split_size
    )

    # use eval and training partitions
    meta_data = meta_data_train + meta_data_eval

    # init speaker manager
    if c.use_speaker_embedding:
        speaker_manager = SpeakerManager(data_items=meta_data)
    elif c.use_d_vector_file:
        speaker_manager = SpeakerManager(d_vectors_file_path=c.d_vector_file)
    else:
        speaker_manager = None

    # setup model
    model = setup_model(c)

    # restore model
    model.load_checkpoint(c, args.checkpoint_path, eval=True)

    if use_cuda:
        model.cuda()

    num_params = count_parameters(model)
    print("\n > Model has {} parameters".format(num_params), flush=True)
    # set r
    r = 1 if c.model.lower() == "glow_tts" else model.decoder.r
    own_loader = setup_loader(ap, r, verbose=True)

    extract_spectrograms(
        own_loader,
        model,
        ap,
        args.output_path,
        quantized_wav=args.quantized,
        save_audio=args.save_audio,
        debug=args.debug,
        metada_name="metada.txt",
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--config_path", type=str, help="Path to config file for training.", required=True)
    parser.add_argument("--checkpoint_path", type=str, help="Model file to be restored.", required=True)
    parser.add_argument("--output_path", type=str, help="Path to save mel specs", required=True)
    parser.add_argument("--debug", default=False, action="store_true", help="Save audio files for debug")
    parser.add_argument("--save_audio", default=False, action="store_true", help="Save audio files")
    parser.add_argument("--quantized", action="store_true", help="Save quantized audio files")
    parser.add_argument("--eval", type=bool, help="compute eval.", default=True)
    args = parser.parse_args()

    c = load_config(args.config_path)
    c.audio.trim_silence = False
    main(args)


================================================
FILE: TTS/bin/find_unique_chars.py
================================================
"""Find all the unique characters in a dataset"""
import argparse
from argparse import RawTextHelpFormatter

from TTS.config import load_config
from TTS.tts.datasets import load_tts_samples


def main():
    # pylint: disable=bad-option-value
    parser = argparse.ArgumentParser(
        description="""Find all the unique characters or phonemes in a dataset.\n\n"""
        """
    Example runs:

    python TTS/bin/find_unique_chars.py --config_path config.json
    """,
        formatter_class=RawTextHelpFormatter,
    )
    parser.add_argument("--config_path", type=str, help="Path to dataset config file.", required=True)
    args = parser.parse_args()

    c = load_config(args.config_path)

    # load all datasets
    train_items, eval_items = load_tts_samples(
        c.datasets, eval_split=True, eval_split_max_size=c.eval_split_max_size, eval_split_size=c.eval_split_size
    )

    items = train_items + eval_items

    texts = "".join(item["text"] for item in items)
    chars = set(texts)
    lower_chars = filter(lambda c: c.islower(), chars)
    chars_force_lower = [c.lower() for c in chars]
    chars_force_lower = set(chars_force_lower)

    print(f" > Number of unique characters: {len(chars)}")
    print(f" > Unique characters: {''.join(sorted(chars))}")
    print(f" > Unique lower characters: {''.join(sorted(lower_chars))}")
    print(f" > Unique all forced to lower characters: {''.join(sorted(chars_force_lower))}")


if __name__ == "__main__":
    main()


================================================
FILE: TTS/bin/find_unique_phonemes.py
================================================
"""Find all the unique characters in a dataset"""
import argparse
import multiprocessing
from argparse import RawTextHelpFormatter

from tqdm.contrib.concurrent import process_map

from TTS.config import load_config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.utils.text.phonemizers import Gruut


def compute_phonemes(item):
    text = item["text"]
    ph = phonemizer.phonemize(text).replace("|", "")
    return set(list(ph))


def main():
    # pylint: disable=W0601
    global c, phonemizer
    # pylint: disable=bad-option-value
    parser = argparse.ArgumentParser(
        description="""Find all the unique characters or phonemes in a dataset.\n\n"""
        """
    Example runs:

    python TTS/bin/find_unique_phonemes.py --config_path config.json
    """,
        formatter_class=RawTextHelpFormatter,
    )
    parser.add_argument("--config_path", type=str, help="Path to dataset config file.", required=True)
    args = parser.parse_args()

    c = load_config(args.config_path)

    # load all datasets
    train_items, eval_items = load_tts_samples(
        c.datasets, eval_split=True, eval_split_max_size=c.eval_split_max_size, eval_split_size=c.eval_split_size
    )
    items = train_items + eval_items
    print("Num items:", len(items))

    language_list = [item["language"] for item in items]
    is_lang_def = all(language_list)

    if not c.phoneme_language or not is_lang_def:
        raise ValueError("Phoneme language must be defined in config.")

    if not language_list.count(language_list[0]) == len(language_list):
        raise ValueError(
            "Currently, just one phoneme language per config file is supported !! Please split the dataset config into different configs and run it individually for each language !!"
        )

    phonemizer = Gruut(language=language_list[0], keep_puncs=True)

    phonemes = process_map(compute_phonemes, items, max_workers=multiprocessing.cpu_count(), chunksize=15)
    phones = []
    for ph in phonemes:
        phones.extend(ph)

    phones = set(phones)
    lower_phones = filter(lambda c: c.islower(), phones)
    phones_force_lower = [c.lower() for c in phones]
    phones_force_lower = set(phones_force_lower)

    print(f" > Number of unique phonemes: {len(phones)}")
    print(f" > Unique phonemes: {''.join(sorted(phones))}")
    print(f" > Unique lower phonemes: {''.join(sorted(lower_phones))}")
    print(f" > Unique all forced to lower phonemes: {''.join(sorted(phones_force_lower))}")


if __name__ == "__main__":
    main()


================================================
FILE: TTS/bin/remove_silence_using_vad.py
================================================
import argparse
import glob
import os
import pathlib

from tqdm import tqdm

from TTS.utils.vad import get_vad_model_and_utils, remove_silence


def adjust_path_and_remove_silence(audio_path):
    output_path = audio_path.replace(os.path.join(args.input_dir, ""), os.path.join(args.output_dir, ""))
    # ignore if the file exists
    if os.path.exists(output_path) and not args.force:
        return output_path

    # create all directory structure
    pathlib.Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    # remove the silence and save the audio
    output_path, is_speech = remove_silence(
        model_and_utils,
        audio_path,
        output_path,
        trim_just_beginning_and_end=args.trim_just_beginning_and_end,
        use_cuda=args.use_cuda,
    )

    return output_path, is_speech


def preprocess_audios():
    files = sorted(glob.glob(os.path.join(args.input_dir, args.glob), recursive=True))
    print("> Number of files: ", len(files))
    if not args.force:
        print("> Ignoring files that already exist in the output idrectory.")

    if args.trim_just_beginning_and_end:
        print("> Trimming just the beginning and the end with nonspeech parts.")
    else:
        print("> Trimming all nonspeech parts.")

    filtered_files = []
    if files:
        # create threads
        # num_threads = multiprocessing.cpu_count()
        # process_map(adjust_path_and_remove_silence, files, max_workers=num_threads, chunksize=15)
        for f in tqdm(files):
            output_path, is_speech = adjust_path_and_remove_silence(f)
            if not is_speech:
                filtered_files.append(output_path)

        # write files that do not have speech
        with open(os.path.join(args.output_dir, "filtered_files.txt"), "w", encoding="utf-8") as f:
            for file in filtered_files:
                f.write(file + "\n")
    else:
        print("> No files Found !")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="python TTS/bin/remove_silence_using_vad.py -i=VCTK-Corpus/ -o=VCTK-Corpus-removed-silence/ -g=wav48_silence_trimmed/*/*_mic1.flac --trim_just_beginning_and_end True"
    )
    parser.add_argument("-i", "--input_dir", type=str, default="../VCTK-Corpus", help="Dataset root dir")
    parser.add_argument(
        "-o", "--output_dir", type=str, default="../VCTK-Corpus-removed-silence", help="Output Dataset dir"
    )
    parser.add_argument("-f", "--force", default=False, action="store_true", help="Force the replace of exists files")
    parser.add_argument(
        "-g",
        "--glob",
        type=str,
        default="**/*.wav",
        help="path in glob format for acess wavs from input_dir. ex: wav48/*/*.wav",
    )
    parser.add_argument(
        "-t",
        "--trim_just_beginning_and_end",
        type=bool,
        default=True,
        help="If True this script will trim just the beginning and end nonspeech parts. If False all nonspeech parts will be trim. Default True",
    )
    parser.add_argument(
        "-c",
        "--use_cuda",
        type=bool,
        default=False,
        help="If True use cuda",
    )
    args = parser.parse_args()
    # load the model and utils
    model_and_utils = get_vad_model_and_utils(use_cuda=args.use_cuda)
    preprocess_audios()


================================================
FILE: TTS/bin/resample.py
================================================
import argparse
import glob
import os
from argparse import RawTextHelpFormatter
from multiprocessing import Pool
from shutil import copytree

import librosa
import soundfile as sf
from tqdm import tqdm


def resample_file(func_args):
    filename, output_sr = func_args
    y, sr = librosa.load(filename, sr=output_sr)
    sf.write(filename, y, sr)


def resample_files(input_dir, output_sr, output_dir=None, file_ext="wav", n_jobs=10):
    if output_dir:
        print("Recursively copying the input folder...")
        copytree(input_dir, output_dir)
        input_dir = output_dir

    print("Resampling the audio files...")
    audio_files = glob.glob(os.path.join(input_dir, f"**/*.{file_ext}"), recursive=True)
    print(f"Found {len(audio_files)} files...")
    audio_files = list(zip(audio_files, len(audio_files) * [output_sr]))
    with Pool(processes=n_jobs) as p:
        with tqdm(total=len(audio_files)) as pbar:
            for _, _ in enumerate(p.imap_unordered(resample_file, audio_files)):
                pbar.update()

    print("Done !")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="""Resample a folder recusively with librosa
                       Can be used in place or create a copy of the folder as an output.\n\n
                       Example run:
                            python TTS/bin/resample.py
                                --input_dir /root/LJSpeech-1.1/
                                --output_sr 22050
                                --output_dir /root/resampled_LJSpeech-1.1/
                                --file_ext wav
                                --n_jobs 24
                    """,
        formatter_class=RawTextHelpFormatter,
    )

    parser.add_argument(
        "--input_dir",
        type=str,
        default=None,
        required=True,
        help="Path of the folder containing the audio files to resample",
    )

    parser.add_argument(
        "--output_sr",
        type=int,
        default=22050,
        required=False,
        help="Samlple rate to which the audio files should be resampled",
    )

    parser.add_argument(
        "--output_dir",
        type=str,
        default=None,
        required=False,
        help="Path of the destination folder. If not defined, the operation is done in place",
    )

    parser.add_argument(
        "--file_ext",
        type=str,
        default="wav",
        required=False,
        help="Extension of the audio files to resample",
    )

    parser.add_argument(
        "--n_jobs", type=int, default=None, help="Number of threads to use, by default it uses all cores"
    )

    args = parser.parse_args()

    resample_files(args.input_dir, args.output_sr, args.output_dir, args.file_ext, args.n_jobs)


================================================
FILE: TTS/bin/synthesize.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import argparse
import sys
from argparse import RawTextHelpFormatter

# pylint: disable=redefined-outer-name, unused-argument
from pathlib import Path

from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer


def str2bool(v):
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if v.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("Boolean value expected.")


def main():
    description = """Synthesize speech on command line.

You can either use your trained model or choose a model from the provided list.

If you don't specify any models, then it uses LJSpeech based English model.

## Example Runs

### Single Speaker Models

- List provided models:

    ```
    $ tts --list_models
    ```

- Query info for model info by idx:

    ```
    $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
    ```

- Query info for model info by full name:

    ```
    $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
    ```

- Run TTS with default models:

    ```
    $ tts --text "Text for TTS"
    ```

- Run a TTS model with its default vocoder model:

    ```
    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>
    ```

- Run with specific TTS and vocoder models from the list:

    ```
    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --output_path
    ```

- Run your own TTS model (Using Griffin-Lim Vocoder):

    ```
    $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
    ```

- Run your own TTS and Vocoder models:
    ```
    $ tts --text "Text for TTS" --model_path path/to/config.json --config_path path/to/model.pth --out_path output/path/speech.wav
        --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
    ```

### Multi-speaker Models

- List the available speakers and choose as <speaker_id> among them:

    ```
    $ tts --model_name "<language>/<dataset>/<model_name>"  --list_speaker_idxs
    ```

- Run the multi-speaker TTS model with the target speaker ID:

    ```
    $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>"  --speaker_idx <speaker_id>
    ```

- Run your own multi-speaker TTS model:

    ```
    $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/config.json --config_path path/to/model.pth --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
    ```

### Voice Conversion Models

    ```
    $ tts --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>" --source_wav <path/to/speaker/wav> --target_wav <path/to/reference/wav>
    ```
    """
    # We remove Markdown code formatting programmatically here to allow us to copy-and-paste from main README to keep
    # documentation in sync more easily.
    parser = argparse.ArgumentParser(
        description=description.replace("    ```\n", ""),
        formatter_class=RawTextHelpFormatter,
    )

    parser.add_argument(
        "--list_models",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
        help="list available pre-trained TTS and vocoder models.",
    )

    parser.add_argument(
        "--model_info_by_idx",
        type=str,
        default=None,
        help="model info using query format: <model_type>/<model_query_idx>",
    )

    parser.add_argument(
        "--model_info_by_name",
        type=str,
        default=None,
        help="model info using query format: <model_type>/<language>/<dataset>/<model_name>",
    )

    parser.add_argument("--text", type=str, default=None, help="Text to generate speech.")

    # Args for running pre-trained TTS models.
    parser.add_argument(
        "--model_name",
        type=str,
        default="tts_models/en/ljspeech/tacotron2-DDC",
        help="Name of one of the pre-trained TTS models in format <language>/<dataset>/<model_name>",
    )
    parser.add_argument(
        "--vocoder_name",
        type=str,
        default=None,
        help="Name of one of the pre-trained  vocoder models in format <language>/<dataset>/<model_name>",
    )

    # Args for running custom models
    parser.add_argument("--config_path", default=None, type=str, help="Path to model config file.")
    parser.add_argument(
        "--model_path",
        type=str,
        default=None,
        help="Path to model file.",
    )
    parser.add_argument(
        "--out_path",
        type=str,
        default="tts_output.wav",
        help="Output wav file path.",
    )
    parser.add_argument("--use_cuda", type=bool, help="Run model on CUDA.", default=False)
    parser.add_argument(
        "--vocoder_path",
        type=str,
        help="Path to vocoder model file. If it is not defined, model uses GL as vocoder. Please make sure that you installed vocoder library before (WaveRNN).",
        default=None,
    )
    parser.add_argument("--vocoder_config_path", type=str, help="Path to vocoder model config file.", default=None)
    parser.add_argument(
        "--encoder_path",
        type=str,
        help="Path to speaker encoder model file.",
        default=None,
    )
    parser.add_argument("--encoder_config_path", type=str, help="Path to speaker encoder config file.", default=None)

    # args for multi-speaker synthesis
    parser.add_argument("--speakers_file_path", type=str, help="JSON file for multi-speaker model.", default=None)
    parser.add_argument("--language_ids_file_path", type=str, help="JSON file for multi-lingual model.", default=None)
    parser.add_argument(
        "--speaker_idx",
        type=str,
        help="Target speaker ID for a multi-speaker TTS model.",
        default=None,
    )
    parser.add_argument(
        "--language_idx",
        type=str,
        help="Target language ID for a multi-lingual TTS model.",
        default=None,
    )
    parser.add_argument(
        "--speaker_wav",
        nargs="+",
        help="wav file(s) to condition a multi-speaker TTS model with a Speaker Encoder. You can give multiple file paths. The d_vectors is computed as their average.",
        default=None,
    )
    parser.add_argument("--gst_style", help="Wav path file for GST style reference.", default=None)
    parser.add_argument(
        "--capacitron_style_wav", type=str, help="Wav path file for Capacitron prosody reference.", default=None
    )
    parser.add_argument("--capacitron_style_text", type=str, help="Transcription of the reference.", default=None)
    parser.add_argument(
        "--list_speaker_idxs",
        help="List available speaker ids for the defined multi-speaker model.",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
    )
    parser.add_argument(
        "--list_language_idxs",
        help="List available language ids for the defined multi-lingual model.",
        type=str2bool,
        nargs="?",
        const=True,
        default=False,
    )
    # aux args
    parser.add_argument(
        "--save_spectogram",
        type=bool,
        help="If true save raw spectogram for further (vocoder) processing in out_path.",
        default=False,
    )
    parser.add_argument(
        "--reference_wav",
        type=str,
        help="Reference wav file to convert in the voice of the speaker_idx or speaker_wav",
        default=None,
    )
    parser.add_argument(
        "--reference_speaker_idx",
        type=str,
        help="speaker ID of the reference_wav speaker (If not provided the embedding will be computed using the Speaker Encoder).",
        default=None,
    )
    parser.add_argument(
        "--progress_bar",
        type=str2bool,
        help="If true shows a progress bar for the model download. Defaults to True",
        default=True,
    )

    # voice conversion args
    parser.add_argument(
        "--source_wav",
        type=str,
        default=None,
        help="Original audio file to convert in the voice of the target_wav",
    )
    parser.add_argument(
        "--target_wav",
        type=str,
        default=None,
        help="Target audio file to convert in the voice of the source_wav",
    )

    args = parser.parse_args()

    # print the description if either text or list_models is not set
    check_args = [
        args.text,
        args.list_models,
        args.list_speaker_idxs,
        args.list_language_idxs,
        args.reference_wav,
        args.model_info_by_idx,
        args.model_info_by_name,
        args.source_wav,
        args.target_wav,
    ]
    if not any(check_args):
        parser.parse_args(["-h"])

    # load model manager
    path = Path(__file__).parent / "../.models.json"
    manager = ModelManager(path, progress_bar=args.progress_bar)

    tts_path = None
    tts_config_path = None
    speakers_file_path = None
    language_ids_file_path = None
    vocoder_path = None
    vocoder_config_path = None
    encoder_path = None
    encoder_config_path = None
    vc_path = None
    vc_config_path = None

    # CASE1 #list : list pre-trained TTS models
    if args.list_models:
        manager.list_models()
        sys.exit()

    # CASE2 #info : model info for pre-trained TTS models
    if args.model_info_by_idx:
        model_query = args.model_info_by_idx
        manager.model_info_by_idx(model_query)
        sys.exit()

    if args.model_info_by_name:
        model_query_full_name = args.model_info_by_name
        manager.model_info_by_full_name(model_query_full_name)
        sys.exit()

    # CASE3: load pre-trained model paths
    if args.model_name is not None and not args.model_path:
        model_path, config_path, model_item = manager.download_model(args.model_name)

        # tts model
        if model_item["model_type"] == "tts_models":
            tts_path = model_path
            tts_config_path = config_path
            if "default_vocoder" in model_item:
                args.vocoder_name = model_item["default_vocoder"] if args.vocoder_name is None else args.vocoder_name

        # voice conversion model
        if model_item["model_type"] == "voice_conversion_models":
            vc_path = model_path
            vc_config_path = config_path

    # load vocoder
    if args.vocoder_name is not None and not args.vocoder_path:
        vocoder_path, vocoder_config_path, _ = manager.download_model(args.vocoder_name)

    # CASE4: set custom model paths
    if args.model_path is not None:
        tts_path = args.model_path
        tts_config_path = args.config_path
        speakers_file_path = args.speakers_file_path
        language_ids_file_path = args.language_ids_file_path

    if args.vocoder_path is not None:
        vocoder_path = args.vocoder_path
        vocoder_config_path = args.vocoder_config_path

    if args.encoder_path is not None:
        encoder_path = args.encoder_path
        encoder_config_path = args.encoder_config_path

    # load models
    synthesizer = Synthesizer(
        tts_path,
        tts_config_path,
        speakers_file_path,
        language_ids_file_path,
        vocoder_path,
        vocoder_config_path,
        encoder_path,
        encoder_config_path,
        vc_path,
        vc_config_path,
        args.use_cuda,
    )

    # query speaker ids of a multi-speaker model.
    if args.list_speaker_idxs:
        print(
            " > Available speaker ids: (Set --speaker_idx flag to one of these values to use the multi-speaker model."
        )
        print(synthesizer.tts_model.speaker_manager.name_to_id)
        return

    # query langauge ids of a multi-lingual model.
    if args.list_language_idxs:
        print(
            " > Available language ids: (Set --language_idx flag to one of these values to use the multi-lingual model."
        )
        print(synthesizer.tts_model.language_manager.name_to_id)
        return

    # check the arguments against a multi-speaker model.
    if synthesizer.tts_speakers_file and (not args.speaker_idx and not args.speaker_wav):
        print(
            " [!] Looks like you use a multi-speaker model. Define `--speaker_idx` to "
            "select the target speaker. You can list the available speakers for this model by `--list_speaker_idxs`."
        )
        return

    # RUN THE SYNTHESIS
    if args.text:
        print(" > Text: {}".format(args.text))

    # kick it
    if tts_path is not None:
        wav = synthesizer.tts(
            args.text,
            args.speaker_idx,
            args.language_idx,
            args.speaker_wav,
            reference_wav=args.reference_wav,
            style_wav=args.capacitron_style_wav,
            style_text=args.capacitron_style_text,
            reference_speaker_name=args.reference_speaker_idx,
        )
    elif vc_path is not None:
        wav = synthesizer.voice_conversion(
            source_wav=args.source_wav,
            target_wav=args.target_wav,
        )

    # save the results
    print(" > Saving output to {}".format(args.out_path))
    synthesizer.save_wav(wav, args.out_path)


if __name__ == "__main__":
    main()


================================================
FILE: TTS/bin/train_encoder.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import sys
import time
import traceback

import torch
from torch.utils.data import DataLoader
from trainer.torch import NoamLR
from trainer.trainer_utils import get_optimizer

from TTS.encoder.dataset import EncoderDataset
from TTS.encoder.utils.generic_utils import save_best_model, save_checkpoint, setup_encoder_model
from TTS.encoder.utils.training import init_training
from TTS.encoder.utils.visual import plot_embeddings
from TTS.tts.datasets import load_tts_samples
from TTS.utils.audio import AudioProcessor
from TTS.utils.generic_utils import count_parameters, remove_experiment_folder
from TTS.utils.io import copy_model_files
from TTS.utils.samplers import PerfectBatchSampler
from TTS.utils.training import check_update

torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True
torch.manual_seed(54321)
use_cuda = torch.cuda.is_available()
num_gpus = torch.cuda.device_count()
print(" > Using CUDA: ", use_cuda)
print(" > Number of GPUs: ", num_gpus)


def setup_loader(ap: AudioProcessor, is_val: bool = False, verbose: bool = False):
    num_utter_per_class = c.num_utter_per_class if not is_val else c.eval_num_utter_per_class
    num_classes_in_batch = c.num_classes_in_batch if not is_val else c.eval_num_classes_in_batch

    dataset = EncoderDataset(
        c,
        ap,
        meta_data_eval if is_val else meta_data_train,
        voice_len=c.voice_len,
        num_utter_per_class=num_utter_per_class,
        num_classes_in_batch=num_classes_in_batch,
        verbose=verbose,
        augmentation_config=c.audio_augmentation if not is_val else None,
        use_torch_spec=c.model_params.get("use_torch_spec", False),
    )
    # get classes list
    classes = dataset.get_class_list()

    sampler = PerfectBatchSampler(
        dataset.items,
        classes,
        batch_size=num_classes_in_batch * num_utter_per_class,  # total batch size
        num_classes_in_batch=num_classes_in_batch,
        num_gpus=1,
        shuffle=not is_val,
        drop_last=True,
    )

    if len(classes) < num_classes_in_batch:
        if is_val:
            raise RuntimeError(
                f"config.eval_num_classes_in_batch ({num_classes_in_batch}) need to be <= {len(classes)} (Number total of Classes in the Eval dataset) !"
            )
        raise RuntimeError(
            f"config.num_classes_in_batch ({num_classes_in_batch}) need to be <= {len(classes)} (Number total of Classes in the Train dataset) !"
        )

    # set the classes to avoid get wrong class_id when the number of training and eval classes are not equal
    if is_val:
        dataset.set_classes(train_classes)

    loader = DataLoader(
        dataset,
        num_workers=c.num_loader_workers,
        batch_sampler=sampler,
        collate_fn=dataset.collate_fn,
    )

    return loader, classes, dataset.get_map_classid_to_classname()


def evaluation(model, criterion, data_loader, global_step):
    eval_loss = 0
    for _, data in enumerate(data_loader):
        with torch.no_grad():
            # setup input data
            inputs, labels = data

            # agroup samples of each class in the batch. perfect sampler produces [3,2,1,3,2,1] we need [3,3,2,2,1,1]
            labels = torch.transpose(
                labels.view(c.eval_num_utter_per_class, c.eval_num_classes_in_batch), 0, 1
            ).reshape(labels.shape)
            inputs = torch.transpose(
                inputs.view(c.eval_num_utter_per_class, c.eval_num_classes_in_batch, -1), 0, 1
            ).reshape(inputs.shape)

            # dispatch data to GPU
            if use_cuda:
                inputs = inputs.cuda(non_blocking=True)
                labels = labels.cuda(non_blocking=True)

            # forward pass model
            outputs = model(inputs)

            # loss computation
            loss = criterion(
                outputs.view(c.eval_num_classes_in_batch, outputs.shape[0] // c.eval_num_classes_in_batch, -1), labels
            )

            eval_loss += loss.item()

    eval_avg_loss = eval_loss / len(data_loader)
    # save stats
    dashboard_logger.eval_stats(global_step, {"loss": eval_avg_loss})
    # plot the last batch in the evaluation
    figures = {
        "UMAP Plot": plot_embeddings(outputs.detach().cpu().numpy(), c.num_classes_in_batch),
    }
    dashboard_logger.eval_figures(global_step, figures)
    return eval_avg_loss


def train(model, optimizer, scheduler, criterion, data_loader, eval_data_loader, global_step):
    model.train()
    best_loss = float("inf")
    avg_loader_time = 0
    end_time = time.time()
    for epoch in range(c.epochs):
        tot_loss = 0
        epoch_time = 0
        for _, data in enumerate(data_loader):
            start_time = time.time()

            # setup input data
            inputs, labels = data
            # agroup samples of each class in the batch. perfect sampler produces [3,2,1,3,2,1] we need [3,3,2,2,1,1]
            labels = torch.transpose(labels.view(c.num_utter_per_class, c.num_classes_in_batch), 0, 1).reshape(
                labels.shape
            )
            inputs = torch.transpose(inputs.view(c.num_utter_per_class, c.num_classes_in_batch, -1), 0, 1).reshape(
                inputs.shape
            )
            # ToDo: move it to a unit test
            # labels_converted = torch.transpose(labels.view(c.num_utter_per_class, c.num_classes_in_batch), 0, 1).reshape(labels.shape)
            # inputs_converted = torch.transpose(inputs.view(c.num_utter_per_class, c.num_classes_in_batch, -1), 0, 1).reshape(inputs.shape)
            # idx = 0
            # for j in range(0, c.num_classes_in_batch, 1):
            #     for i in range(j, len(labels), c.num_classes_in_batch):
            #         if not torch.all(labels[i].eq(labels_converted[idx])) or not torch.all(inputs[i].eq(inputs_converted[idx])):
            #             print("Invalid")
            #             print(labels)
            #             exit()
            #         idx += 1
            # labels = labels_converted
            # inputs = inputs_converted

            loader_time = time.time() - end_time
            global_step += 1

            # setup lr
            if c.lr_decay:
                scheduler.step()
            optimizer.zero_grad()

            # dispatch data to GPU
            if use_cuda:
                inputs = inputs.cuda(non_blocking=True)
                labels = labels.cuda(non_blocking=True)

            # forward pass model
            outputs = model(inputs)

            # loss computation
            loss = criterion(
                outputs.view(c.num_classes_in_batch, outputs.shape[0] // c.num_classes_in_batch, -1), labels
            )
            loss.backward()
            grad_norm, _ = check_update(model, c.grad_clip)
            optimizer.step()

            step_time = time.time() - start_time
            epoch_time += step_time

            # acumulate the total epoch loss
            tot_loss += loss.item()

            # Averaged Loader Time
            num_loader_workers = c.num_loader_workers if c.num_loader_workers > 0 else 1
            avg_loader_time = (
                1 / num_loader_workers * loader_time + (num_loader_workers - 1) / num_loader_workers * avg_loader_time
                if avg_loader_time != 0
                else loader_time
            )
            current_lr = optimizer.param_groups[0]["lr"]

            if global_step % c.steps_plot_stats == 0:
                # Plot Training Epoch Stats
                train_stats = {
                    "loss": loss.item(),
                    "lr": current_lr,
                    "grad_norm": grad_norm,
                    "step_time": step_time,
                    "avg_loader_time": avg_loader_time,
                }
                dashboard_logger.train_epoch_stats(global_step, train_stats)
                figures = {
                    "UMAP Plot": plot_embeddings(outputs.detach().cpu().numpy(), c.num_classes_in_batch),
                }
                dashboard_logger.train_figures(global_step, figures)

            if global_step % c.print_step == 0:
                print(
                    "   | > Step:{}  Loss:{:.5f}  GradNorm:{:.5f}  "
                    "StepTime:{:.2f}  LoaderTime:{:.2f}  AvGLoaderTime:{:.2f}  LR:{:.6f}".format(
                        global_step, loss.item(), grad_norm, step_time, loader_time, avg_loader_time, current_lr
                    ),
                    flush=True,
                )

            if global_step % c.save_step == 0:
                # save model
                save_checkpoint(model, optimizer, criterion, loss.item(), OUT_PATH, global_step, epoch)

            end_time = time.time()

        print("")
        print(
            ">>> Epoch:{}  AvgLoss: {:.5f} GradNorm:{:.5f}  "
            "EpochTime:{:.2f} AvGLoaderTime:{:.2f} ".format(
                epoch, tot_loss / len(data_loader), grad_norm, epoch_time, avg_loader_time
            ),
            flush=True,
        )
        # evaluation
        if c.run_eval:
            model.eval()
            eval_loss = evaluation(model, criterion, eval_data_loader, global_step)
            print("\n\n")
            print("--> EVAL PERFORMANCE")
            print(
                "   | > Epoch:{}  AvgLoss: {:.5f} ".format(epoch, eval_loss),
                flush=True,
            )
            # save the best checkpoint
            best_loss = save_best_model(model, optimizer, criterion, eval_loss, best_loss, OUT_PATH, global_step, epoch)
            model.train()

    return best_loss, global_step


def main(args):  # pylint: disable=redefined-outer-name
    # pylint: disable=global-variable-undefined
    global meta_data_train
    global meta_data_eval
    global train_classes

    ap = AudioProcessor(**c.audio)
    model = setup_encoder_model(c)

    optimizer = get_optimizer(c.optimizer, c.optimizer_params, c.lr, model)

    # pylint: disable=redefined-outer-name
    meta_data_train, meta_data_eval = load_tts_samples(c.datasets, eval_split=True)

    train_data_loader, train_classes, map_classid_to_classname = setup_loader(ap, is_val=False, verbose=True)
    if c.run_eval:
        eval_data_loader, _, _ = setup_loader(ap, is_val=True, verbose=True)
    else:
        eval_data_loader = None

    num_classes = len(train_classes)
    criterion = model.get_criterion(c, num_classes)

    if c.loss == "softmaxproto" and c.model != "speaker_encoder":
        c.map_classid_to_classname = map_classid_to_classname
        copy_model_files(c, OUT_PATH)

    if args.restore_path:
        criterion, args.restore_step = model.load_checkpoint(
            c, args.restore_path, eval=False, use_cuda=use_cuda, criterion=criterion
        )
        print(" > Model restored from step %d" % args.restore_step, flush=True)
    else:
        args.restore_step = 0

    if c.lr_decay:
        scheduler = NoamLR(optimizer, warmup_steps=c.warmup_steps, last_epoch=args.restore_step - 1)
    else:
        scheduler = None

    num_params = count_parameters(model)
    print("\n > Model has {} parameters".format(num_params), flush=True)

    if use_cuda:
        model = model.cuda()
        criterion.cuda()

    global_step = args.restore_step
    _, global_step = train(model, optimizer, scheduler, criterion, train_data_loader, eval_data_loader, global_step)


if __name__ == "__main__":
    args, c, OUT_PATH, AUDIO_PATH, c_logger, dashboard_logger = init_training()

    try:
        main(args)
    except KeyboardInterrupt:
        remove_experiment_folder(OUT_PATH)
        try:
            sys.exit(0)
        except SystemExit:
            os._exit(0)  # pylint: disable=protected-access
    except Exception:  # pylint: disable=broad-except
        remove_experiment_folder(OUT_PATH)
        traceback.print_exc()
        sys.exit(1)


================================================
FILE: TTS/bin/train_tts.py
================================================
import os
from dataclasses import dataclass, field

from trainer import Trainer, TrainerArgs

from TTS.config import load_config, register_config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models import setup_model


@dataclass
class TrainTTSArgs(TrainerArgs):
    config_path: str = field(default=None, metadata={"help": "Path to the config file."})


def main():
    """Run `tts` model training directly by a `config.json` file."""
    # init trainer args
    train_args = TrainTTSArgs()
    parser = train_args.init_argparse(arg_prefix="")

    # override trainer args from comman-line args
    args, config_overrides = parser.parse_known_args()
    train_args.parse_args(args)

    # load config.json and register
    if args.config_path or args.continue_path:
        if args.config_path:
            # init from a file
            config = load_config(args.config_path)
            if len(config_overrides) > 0:
                config.parse_known_args(config_overrides, relaxed_parser=True)
        elif args.continue_path:
            # continue from a prev experiment
            config = load_config(os.path.join(args.continue_path, "config.json"))
            if len(config_overrides) > 0:
                config.parse_known_args(config_overrides, relaxed_parser=True)
        else:
            # init from console args
            from TTS.config.shared_configs import BaseTrainingConfig  # pylint: disable=import-outside-toplevel

            config_base = BaseTrainingConfig()
            config_base.parse_known_args(config_overrides)
            config = register_config(config_base.model)()

    # load training samples
    train_samples, eval_samples = load_tts_samples(
        config.datasets,
        eval_split=True,
        eval_split_max_size=config.eval_split_max_size,
        eval_split_size=config.eval_split_size,
    )

    # init the model from config
    model = setup_model(config, train_samples + eval_samples)

    # init the trainer and 🚀
    trainer = Trainer(
        train_args,
        model.config,
        config.output_path,
        model=model,
        train_samples=train_samples,
        eval_samples=eval_samples,
        parse_command_line_args=False,
    )
    trainer.fit()


if __name__ == "__main__":
    main()


================================================
FILE: TTS/bin/train_vocoder.py
================================================
import os
from dataclasses import dataclass, field

from trainer import Trainer, TrainerArgs

from TTS.config import load_config, register_config
from TTS.utils.audio import AudioProcessor
from TTS.vocoder.datasets.preprocess import load_wav_data, load_wav_feat_data
from TTS.vocoder.models import setup_model


@dataclass
class TrainVocoderArgs(TrainerArgs):
    config_path: str = field(default=None, metadata={"help": "Path to the config file."})


def main():
    """Run `tts` model training directly by a `config.json` file."""
    # init trainer args
    train_args = TrainVocoderArgs()
    parser = train_args.init_argparse(arg_prefix="")

    # override trainer args from comman-line args
    args, config_overrides = parser.parse_known_args()
    train_args.parse_args(args)

    # load config.json and register
    if args.config_path or args.continue_path:
        if args.config_path:
            # init from a file
            config = load_config(args.config_path)
            if len(config_overrides) > 0:
                config.parse_known_args(config_overrides, relaxed_parser=True)
        elif args.continue_path:
            # continue from a prev experiment
            config = load_config(os.path.join(args.continue_path, "config.json"))
            if len(config_overrides) > 0:
                config.parse_known_args(config_overrides, relaxed_parser=True)
        else:
            # init from console args
            from TTS.config.shared_configs import BaseTrainingConfig  # pylint: disable=import-outside-toplevel

            config_base = BaseTrainingConfig()
            config_base.parse_known_args(config_overrides)
            config = register_config(config_base.model)()

    # load training samples
    if "feature_path" in config and config.feature_path:
        # load pre-computed features
        print(f" > Loading features from: {config.feature_path}")
        eval_samples, train_samples = load_wav_feat_data(config.data_path, config.feature_path, config.eval_split_size)
    else:
        # load data raw wav files
        eval_samples, train_samples = load_wav_data(config.data_path, config.eval_split_size)

    # setup audio processor
    ap = AudioProcessor(**config.audio)

    # init the model from config
    model = setup_model(config)

    # init the trainer and 🚀
    trainer = Trainer(
        train_args,
        config,
        config.output_path,
        model=model,
        train_samples=train_samples,
        eval_samples=eval_samples,
        training_assets={"audio_processor": ap},
        parse_command_line_args=False,
    )
    trainer.fit()


if __name__ == "__main__":
    main()


================================================
FILE: TTS/bin/tune_wavegrad.py
================================================
"""Search a good noise schedule for WaveGrad for a given number of inference iterations"""
import argparse
from itertools import product as cartesian_product

import numpy as np
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm

from TTS.config import load_config
from TTS.utils.audio import AudioProcessor
from TTS.vocoder.datasets.preprocess import load_wav_data
from TTS.vocoder.datasets.wavegrad_dataset import WaveGradDataset
from TTS.vocoder.models import setup_model

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_path", type=str, help="Path to model checkpoint.")
    parser.add_argument("--config_path", type=str, help="Path to model config file.")
    parser.add_argument("--data_path", type=str, help="Path to data directory.")
    parser.add_argument("--output_path", type=str, help="path for output file including file name and extension.")
    parser.add_argument(
        "--num_iter",
        type=int,
        help="Number of model inference iterations that you like to optimize noise schedule for.",
    )
    parser.add_argument("--use_cuda", action="store_true", help="enable CUDA.")
    parser.add_argument("--num_samples", type=int, default=1, help="Number of datasamples used for inference.")
    parser.add_argument(
        "--search_depth",
        type=int,
        default=3,
        help="Search granularity. Increasing this increases the run-time exponentially.",
    )

    # load config
    args = parser.parse_args()
    config = load_config(args.config_path)

    # setup audio processor
    ap = AudioProcessor(**config.audio)

    # load dataset
    _, train_data = load_wav_data(args.data_path, 0)
    train_data = train_data[: args.num_samples]
    dataset = WaveGradDataset(
        ap=ap,
        items=train_data,
        seq_len=-1,
        hop_len=ap.hop_length,
        pad_short=config.pad_short,
        conv_pad=config.conv_pad,
        is_training=True,
        return_segments=False,
        use_noise_augment=False,
        use_cache=False,
        verbose=True,
    )
    loader = DataLoader(
        dataset,
        batch_size=1,
        shuffle=False,
        collate_fn=dataset.collate_full_clips,
        drop_last=False,
        num_workers=config.num_loader_workers,
        pin_memory=False,
    )

    # setup the model
    model = setup_model(config)
    if args.use_cuda:
        model.cuda()

    # setup optimization parameters
    base_values = sorted(10 * np.random.uniform(size=args.search_depth))
    print(f" > base values: {base_values}")
    exponents = 10 ** np.linspace(-6, -1, num=args.num_iter)
    best_error = float("inf")
    best_schedule = None  # pylint: disable=C0103
    total_search_iter = len(base_values) ** args.num_iter
    for base in tqdm(cartesian_product(base_values, repeat=args.num_iter), total=total_search_iter):
        beta = exponents * base
        model.compute_noise_level(beta)
        for data in loader:
            mel, audio = data
            y_hat = model.inference(mel.cuda() if args.use_cuda else mel)

            if args.use_cuda:
                y_hat = y_hat.cpu()
            y_hat = y_hat.numpy()

            mel_hat = []
            for i in range(y_hat.shape[0]):
                m = ap.melspectrogram(y_hat[i, 0])[:, :-1]
                mel_hat.append(torch.from_numpy(m))

            mel_hat = torch.stack(mel_hat)
            mse = torch.sum((mel - mel_hat) ** 2).mean()
            if mse.item() < best_error:
                best_error = mse.item()
                best_schedule = {"beta": beta}
                print(f" > Found a better schedule. - MSE: {mse.item()}")
                np.save(args.output_path, best_schedule)


================================================
FILE: TTS/config/__init__.py
================================================
import json
import os
import re
from typing import Dict

import fsspec
import yaml
from coqpit import Coqpit

from TTS.config.shared_configs import *
from TTS.utils.generic_utils import find_module


def read_json_with_comments(json_path):
    """for backward compat."""
    # fallback to json
    with fsspec.open(json_path, "r", encoding="utf-8") as f:
        input_str = f.read()
    # handle comments
    input_str = re.sub(r"\\\n", "", input_str)
    input_str = re.sub(r"//.*\n", "\n", input_str)
    data = json.loads(input_str)
    return data


def register_config(model_name: str) -> Coqpit:
    """Find the right config for the given model name.

    Args:
        model_name (str): Model name.

    Raises:
        ModuleNotFoundError: No matching config for the model name.

    Returns:
        Coqpit: config class.
    """
    config_class = None
    config_name = model_name + "_config"
    paths = ["TTS.tts.configs", "TTS.vocoder.configs", "TTS.encoder.configs", "TTS.vc.configs"]
    for path in paths:
        try:
            config_class = find_module(path, config_name)
        except ModuleNotFoundError:
            pass
    if config_class is None:
        raise ModuleNotFoundError(f" [!] Config for {model_name} cannot be found.")
    return config_class


def _process_model_name(config_dict: Dict) -> str:
    """Format the model name as expected. It is a band-aid for the old `vocoder` model names.

    Args:
        config_dict (Dict): A dictionary including the config fields.

    Returns:
        str: Formatted modelname.
    """
    model_name = config_dict["model"] if "model" in config_dict else config_dict["generator_model"]
    model_name = model_name.replace("_generator", "").replace("_discriminator", "")
    return model_name


def load_config(config_path: str) -> Coqpit:
    """Import `json` or `yaml` files as TTS configs. First, load the input file as a `dict` and check the model name
    to find the corresponding Config class. Then initialize the Config.

    Args:
        config_path (str): path to the config file.

    Raises:
        TypeError: given config file has an unknown type.

    Returns:
        Coqpit: TTS config object.
    """
    config_dict = {}
    ext = os.path.splitext(config_path)[1]
    if ext in (".yml", ".yaml"):
        with fsspec.open(config_path, "r", encoding="utf-8") as f:
            data = yaml.safe_load(f)
    elif ext == ".json":
        try:
            with fsspec.open(config_path, "r", encoding="utf-8") as f:
                data = json.load(f)
        except json.decoder.JSONDecodeError:
            # backwards compat.
            data = read_json_with_comments(config_path)
    else:
        raise TypeError(f" [!] Unknown config file type {ext}")
    config_dict.update(data)
    model_name = _process_model_name(config_dict)
    config_class = register_config(model_name.lower())
    config = config_class()
    config.from_dict(config_dict)
    return config


def check_config_and_model_args(config, arg_name, value):
    """Check the give argument in `config.model_args` if exist or in `config` for
    the given value.

    Return False if the argument does not exist in `config.model_args` or `config`.
    This is to patch up the compatibility between models with and without `model_args`.

    TODO: Remove this in the future with a unified approach.
    """
    if hasattr(config, "model_args"):
        if arg_name in config.model_args:
            return config.model_args[arg_name] == value
    if hasattr(config, arg_name):
        return config[arg_name] == value
    return False


def get_from_config_or_model_args(config, arg_name):
    """Get the given argument from `config.model_args` if exist or in `config`."""
    if hasattr(config, "model_args"):
        if arg_name in config.model_args:
            return config.model_args[arg_name]
    return config[arg_name]


def get_from_config_or_model_args_with_default(config, arg_name, def_val):
    """Get the given argument from `config.model_args` if exist or in `config`."""
    if hasattr(config, "model_args"):
        if arg_name in config.model_args:
            return config.model_args[arg_name]
    if hasattr(config, arg_name):
        return config[arg_name]
    return def_val


================================================
FILE: TTS/config/shared_configs.py
================================================
from dataclasses import asdict, dataclass
from typing import List

from coqpit import Coqpit, check_argument
from trainer import TrainerConfig


@dataclass
class BaseAudioConfig(Coqpit):
    """Base config to definge audio processing parameters. It is used to initialize
    ```TTS.utils.audio.AudioProcessor.```

    Args:
        fft_size (int):
            Number of STFT frequency levels aka.size of the linear spectogram frame. Defaults to 1024.

        win_length (int):
            Each frame of audio is windowed by window of length ```win_length``` and then padded with zeros to match
            ```fft_size```. Defaults to 1024.

        hop_length (int):
            Number of audio samples between adjacent STFT columns. Defaults to 1024.

        frame_shift_ms (int):
            Set ```hop_length``` based on milliseconds and sampling rate.

        frame_length_ms (int):
            Set ```win_length``` based on milliseconds and sampling rate.

        stft_pad_mode (str):
            Padding method used in STFT. 'reflect' or 'center'. Defaults to 'reflect'.

        sample_rate (int):
            Audio sampling rate. Defaults to 22050.

        resample (bool):
            Enable / Disable resampling audio to ```sample_rate```. Defaults to ```False```.

        preemphasis (float):
            Preemphasis coefficient. Defaults to 0.0.

        ref_level_db (int): 20
            Reference Db level to rebase the audio signal and ignore the level below. 20Db is assumed the sound of air.
            Defaults to 20.

        do_sound_norm (bool):
            Enable / Disable sound normalization to reconcile the volume differences among samples. Defaults to False.

        log_func (str):
            Numpy log function used for amplitude to DB conversion. Defaults to 'np.log10'.

        do_trim_silence (bool):
            Enable / Disable trimming silences at the beginning and the end of the audio clip. Defaults to ```True```.

        do_amp_to_db_linear (bool, optional):
            enable/disable amplitude to dB conversion of linear spectrograms. Defaults to True.

        do_amp_to_db_mel (bool, optional):
            enable/disable amplitude to dB conversion of mel spectrograms. Defaults to True.

        pitch_fmax (float, optional):
            Maximum frequency of the F0 frames. Defaults to ```640```.

        pitch_fmin (float, optional):
            Minimum frequency of the F0 frames. Defaults to ```1```.

        trim_db (int):
            Silence threshold used for silence trimming. Defaults to 45.

        do_rms_norm (bool, optional):
            enable/disable RMS volume normalization when loading an audio file. Defaults to False.

        db_level (int, optional):
            dB level used for rms normalization. The range is -99 to 0. Defaults to None.

        power (float):
            Exponent used for expanding spectrogra levels before running Griffin Lim. It helps to reduce the
            artifacts in the synthesized voice. Defaults to 1.5.

        griffin_lim_iters (int):
            Number of Griffing Lim iterations. Defaults to 60.

        num_mels (int):
            Number of mel-basis frames that defines the frame lengths of each mel-spectrogram frame. Defaults to 80.

        mel_fmin (float): Min frequency level used for the mel-basis filters. ~50 for male and ~95 for female voices.
            It needs to be adjusted for a dataset. Defaults to 0.

        mel_fmax (float):
            Max frequency level used for the mel-basis filters. It needs to be adjusted for a dataset.

        spec_gain (int):
            Gain applied when converting amplitude to DB. Defaults to 20.

        signal_norm (bool):
            enable/disable signal normalization. Defaults to True.

        min_level_db (int):
            minimum db threshold for the computed melspectrograms. Defaults to -100.

        symmetric_norm (bool):
            enable/disable symmetric normalization. If set True normalization is performed in the range [-k, k] else
            [0, k], Defaults to True.

        max_norm (float):
            ```k``` defining the normalization range. Defaults to 4.0.

        clip_norm (bool):
            enable/disable clipping the our of range values in the normalized audio signal. Defaults to True.

        stats_path (str):
            Path to the computed stats file. Defaults to None.
    """

    # stft parameters
    fft_size: int = 1024
    win_length: int = 1024
    hop_length: int = 256
    frame_shift_ms: int = None
    frame_length_ms: int = None
    stft_pad_mode: str = "reflect"
    # audio processing parameters
    sample_rate: int = 22050
    resample: bool = False
    preemphasis: float = 0.0
    ref_level_db: int = 20
    do_sound_norm: bool = False
    log_func: str = "np.log10"
    # silence trimming
    do_trim_silence: bool = True
    trim_db: int = 45
    # rms volume normalization
    do_rms_norm: bool = False
    db_level: float = None
    # griffin-lim params
    power: float = 1.5
    griffin_lim_iters: int = 60
    # mel-spec params
    num_mels: int = 80
    mel_fmin: float = 0.0
    mel_fmax: float = None
    spec_gain: int = 20
    do_amp_to_db_linear: bool = True
    do_amp_to_db_mel: bool = True
    # f0 params
    pitch_fmax: float = 640.0
    pitch_fmin: float = 1.0
    # normalization params
    signal_norm: bool = True
    min_level_db: int = -100
    symmetric_norm: bool = True
    max_norm: float = 4.0
    clip_norm: bool = True
    stats_path: str = None

    def check_values(
        self,
    ):
        """Check config fields"""
        c = asdict(self)
        check_argument("num_mels", c, restricted=True, min_val=10, max_val=2056)
        check_argument("fft_size", c, restricted=True, min_val=128, max_val=4058)
        check_argument("sample_rate", c, restricted=True, min_val=512, max_val=100000)
        check_argument(
            "frame_length_ms",
            c,
            restricted=True,
            min_val=10,
            max_val=1000,
            alternative="win_length",
        )
        check_argument("frame_shift_ms", c, restricted=True, min_val=1, max_val=1000, alternative="hop_length")
        check_argument("preemphasis", c, restricted=True, min_val=0, max_val=1)
        check_argument("min_level_db", c, restricted=True, min_val=-1000, max_val=10)
        check_argument("ref_level_db", c, restricted=True, min_val=0, max_val=1000)
        check_argument("power", c, restricted=True, min_val=1, max_val=5)
        check_argument("griffin_lim_iters", c, restricted=True, min_val=10, max_val=1000)

        # normalization parameters
        check_argument("signal_norm", c, restricted=True)
        check_argument("symmetric_norm", c, restricted=True)
        check_argument("max_norm", c, restricted=True, min_val=0.1, max_val=1000)
        check_argument("clip_norm", c, restricted=True)
        check_argument("mel_fmin", c, restricted=True, min_val=0.0, max_val=1000)
        check_argument("mel_fmax", c, restricted=True, min_val=500.0, allow_none=True)
        check_argument("spec_gain", c, restricted=True, min_val=1, max_val=100)
        check_argument("do_trim_silence", c, restricted=True)
        check_argument("trim_db", c, restricted=True)


@dataclass
class BaseDatasetConfig(Coqpit):
    """Base config for TTS datasets.

    Args:
        formatter (str):
            Formatter name that defines used formatter in ```TTS.tts.datasets.formatter```. Defaults to `""`.

        dataset_name (str):
            Unique name for the dataset. Defaults to `""`.

        path (str):
            Root path to the dataset files. Defaults to `""`.

        meta_file_train (str):
            Name of the dataset meta file. Or a list of speakers to be ignored at training for multi-speaker datasets.
            Defaults to `""`.

        ignored_speakers (List):
            List of speakers IDs that are not used at the training. Default None.

        language (str):
            Language code of the dataset. If defined, it overrides `phoneme_language`. Defaults to `""`.

        phonemizer (str):
            Phonemizer used for that dataset's language. By default it uses `DEF_LANG_TO_PHONEMIZER`. Defaults to `""`.

        meta_file_val (str):
            Name of the dataset meta file that defines the instances used at validation.

        meta_file_attn_mask (str):
            Path to the file that lists the attention mask files used with models that require attention masks to
            train the duration predictor.
    """

    formatter: str = ""
    dataset_name: str = ""
    path: str = ""
    meta_file_train: str = ""
    ignored_speakers: List[str] = None
    language: str = ""
    phonemizer: str = ""
    meta_file_val: str = ""
    meta_file_attn_mask: str = ""

    def check_values(
        self,
    ):
        """Check config fields"""
        c = asdict(self)
        check_argument("formatter", c, restricted=True)
        check_argument("path", c, restricted=True)
        check_argument("meta_file_train", c, restricted=True)
        check_argument("meta_file_val", c, restricted=False)
        check_argument("meta_file_attn_mask", c, restricted=False)


@dataclass
class BaseTrainingConfig(TrainerConfig):
    """Base config to define the basic 🐸TTS training parameters that are shared
    among all the models. It is based on ```Trainer.TrainingConfig```.

    Args:
        model (str):
            Name of the model that is used in the training.

        num_loader_workers (int):
            Number of workers for training time dataloader.

        num_eval_loader_workers (int):
            Number of workers for evaluation time dataloader.
    """

    model: str = None
    # dataloading
    num_loader_workers: int = 0
    num_eval_loader_workers: int = 0
    use_noise_augment: bool = False


================================================
FILE: TTS/encoder/README.md
================================================
### Speaker Encoder

This is an implementation of https://arxiv.org/abs/1710.10467. This model can be used for voice and speaker embedding.

With the code here you can generate d-vectors for both multi-speaker and single-speaker TTS datasets, then visualise and explore them along with the associated audio files in an interactive chart.

Below is an example showing embedding results of various speakers. You can generate the same plot with the provided notebook as demonstrated in [this video](https://youtu.be/KW3oO7JVa7Q).

![](umap.png)

Download a pretrained model from [Released Models](https://github.com/mozilla/TTS/wiki/Released-Models) page.

To run the code, you need to follow the same flow as in TTS.

- Define 'config.json' for your needs. Note that, audio parameters should match your TTS model.
- Example training call ```python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360```
- Generate embedding vectors ```python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth model/config/path/config.json dataset/path/ output_path``` . This code parses all .wav files at the given dataset path and generates the same folder structure under the output path with the generated embedding files.
- Watch training on Tensorboard as in TTS


================================================
FILE: TTS/encoder/__init__.py
================================================


================================================
FILE: TTS/encoder/configs/base_encoder_config.py
================================================
from dataclasses import asdict, dataclass, field
from typing import Dict, List

from coqpit import MISSING

from TTS.config.shared_configs import BaseAudioConfig, BaseDatasetConfig, BaseTrainingConfig


@dataclass
class BaseEncoderConfig(BaseTrainingConfig):
    """Defines parameters for a Generic Encoder model."""

    model: str = None
    audio: BaseAudioConfig = field(default_factory=BaseAudioConfig)
    datasets: List[BaseDatasetConfig] = field(default_factory=lambda: [BaseDatasetConfig()])
    # model params
    model_params: Dict = field(
        default_factory=lambda: {
            "model_name": "lstm",
            "input_dim": 80,
            "proj_dim": 256,
            "lstm_dim": 768,
            "num_lstm_layers": 3,
            "use_lstm_with_projection": True,
        }
    )

    audio_augmentation: Dict = field(default_factory=lambda: {})

    # training params
    epochs: int = 10000
    loss: str = "angleproto"
    grad_clip: float = 3.0
    lr: float = 0.0001
    optimizer: str = "radam"
    optimizer_params: Dict = field(default_factory=lambda: {"betas": [0.9, 0.999], "weight_decay": 0})
    lr_decay: bool = False
    warmup_steps: int = 4000

    # logging params
    tb_model_param_stats: bool = False
    steps_plot_stats: int = 10
    save_step: int = 1000
    print_step: int = 20
    run_eval: bool = False

    # data loader
    num_classes_in_batch: int = MISSING
    num_utter_per_class: int = MISSING
    eval_num_classes_in_batch: int = None
    eval_num_utter_per_class: int = None

    num_loader_workers: int = MISSING
    voice_len: float = 1.6

    def check_values(self):
        super().check_values()
        c = asdict(self)
        assert (
            c["model_params"]["input_dim"] == self.audio.num_mels
        ), " [!] model input dimendion must be equal to melspectrogram dimension."


================================================
FILE: TTS/encoder/configs/emotion_encoder_config.py
================================================
from dataclasses import asdict, dataclass

from TTS.encoder.configs.base_encoder_config import BaseEncoderConfig


@dataclass
class EmotionEncoderConfig(BaseEncoderConfig):
    """Defines parameters for Emotion Encoder model."""

    model: str = "emotion_encoder"
    map_classid_to_classname: dict = None
    class_name_key: str = "emotion_name"


================================================
FILE: TTS/encoder/configs/speaker_encoder_config.py
================================================
from dataclasses import asdict, dataclass

from TTS.encoder.configs.base_encoder_config import BaseEncoderConfig


@dataclass
class SpeakerEncoderConfig(BaseEncoderConfig):
    """Defines parameters for Speaker Encoder model."""

    model: str = "speaker_encoder"
    class_name_key: str = "speaker_name"


================================================
FILE: TTS/encoder/dataset.py
================================================
import random

import torch
from torch.utils.data import Dataset

from TTS.encoder.utils.generic_utils import AugmentWAV


class EncoderDataset(Dataset):
    def __init__(
        self,
        config,
        ap,
        meta_data,
        voice_len=1.6,
        num_classes_in_batch=64,
        num_utter_per_class=10,
        verbose=False,
        augmentation_config=None,
        use_torch_spec=None,
    ):
        """
        Args:
            ap (TTS.tts.utils.AudioProcessor): audio processor object.
            meta_data (list): list of dataset instances.
            seq_len (int): voice segment length in seconds.
            verbose (bool): print diagnostic information.
        """
        super().__init__()
        self.config = config
        self.items = meta_data
        self.sample_rate = ap.sample_rate
        self.seq_len = int(voice_len * self.sample_rate)
        self.num_utter_per_class = num_utter_per_class
        self.ap = ap
        self.verbose = verbose
        self.use_torch_spec = use_torch_spec
        self.classes, self.items = self.__parse_items()

        self.classname_to_classid = {key: i for i, key in enumerate(self.classes)}

        # Data Augmentation
        self.augmentator = None
        self.gaussian_augmentation_config = None
        if augmentation_config:
            self.data_augmentation_p = augmentation_config["p"]
            if self.data_augmentation_p and ("additive" in augmentation_config or "rir" in augmentation_config):
                self.augmentator = AugmentWAV(ap, augmentation_config)

            if "gaussian" in augmentation_config.keys():
                self.gaussian_augmentation_config = augmentation_config["gaussian"]

        if self.verbose:
            print("\n > DataLoader initialization")
            print(f" | > Classes per Batch: {num_classes_in_batch}")
            print(f" | > Number of instances : {len(self.items)}")
            print(f" | > Sequence length: {self.seq_len}")
            print(f" | > Num Classes: {len(self.classes)}")
            print(f" | > Classes: {self.classes}")

    def load_wav(self, filename):
        audio = self.ap.load_wav(filename, sr=self.ap.sample_rate)
        return audio

    def __parse_items(self):
        class_to_utters = {}
        for item in self.items:
            path_ = item["audio_file"]
            class_name = item[self.config.class_name_key]
            if class_name in class_to_utters.keys():
                class_to_utters[class_name].append(path_)
            else:
                class_to_utters[class_name] = [
                    path_,
                ]

        # skip classes with number of samples >= self.num_utter_per_class
        class_to_utters = {k: v for (k, v) in class_to_utters.items() if len(v) >= self.num_utter_per_class}

        classes = list(class_to_utters.keys())
        classes.sort()

        new_items = []
        for item in self.items:
            path_ = item["audio_file"]
            class_name = item["emotion_name"] if self.config.model == "emotion_encoder" else item["speaker_name"]
            # ignore filtered classes
            if class_name not in classes:
                continue
            # ignore small audios
            if self.load_wav(path_).shape[0] - self.seq_len <= 0:
                continue

            new_items.append({"wav_file_path": path_, "class_name": class_name})

        return classes, new_items

    def __len__(self):
        return len(self.items)

    def get_num_classes(self):
        return len(self.classes)

    def get_class_list(self):
        return self.classes

    def set_classes(self, classes):
        self.classes = classes
        self.classname_to_classid = {key: i for i, key in enumerate(self.classes)}

    def get_map_classid_to_classname(self):
        return dict((c_id, c_n) for c_n, c_id in self.classname_to_classid.items())

    def __getitem__(self, idx):
        return self.items[idx]

    def collate_fn(self, batch):
        # get the batch class_ids
        labels = []
        feats = []
        for item in batch:
            utter_path = item["wav_file_path"]
            class_name = item["class_name"]

            # get classid
            class_id = self.classname_to_classid[class_name]
            # load wav file
            wav = self.load_wav(utter_path)
            offset = random.randint(0, wav.shape[0] - self.seq_len)
            wav = wav[offset : offset + self.seq_len]

            if self.augmentator is not None and self.data_augmentation_p:
                if random.random() < self.data_augmentation_p:
                    wav = self.augmentator.apply_one(wav)

            if not self.use_torch_spec:
                mel = self.ap.melspectrogram(wav)
                feats.append(torch.FloatTensor(mel))
            else:
                feats.append(torch.FloatTensor(wav))

            labels.append(class_id)

        feats = torch.stack(feats)
        labels = torch.LongTensor(labels)

        return feats, labels


================================================
FILE: TTS/encoder/losses.py
================================================
import torch
import torch.nn.functional as F
from torch import nn


# adapted from https://github.com/cvqluu/GE2E-Loss
class GE2ELoss(nn.Module):
    def __init__(self, init_w=10.0, init_b=-5.0, loss_method="softmax"):
        """
        Implementation of the Generalized End-to-End loss defined in https://arxiv.org/abs/1710.10467 [1]
        Accepts an input of size (N, M, D)
            where N is the number of speakers in the batch,
            M is the number of utterances per speaker,
            and D is the dimensionality of the embedding vector (e.g. d-vector)
        Args:
            - init_w (float): defines the initial value of w in Equation (5) of [1]
            - init_b (float): definies the initial value of b in Equation (5) of [1]
        """
        super().__init__()
        # pylint: disable=E1102
        self.w = nn.Parameter(torch.tensor(init_w))
        # pylint: disable=E1102
        self.b = nn.Parameter(torch.tensor(init_b))
        self.loss_method = loss_method

        print(" > Initialized Generalized End-to-End loss")

        assert self.loss_method in ["softmax", "contrast"]

        if self.loss_method == "softmax":
            self.embed_loss = self.embed_loss_softmax
        if self.loss_method == "contrast":
            self.embed_loss = self.embed_loss_contrast

    # pylint: disable=R0201
    def calc_new_centroids(self, dvecs, centroids, spkr, utt):
        """
        Calculates the new centroids excluding the reference utterance
        """
        excl = torch.cat((dvecs[spkr, :utt], dvecs[spkr, utt + 1 :]))
        excl = torch.mean(excl, 0)
        new_centroids = []
        for i, centroid in enumerate(centroids):
            if i == spkr:
                new_centroids.append(excl)
            else:
                new_centroids.append(centroid)
        return torch.stack(new_centroids)

    def calc_cosine_sim(self, dvecs, centroids):
        """
        Make the cosine similarity matrix with dims (N,M,N)
        """
        cos_sim_matrix = []
        for spkr_idx, speaker in enumerate(dvecs):
            cs_row = []
            for utt_idx, utterance in enumerate(speaker):
                new_centroids = self.calc_new_centroids(dvecs, centroids, spkr_idx, utt_idx)
                # vector based cosine similarity for speed
                cs_row.append(
                    torch.clamp(
                        torch.mm(
                            utterance.unsqueeze(1).transpose(0, 1),
                            new_centroids.transpose(0, 1),
                        )
                        / (torch.norm(utterance) * torch.norm(new_centroids, dim=1)),
                        1e-6,
                    )
                )
            cs_row = torch.cat(cs_row, dim=0)
            cos_sim_matrix.append(cs_row)
        return torch.stack(cos_sim_matrix)

    # pylint: disable=R0201
    def embed_loss_softmax(self, dvecs, cos_sim_matrix):
        """
        Calculates the loss on each embedding $L(e_{ji})$ by taking softmax
        """
        N, M, _ = dvecs.shape
        L = []
        for j in range(N):
            L_row = []
            for i in range(M):
                L_row.append(-F.log_softmax(cos_sim_matrix[j, i], 0)[j])
            L_row = torch.stack(L_row)
            L.append(L_row)
        return torch.stack(L)

    # pylint: disable=R0201
    def embed_loss_contrast(self, dvecs, cos_sim_matrix):
        """
        Calculates the loss on each embedding $L(e_{ji})$ by contrast loss with closest centroid
        """
        N, M, _ = dvecs.shape
        L = []
        for j in range(N):
            L_row = []
            for i in range(M):
                centroids_sigmoids = torch.sigmoid(cos_sim_matrix[j, i])
                excl_centroids_sigmoids = torch.cat((centroids_sigmoids[:j], centroids_sigmoids[j + 1 :]))
                L_row.append(1.0 - torch.sigmoid(cos_sim_matrix[j, i, j]) + torch.max(excl_centroids_sigmoids))
            L_row = torch.stack(L_row)
            L.append(L_row)
        return torch.stack(L)

    def forward(self, x, _label=None):
        """
        Calculates the GE2E loss for an input of dimensions (num_speakers, num_utts_per_speaker, dvec_feats)
        """

        assert x.size()[1] >= 2

        centroids = torch.mean(x, 1)
        cos_sim_matrix = self.calc_cosine_sim(x, centroids)
        torch.clamp(self.w, 1e-6)
        cos_sim_matrix = self.w * cos_sim_matrix + self.b
        L = self.embed_loss(x, cos_sim_matrix)
        return L.mean()


# adapted from https://github.com/clovaai/voxceleb_trainer/blob/master/loss/angleproto.py
class AngleProtoLoss(nn.Module):
    """
    Implementation of the Angular Prototypical loss defined in https://arxiv.org/abs/2003.11982
        Accepts an input of size (N, M, D)
            where N is the number of speakers in the batch,
            M is the number of utterances per speaker,
            and D is the dimensionality of the embedding vector
        Args:
            - init_w (float): defines the initial value of w
            - init_b (float): definies the initial value of b
    """

    def __init__(self, init_w=10.0, init_b=-5.0):
        super().__init__()
        # pylint: disable=E1102
        self.w = nn.Parameter(torch.tensor(init_w))
        # pylint: disable=E1102
        self.b = nn.Parameter(torch.tensor(init_b))
        self.criterion = torch.nn.CrossEntropyLoss()

        print(" > Initialized Angular Prototypical loss")

    def forward(self, x, _label=None):
        """
        Calculates the AngleProto loss for an input of dimensions (num_speakers, num_utts_per_speaker, dvec_feats)
        """

        assert x.size()[1] >= 2

        out_anchor = torch.mean(x[:, 1:, :], 1)
        out_positive = x[:, 0, :]
        num_speakers = out_anchor.size()[0]

        cos_sim_matrix = F.cosine_similarity(
            out_positive.unsqueeze(-1).expand(-1, -1, num_speakers),
            out_anchor.unsqueeze(-1).expand(-1, -1, num_speakers).transpose(0, 2),
        )
        torch.clamp(self.w, 1e-6)
        cos_sim_matrix = cos_sim_matrix * self.w + self.b
        label = torch.arange(num_speakers).to(cos_sim_matrix.device)
        L = self.criterion(cos_sim_matrix, label)
        return L


class SoftmaxLoss(nn.Module):
    """
    Implementation of the Softmax loss as defined in https://arxiv.org/abs/2003.11982
        Args:
            - embedding_dim (float): speaker embedding dim
            - n_speakers (float): number of speakers
    """

    def __init__(self, embedding_dim, n_speakers):
        super().__init__()

        self.criterion = torch.nn.CrossEntropyLoss()
        self.fc = nn.Linear(embedding_dim, n_speakers)

        print("Initialised Softmax Loss")

    def forward(self, x, label=None):
        # reshape for compatibility
        x = x.reshape(-1, x.size()[-1])
        label = label.reshape(-1)

        x = self.fc(x)
        L = self.criterion(x, label)

        return L

    def inference(self, embedding):
        x = self.fc(embedding)
        activations = torch.nn.functional.softmax(x, dim=1).squeeze(0)
        class_id = torch.argmax(activations)
        return class_id


class SoftmaxAngleProtoLoss(nn.Module):
    """
    Implementation of the Softmax AnglePrototypical loss as defined in https://arxiv.org/abs/2009.14153
        Args:
            - embedding_dim (float): speaker embedding dim
            - n_speakers (float): number of speakers
            - init_w (float): defines the initial value of w
            - init_b (float): definies the initial value of b
    """

    def __init__(self, embedding_dim, n_speakers, init_w=10.0, init_b=-5.0):
        super().__init__()

        self.softmax = SoftmaxLoss(embedding_dim, n_speakers)
        self.angleproto = AngleProtoLoss(init_w, init_b)

        print("Initialised SoftmaxAnglePrototypical Loss")

    def forward(self, x, label=None):
        """
        Calculates the SoftmaxAnglePrototypical loss for an input of dimensions (num_speakers, num_utts_per_speaker, dvec_feats)
        """

        Lp = self.angleproto(x)

        Ls = self.softmax(x, label)

        return Ls + Lp


================================================
FILE: TTS/encoder/models/base_encoder.py
================================================
import numpy as np
import torch
import torchaudio
from coqpit import Coqpit
from torch import nn

from TTS.encoder.losses import AngleProtoLoss, GE2ELoss, SoftmaxAngleProtoLoss
from TTS.utils.generic_utils import set_init_dict
from TTS.utils.io import load_fsspec


class PreEmphasis(nn.Module):
    def __init__(self, coefficient=0.97):
        super().__init__()
        self.coefficient = coefficient
        self.register_buffer("filter", torch.FloatTensor([-self.coefficient, 1.0]).unsqueeze(0).unsqueeze(0))

    def forward(self, x):
        assert len(x.size()) == 2

        x = torch.nn.functional.pad(x.unsqueeze(1), (1, 0), "reflect")
        return torch.nn.functional.conv1d(x, self.filter).squeeze(1)


class BaseEncoder(nn.Module):
    """Base `encoder` class. Every new `encoder` model must inherit this.

    It defines common `encoder` specific functions.
    """

    # pylint: disable=W0102
    def __init__(self):
        super(BaseEncoder, self).__init__()

    def get_torch_mel_spectrogram_class(self, audio_config):
        return torch.nn.Sequential(
            PreEmphasis(audio_config["preemphasis"]),
            # TorchSTFT(
            #     n_fft=audio_config["fft_size"],
            #     hop_length=audio_config["hop_length"],
            #     win_length=audio_config["win_length"],
            #     sample_rate=audio_config["sample_rate"],
            #     window="hamming_window",
            #     mel_fmin=0.0,
            #     mel_fmax=None,
            #     use_htk=True,
            #     do_amp_to_db=False,
            #     n_mels=audio_config["num_mels"],
            #     power=2.0,
            #     use_mel=True,
            #     mel_norm=None,
            # )
            torchaudio.transforms.MelSpectrogram(
                sample_rate=audio_config["sample_rate"],
                n_fft=audio_config["fft_size"],
                win_length=audio_config["win_length"],
                hop_length=audio_config["hop_length"],
                window_fn=torch.hamming_window,
                n_mels=audio_config["num_mels"],
            ),
        )

    @torch.no_grad()
    def inference(self, x, l2_norm=True):
        return self.forward(x, l2_norm)

    @torch.no_grad()
    def compute_embedding(self, x, num_frames=250, num_eval=10, return_mean=True, l2_norm=True):
        """
        Generate embeddings for a batch of utterances
        x: 1xTxD
        """
        # map to the waveform size
        if self.use_torch_spec:
            num_frames = num_frames * self.audio_config["hop_length"]

        max_len = x.shape[1]

        if max_len < num_frames:
            num_frames = max_len

        offsets = np.linspace(0, max_len - num_frames, num=num_eval)

        frames_batch = []
        for offset in offsets:
            offset = int(offset)
            end_offset = int(offset + num_frames)
            frames = x[:, offset:end_offset]
            frames_batch.append(frames)

        frames_batch = torch.cat(frames_batch, dim=0)
        embeddings = self.inference(frames_batch, l2_norm=l2_norm)

        if return_mean:
            embeddings = torch.mean(embeddings, dim=0, keepdim=True)
        return embeddings

    def get_criterion(self, c: Coqpit, num_classes=None):
        if c.loss == "ge2e":
            criterion = GE2ELoss(loss_method="softmax")
        elif c.loss == "angleproto":
            criterion = AngleProtoLoss()
        elif c.loss == "softmaxproto":
            criterion = SoftmaxAngleProtoLoss(c.model_params["proj_dim"], num_classes)
        else:
            raise Exception("The %s  not is a loss supported" % c.loss)
        return criterion

    def load_checkpoint(
        self,
        config: Coqpit,
        checkpoint_path: str,
        eval: bool = False,
        use_cuda: bool = False,
        criterion=None,
        cache=False,
    ):
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        try:
            self.load_state_dict(state["model"])
            print(" > Model fully restored. ")
        except (KeyError, RuntimeError) as error:
            # If eval raise the error
            if eval:
                raise error

            print(" > Partial model initialization.")
            model_dict = self.state_dict()
            model_dict = set_init_dict(model_dict, state["model"], c)
            self.load_state_dict(model_dict)
            del model_dict

        # load the criterion for restore_path
        if criterion is not None and "criterion" in state:
            try:
                criterion.load_state_dict(state["criterion"])
            except (KeyError, RuntimeError) as error:
                print(" > Criterion load ignored because of:", error)

        # instance and load the criterion for the encoder classifier in inference time
        if (
            eval
            and criterion is None
            and "criterion" in state
            and getattr(config, "map_classid_to_classname", None) is not None
        ):
            criterion = self.get_criterion(config, len(config.map_classid_to_classname))
            criterion.load_state_dict(state["criterion"])

        if use_cuda:
            self.cuda()
            if criterion is not None:
                criterion = criterion.cuda()

        if eval:
            self.eval()
            assert not self.training

        if not eval:
            return criterion, state["step"]
        return criterion


================================================
FILE: TTS/encoder/models/lstm.py
================================================
import torch
from torch import nn

from TTS.encoder.models.base_encoder import BaseEncoder


class LSTMWithProjection(nn.Module):
    def __init__(self, input_size, hidden_size, proj_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.proj_size = proj_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, proj_size, bias=False)

    def forward(self, x):
        self.lstm.flatten_parameters()
        o, (_, _) = self.lstm(x)
        return self.linear(o)


class LSTMWithoutProjection(nn.Module):
    def __init__(self, input_dim, lstm_dim, proj_dim, num_lstm_layers):
        super().__init__()
        self.lstm = nn.LSTM(input_size=input_dim, hidden_size=lstm_dim, num_layers=num_lstm_layers, batch_first=True)
        self.linear = nn.Linear(lstm_dim, proj_dim, bias=True)
        self.relu = nn.ReLU()

    def forward(self, x):
        _, (hidden, _) = self.lstm(x)
        return self.relu(self.linear(hidden[-1]))


class LSTMSpeakerEncoder(BaseEncoder):
    def __init__(
        self,
        input_dim,
        proj_dim=256,
        lstm_dim=768,
        num_lstm_layers=3,
        use_lstm_with_projection=True,
        use_torch_spec=False,
        audio_config=None,
    ):
        super().__init__()
        self.use_lstm_with_projection = use_lstm_with_projection
        self.use_torch_spec = use_torch_spec
        self.audio_config = audio_config
        self.proj_dim = proj_dim

        layers = []
        # choise LSTM layer
        if use_lstm_with_projection:
            layers.append(LSTMWithProjection(input_dim, lstm_dim, proj_dim))
            for _ in range(num_lstm_layers - 1):
                layers.append(LSTMWithProjection(proj_dim, lstm_dim, proj_dim))
            self.layers = nn.Sequential(*layers)
        else:
            self.layers = LSTMWithoutProjection(input_dim, lstm_dim, proj_dim, num_lstm_layers)

        self.instancenorm = nn.InstanceNorm1d(input_dim)

        if self.use_torch_spec:
            self.torch_spec = self.get_torch_mel_spectrogram_class(audio_config)
        else:
            self.torch_spec = None

        self._init_layers()

    def _init_layers(self):
        for name, param in self.layers.named_parameters():
            if "bias" in name:
                nn.init.constant_(param, 0.0)
            elif "weight" in name:
                nn.init.xavier_normal_(param)

    def forward(self, x, l2_norm=True):
        """Forward pass of the model.

        Args:
            x (Tensor): Raw waveform signal or spectrogram frames. If input is a waveform, `torch_spec` must be `True`
                to compute the spectrogram on-the-fly.
            l2_norm (bool): Whether to L2-normalize the outputs.

        Shapes:
            - x: :math:`(N, 1, T_{in})` or :math:`(N, D_{spec}, T_{in})`
        """
        with torch.no_grad():
            with torch.cuda.amp.autocast(enabled=False):
                if self.use_torch_spec:
                    x.squeeze_(1)
                    x = self.torch_spec(x)
                x = self.instancenorm(x).transpose(1, 2)
        d = self.layers(x)
        if self.use_lstm_with_projection:
            d = d[:, -1]
        if l2_norm:
            d = torch.nn.functional.normalize(d, p=2, dim=1)
        return d


================================================
FILE: TTS/encoder/models/resnet.py
================================================
import torch
from torch import nn

# from TTS.utils.audio.torch_transforms import TorchSTFT
from TTS.encoder.models.base_encoder import BaseEncoder


class SELayer(nn.Module):
    def __init__(self, channel, reduction=8):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y


class SEBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=8):
        super(SEBasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.se = SELayer(planes, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.relu(out)
        out = self.bn1(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)
        return out


class ResNetSpeakerEncoder(BaseEncoder):
    """Implementation of the model H/ASP without batch normalization in speaker embedding. This model was proposed in: https://arxiv.org/abs/2009.14153
    Adapted from: https://github.com/clovaai/voxceleb_trainer
    """

    # pylint: disable=W0102
    def __init__(
        self,
        input_dim=64,
        proj_dim=512,
        layers=[3, 4, 6, 3],
        num_filters=[32, 64, 128, 256],
        encoder_type="ASP",
        log_input=False,
        use_torch_spec=False,
        audio_config=None,
    ):
        super(ResNetSpeakerEncoder, self).__init__()

        self.encoder_type = encoder_type
        self.input_dim = input_dim
        self.log_input = log_input
        self.use_torch_spec = use_torch_spec
        self.audio_config = audio_config
        self.proj_dim = proj_dim

        self.conv1 = nn.Conv2d(1, num_filters[0], kernel_size=3, stride=1, padding=1)
        self.relu = nn.ReLU(inplace=True)
        self.bn1 = nn.BatchNorm2d(num_filters[0])

        self.inplanes = num_filters[0]
        self.layer1 = self.create_layer(SEBasicBlock, num_filters[0], layers[0])
        self.layer2 = self.create_layer(SEBasicBlock, num_filters[1], layers[1], stride=(2, 2))
        self.layer3 = self.create_layer(SEBasicBlock, num_filters[2], layers[2], stride=(2, 2))
        self.layer4 = self.create_layer(SEBasicBlock, num_filters[3], layers[3], stride=(2, 2))

        self.instancenorm = nn.InstanceNorm1d(input_dim)

        if self.use_torch_spec:
            self.torch_spec = self.get_torch_mel_spectrogram_class(audio_config)
        else:
            self.torch_spec = None

        outmap_size = int(self.input_dim / 8)

        self.attention = nn.Sequential(
            nn.Conv1d(num_filters[3] * outmap_size, 128, kernel_size=1),
            nn.ReLU(),
            nn.BatchNorm1d(128),
            nn.Conv1d(128, num_filters[3] * outmap_size, kernel_size=1),
            nn.Softmax(dim=2),
        )

        if self.encoder_type == "SAP":
            out_dim = num_filters[3] * outmap_size
        elif self.encoder_type == "ASP":
            out_dim = num_filters[3] * outmap_size * 2
        else:
            raise ValueError("Undefined encoder")

        self.fc = nn.Linear(out_dim, proj_dim)

        self._init_layers()

    def _init_layers(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

    def create_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    # pylint: disable=R0201
    def new_parameter(self, *size):
        out = nn.Parameter(torch.FloatTensor(*size))
        nn.init.xavier_normal_(out)
        return out

    def forward(self, x, l2_norm=False):
        """Forward pass of the model.

        Args:
            x (Tensor): Raw waveform signal or spectrogram frames. If input is a waveform, `torch_spec` must be `True`
                to compute the spectrogram on-the-fly.
            l2_norm (bool): Whether to L2-normalize the outputs.

        Shapes:
            - x: :math:`(N, 1, T_{in})` or :math:`(N, D_{spec}, T_{in})`
        """
        x.squeeze_(1)
        # if you torch spec compute it otherwise use the mel spec computed by the AP
        if self.use_torch_spec:
            x = self.torch_spec(x)

        if self.log_input:
            x = (x + 1e-6).log()
        x = self.instancenorm(x).unsqueeze(1)

        x = self.conv1(x)
        x = self.relu(x)
        x = self.bn1(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = x.reshape(x.size()[0], -1, x.size()[-1])

        w = self.attention(x)

        if self.encoder_type == "SAP":
            x = torch.sum(x * w, dim=2)
        elif self.encoder_type == "ASP":
            mu = torch.sum(x * w, dim=2)
            sg = torch.sqrt((torch.sum((x**2) * w, dim=2) - mu**2).clamp(min=1e-5))
            x = torch.cat((mu, sg), 1)

        x = x.view(x.size()[0], -1)
        x = self.fc(x)

        if l2_norm:
            x = torch.nn.functional.normalize(x, p=2, dim=1)
        return x


================================================
FILE: TTS/encoder/requirements.txt
================================================
umap-learn
numpy>=1.17.0


================================================
FILE: TTS/encoder/utils/__init__.py
================================================


================================================
FILE: TTS/encoder/utils/generic_utils.py
================================================
import datetime
import glob
import os
import random
import re

import numpy as np
from scipy import signal

from TTS.encoder.models.lstm import LSTMSpeakerEncoder
from TTS.encoder.models.resnet import ResNetSpeakerEncoder
from TTS.utils.io import save_fsspec


class AugmentWAV(object):
    def __init__(self, ap, augmentation_config):
        self.ap = ap
        self.use_additive_noise = False

        if "additive" in augmentation_config.keys():
            self.additive_noise_config = augmentation_config["additive"]
            additive_path = self.additive_noise_config["sounds_path"]
            if additive_path:
                self.use_additive_noise = True
                # get noise types
                self.additive_noise_types = []
                for key in self.additive_noise_config.keys():
                    if isinstance(self.additive_noise_config[key], dict):
                        self.additive_noise_types.append(key)

                additive_files = glob.glob(os.path.join(additive_path, "**/*.wav"), recursive=True)

                self.noise_list = {}

                for wav_file in additive_files:
                    noise_dir = wav_file.replace(additive_path, "").split(os.sep)[0]
                    # ignore not listed directories
                    if noise_dir not in self.additive_noise_types:
                        continue
                    if not noise_dir in self.noise_list:
                        self.noise_list[noise_dir] = []
                    self.noise_list[noise_dir].append(wav_file)

                print(
                    f" | > Using Additive Noise Augmentation: with {len(additive_files)} audios instances from {self.additive_noise_types}"
                )

        self.use_rir = False

        if "rir" in augmentation_config.keys():
            self.rir_config = augmentation_config["rir"]
            if self.rir_config["rir_path"]:
                self.rir_files = glob.glob(os.path.join(self.rir_config["rir_path"], "**/*.wav"), recursive=True)
                self.use_rir = True

            print(f" | > Using RIR Noise Augmentation: with {len(self.rir_files)} audios instances")

        self.create_augmentation_global_list()

    def create_augmentation_global_list(self):
        if self.use_additive_noise:
            self.global_noise_list = self.additive_noise_types
        else:
            self.global_noise_list = []
        if self.use_rir:
            self.global_noise_list.append("RIR_AUG")

    def additive_noise(self, noise_type, audio):
        clean_db = 10 * np.log10(np.mean(audio**2) + 1e-4)

        noise_list = random.sample(
            self.noise_list[noise_type],
            random.randint(
                self.additive_noise_config[noise_type]["min_num_noises"],
                self.additive_noise_config[noise_type]["max_num_noises"],
            ),
        )

        audio_len = audio.shape[0]
        noises_wav = None
        for noise in noise_list:
            noiseaudio = self.ap.load_wav(noise, sr=self.ap.sample_rate)[:audio_len]

            if noiseaudio.shape[0] < audio_len:
                continue

            noise_snr = random.uniform(
                self.additive_noise_config[noise_type]["min_snr_in_db"],
                self.additive_noise_config[noise_type]["max_num_noises"],
            )
            noise_db = 10 * np.log10(np.mean(noiseaudio**2) + 1e-4)
            noise_wav = np.sqrt(10 ** ((clean_db - noise_db - noise_snr) / 10)) * noiseaudio

            if noises_wav is None:
                noises_wav = noise_wav
            else:
                noises_wav += noise_wav

        # if all possible files is less than audio, choose other files
        if noises_wav is None:
            return self.additive_noise(noise_type, audio)

        return audio + noises_wav

    def reverberate(self, audio):
        audio_len = audio.shape[0]

        rir_file = random.choice(self.rir_files)
        rir = self.ap.load_wav(rir_file, sr=self.ap.sample_rate)
        rir = rir / np.sqrt(np.sum(rir**2))
        return signal.convolve(audio, rir, mode=self.rir_config["conv_mode"])[:audio_len]

    def apply_one(self, audio):
        noise_type = random.choice(self.global_noise_list)
        if noise_type == "RIR_AUG":
            return self.reverberate(audio)

        return self.additive_noise(noise_type, audio)


def to_camel(text):
    text = text.capitalize()
    return re.sub(r"(?!^)_([a-zA-Z])", lambda m: m.group(1).upper(), text)


def setup_encoder_model(config: "Coqpit"):
    if config.model_params["model_name"].lower() == "lstm":
        model = LSTMSpeakerEncoder(
            config.model_params["input_dim"],
            config.model_params["proj_dim"],
            config.model_params["lstm_dim"],
            config.model_params["num_lstm_layers"],
            use_torch_spec=config.model_params.get("use_torch_spec", False),
            audio_config=config.audio,
        )
    elif config.model_params["model_name"].lower() == "resnet":
        model = ResNetSpeakerEncoder(
            input_dim=config.model_params["input_dim"],
            proj_dim=config.model_params["proj_dim"],
            log_input=config.model_params.get("log_input", False),
            use_torch_spec=config.model_params.get("use_torch_spec", False),
            audio_config=config.audio,
        )
    return model


def save_checkpoint(model, optimizer, criterion, model_loss, out_path, current_step, epoch):
    checkpoint_path = "checkpoint_{}.pth".format(current_step)
    checkpoint_path = os.path.join(out_path, checkpoint_path)
    print(" | | > Checkpoint saving : {}".format(checkpoint_path))

    new_state_dict = model.state_dict()
    state = {
        "model": new_state_dict,
        "optimizer": optimizer.state_dict() if optimizer is not None else None,
        "criterion": criterion.state_dict(),
        "step": current_step,
        "epoch": epoch,
        "loss": model_loss,
        "date": datetime.date.today().strftime("%B %d, %Y"),
    }
    save_fsspec(state, checkpoint_path)


def save_best_model(model, optimizer, criterion, model_loss, best_loss, out_path, current_step, epoch):
    if model_loss < best_loss:
        new_state_dict = model.state_dict()
        state = {
            "model": new_state_dict,
            "optimizer": optimizer.state_dict(),
            "criterion": criterion.state_dict(),
            "step": current_step,
            "epoch": epoch,
            "loss": model_loss,
            "date": datetime.date.today().strftime("%B %d, %Y"),
        }
        best_loss = model_loss
        bestmodel_path = "best_model.pth"
        bestmodel_path = os.path.join(out_path, bestmodel_path)
        print("\n > BEST MODEL ({0:.5f}) : {1:}".format(model_loss, bestmodel_path))
        save_fsspec(state, bestmodel_path)
    return best_loss


================================================
FILE: TTS/encoder/utils/io.py
================================================
import datetime
import os

from TTS.utils.io import save_fsspec


def save_checkpoint(model, optimizer, model_loss, out_path, current_step):
    checkpoint_path = "checkpoint_{}.pth".format(current_step)
    checkpoint_path = os.path.join(out_path, checkpoint_path)
    print(" | | > Checkpoint saving : {}".format(checkpoint_path))

    new_state_dict = model.state_dict()
    state = {
        "model": new_state_dict,
        "optimizer": optimizer.state_dict() if optimizer is not None else None,
        "step": current_step,
        "loss": model_loss,
        "date": datetime.date.today().strftime("%B %d, %Y"),
    }
    save_fsspec(state, checkpoint_path)


def save_best_model(model, optimizer, model_loss, best_loss, out_path, current_step):
    if model_loss < best_loss:
        new_state_dict = model.state_dict()
        state = {
            "model": new_state_dict,
            "optimizer": optimizer.state_dict(),
            "step": current_step,
            "loss": model_loss,
            "date": datetime.date.today().strftime("%B %d, %Y"),
        }
        best_loss = model_loss
        bestmodel_path = "best_model.pth"
        bestmodel_path = os.path.join(out_path, bestmodel_path)
        print("\n > BEST MODEL ({0:.5f}) : {1:}".format(model_loss, bestmodel_path))
        save_fsspec(state, bestmodel_path)
    return best_loss


================================================
FILE: TTS/encoder/utils/prepare_voxceleb.py
================================================
# coding=utf-8
# Copyright (C) 2020 ATHENA AUTHORS; Yiping Peng; Ne Luo
# All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Only support eager mode and TF>=2.0.0
# pylint: disable=no-member, invalid-name, relative-beyond-top-level
# pylint: disable=too-many-locals, too-many-statements, too-many-arguments, too-many-instance-attributes
""" voxceleb 1 & 2 """

import hashlib
import os
import subprocess
import sys
import zipfile

import pandas
import soundfile as sf
from absl import logging

SUBSETS = {
    "vox1_dev_wav": [
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partaa",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partab",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partac",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_dev_wav_partad",
    ],
    "vox1_test_wav": ["https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip"],
    "vox2_dev_aac": [
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partaa",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partab",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partac",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partad",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partae",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partaf",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partag",
        "https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_dev_aac_partah",
    ],
    "vox2_test_aac": ["https://thor.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox2_test_aac.zip"],
}

MD5SUM = {
    "vox1_dev_wav": "ae63e55b951748cc486645f532ba230b",
    "vox2_dev_aac": "bbc063c46078a602ca71605645c2a402",
    "vox1_test_wav": "185fdc63c3c739954633d50379a3d102",
    "vox2_test_aac": "0d2b3ea430a821c33263b5ea37ede312",
}

USER = {"user": "", "password": ""}

speaker_id_dict = {}


def download_and_extract(directory, subset, urls):
    """Download and extract the given split of dataset.

    Args:
        directory: the directory where to put the downloaded data.
        subset: subset name of the corpus.
        urls: the list of urls to download the data file.
    """
    os.makedirs(directory, exist_ok=True)

    try:
        for url in urls:
            zip_filepath = os.path.join(directory, url.split("/")[-1])
            if os.path.exists(zip_filepath):
                continue
            logging.info("Downloading %s to %s" % (url, zip_filepath))
            subprocess.call(
                "wget %s --user %s --password %s -O %s" % (url, USER["user"], USER["password"], zip_filepath),
                shell=True,
            )

            statinfo = os.stat(zip_filepath)
            logging.info("Successfully downloaded %s, size(bytes): %d" % (url, statinfo.st_size))

        # concatenate all parts into zip files
        if ".zip" not in zip_filepath:
            zip_filepath = "_".join(zip_filepath.split("_")[:-1])
            subprocess.call("cat %s* > %s.zip" % (zip_filepath, zip_filepath), shell=True)
            zip_filepath += ".zip"
        extract_path = zip_filepath.strip(".zip")

        # check zip file md5sum
        with open(zip_filepath, "rb") as f_zip:
            md5 = hashlib.md5(f_zip.read()).hexdigest()
        if md5 != MD5SUM[subset]:
            raise ValueError("md5sum of %s mismatch" % zip_filepath)

        with zipfile.ZipFile(zip_filepath, "r") as zfile:
            zfile.extractall(directory)
            extract_path_ori = os.path.join(directory, zfile.infolist()[0].filename)
            subprocess.call("mv %s %s" % (extract_path_ori, extract_path), shell=True)
    finally:
        # os.remove(zip_filepath)
        pass


def exec_cmd(cmd):
    """Run a command in a subprocess.
    Args:
        cmd: command line to be executed.
    Return:
        int, the return code.
    """
    try:
        retcode = subprocess.call(cmd, shell=True)
        if retcode < 0:
            logging.info(f"Child was terminated by signal {retcode}")
    except OSError as e:
        logging.info(f"Execution failed: {e}")
        retcode = -999
    return retcode


def decode_aac_with_ffmpeg(aac_file, wav_file):
    """Decode a given AAC file into WAV using ffmpeg.
    Args:
        aac_file: file path to input AAC file.
        wav_file: file path to output WAV file.
    Return:
        bool, True if success.
    """
    cmd = f"ffmpeg -i {aac_file} {wav_file}"
    logging.info(f"Decoding aac file using command line: {cmd}")
    ret = exec_cmd(cmd)
    if ret != 0:
        logging.error(f"Failed to decode aac file with retcode {ret}")
        logging.error("Please check your ffmpeg installation.")
        return False
    return True


def convert_audio_and_make_label(input_dir, subset, output_dir, output_file):
    """Optionally convert AAC to WAV and make speaker labels.
    Args:
        input_dir: the directory which holds the input dataset.
        subset: the name of the specified subset. e.g. vox1_dev_wav
        output_dir: the directory to place the newly generated csv files.
        output_file: the name of the newly generated csv file. e.g. vox1_dev_wav.csv
    """

    logging.info("Preprocessing audio and label for subset %s" % subset)
    source_dir = os.path.join(input_dir, subset)

    files = []
    # Convert all AAC file into WAV format. At the same time, generate the csv
    for root, _, filenames in os.walk(source_dir):
        for filename in filenames:
            name, ext = os.path.splitext(filename)
            if ext.lower() == ".wav":
                _, ext2 = os.path.splitext(name)
                if ext2:
                    continue
                wav_file = os.path.join(root, filename)
            elif ext.lower() == ".m4a":
                # Convert AAC to WAV.
                aac_file = os.path.join(root, filename)
                wav_file = aac_file + ".wav"
                if not os.path.exists(wav_file):
                    if not decode_aac_with_ffmpeg(aac_file, wav_file):
                        raise RuntimeError("Audio decoding failed.")
            else:
                continue
            speaker_name = root.split(os.path.sep)[-2]
            if speaker_name not in speaker_id_dict:
                num = len(speaker_id_dict)
                speaker_id_dict[speaker_name] = num
            # wav_filesize = os.path.getsize(wav_file)
            wav_length = len(sf.read(wav_file)[0])
            files.append((os.path.abspath(wav_file), wav_length, speaker_id_dict[speaker_name], speaker_name))

    # Write to CSV file which contains four columns:
    # "wav_filename", "wav_length_ms", "speaker_id", "speaker_name".
    csv_file_path = os.path.join(output_dir, output_file)
    df = pandas.DataFrame(data=files, columns=["wav_filename", "wav_length_ms", "speaker_id", "speaker_name"])
    df.to_csv(csv_file_path, index=False, sep="\t")
    logging.info("Successfully generated csv file {}".format(csv_file_path))


def processor(directory, subset, force_process):
    """download and process"""
    urls = SUBSETS
    if subset not in urls:
        raise ValueError(subset, "is not in voxceleb")

    subset_csv = os.path.join(directory, subset + ".csv")
    if not force_process and os.path.exists(subset_csv):
        return subset_csv

    logging.info("Downloading and process the voxceleb in %s", directory)
    logging.info("Preparing subset %s", subset)
    download_and_extract(directory, subset, urls[subset])
    convert_audio_and_make_label(directory, subset, directory, subset + ".csv")
    logging.info("Finished downloading and processing")
    return subset_csv


if __name__ == "__main__":
    logging.set_verbosity(logging.INFO)
    if len(sys.argv) != 4:
        print("Usage: python prepare_data.py save_directory user password")
        sys.exit()

    DIR, USER["user"], USER["password"] = sys.argv[1], sys.argv[2], sys.argv[3]
    for SUBSET in SUBSETS:
        processor(DIR, SUBSET, False)


================================================
FILE: TTS/encoder/utils/training.py
================================================
import os
from dataclasses import dataclass, field

from coqpit import Coqpit
from trainer import TrainerArgs, get_last_checkpoint
from trainer.logging import logger_factory
from trainer.logging.console_logger import ConsoleLogger

from TTS.config import load_config, register_config
from TTS.tts.utils.text.characters import parse_symbols
from TTS.utils.generic_utils import get_experiment_folder_path, get_git_branch
from TTS.utils.io import copy_model_files


@dataclass
class TrainArgs(TrainerArgs):
    config_path: str = field(default=None, metadata={"help": "Path to the config file."})


def getarguments():
    train_config = TrainArgs()
    parser = train_config.init_argparse(arg_prefix="")
    return parser


def process_args(args, config=None):
    """Process parsed comand line arguments and initialize the config if not provided.
    Args:
        args (argparse.Namespace or dict like): Parsed input arguments.
        config (Coqpit): Model config. If none, it is generated from `args`. Defaults to None.
    Returns:
        c (TTS.utils.io.AttrDict): Config paramaters.
        out_path (str): Path to save models and logging.
        audio_path (str): Path to save generated test audios.
        c_logger (TTS.utils.console_logger.ConsoleLogger): Class that does
            logging to the console.
        dashboard_logger (WandbLogger or TensorboardLogger): Class that does the dashboard Logging
    TODO:
        - Interactive config definition.
    """
    if isinstance(args, tuple):
        args, coqpit_overrides = args
    if args.continue_path:
        # continue a previous training from its output folder
        experiment_path = args.continue_path
        args.config_path = os.path.join(args.continue_path, "config.json")
        args.restore_path, best_model = get_last_checkpoint(args.continue_path)
        if not args.best_path:
            args.best_path = best_model
    # init config if not already defined
    if config is None:
        if args.config_path:
            # init from a file
            config = load_config(args.config_path)
        else:
            # init from console args
            from TTS.config.shared_configs import BaseTrainingConfig  # pylint: disable=import-outside-toplevel

            config_base = BaseTrainingConfig()
            config_base.parse_known_args(coqpit_overrides)
            config = register_config(config_base.model)()
    # override values from command-line args
    config.parse_known_args(coqpit_overrides, relaxed_parser=True)
    experiment_path = args.continue_path
    if not experiment_path:
        experiment_path = get_experiment_folder_path(config.output_path, config.run_name)
    audio_path = os.path.join(experiment_path, "test_audios")
    config.output_log_path = experiment_path
    # setup rank 0 process in distributed training
    dashboard_logger = None
    if args.rank == 0:
        new_fields = {}
        if args.restore_path:
            new_fields["restore_path"] = args.restore_path
        new_fields["github_branch"] = get_git_branch()
        # if model characters are not set in the config file
        # save the default set to the config file for future
        # compatibility.
        if config.has("characters") and config.characters is None:
            used_characters = parse_symbols()
            new_fields["characters"] = used_characters
        copy_model_files(config, experiment_path, new_fields)
        dashboard_logger = logger_factory(config, experiment_path)
    c_logger = ConsoleLogger()
    return config, experiment_path, audio_path, c_logger, dashboard_logger


def init_arguments():
    train_config = TrainArgs()
    parser = train_config.init_argparse(arg_prefix="")
    return parser


def init_training(config: Coqpit = None):
    """Initialization of a training run."""
    parser = init_arguments()
    args = parser.parse_known_args()
    config, OUT_PATH, AUDIO_PATH, c_logger, dashboard_logger = process_args(args, config)
    return args[0], config, OUT_PATH, AUDIO_PATH, c_logger, dashboard_logger


================================================
FILE: TTS/encoder/utils/visual.py
================================================
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import umap

matplotlib.use("Agg")


colormap = (
    np.array(
        [
            [76, 255, 0],
            [0, 127, 70],
            [255, 0, 0],
            [255, 217, 38],
            [0, 135, 255],
            [165, 0, 165],
            [255, 167, 255],
            [0, 255, 255],
            [255, 96, 38],
            [142, 76, 0],
            [33, 0, 127],
            [0, 0, 0],
            [183, 183, 183],
        ],
        dtype=np.float,
    )
    / 255
)


def plot_embeddings(embeddings, num_classes_in_batch):
    num_utter_per_class = embeddings.shape[0] // num_classes_in_batch

    # if necessary get just the first 10 classes
    if num_classes_in_batch > 10:
        num_classes_in_batch = 10
        embeddings = embeddings[: num_classes_in_batch * num_utter_per_class]

    model = umap.UMAP()
    projection = model.fit_transform(embeddings)
    ground_truth = np.repeat(np.arange(num_classes_in_batch), num_utter_per_class)
    colors = [colormap[i] for i in ground_truth]
    fig, ax = plt.subplots(figsize=(16, 10))
    _ = ax.scatter(projection[:, 0], projection[:, 1], c=colors)
    plt.gca().set_aspect("equal", "datalim")
    plt.title("UMAP projection")
    plt.tight_layout()
    plt.savefig("umap")
    return fig


================================================
FILE: TTS/model.py
================================================
from abc import abstractmethod
from typing import Dict

import torch
from coqpit import Coqpit
from trainer import TrainerModel

# pylint: skip-file


class BaseTrainerModel(TrainerModel):
    """BaseTrainerModel model expanding TrainerModel with required functions by 🐸TTS.

    Every new 🐸TTS model must inherit it.
    """

    @staticmethod
    @abstractmethod
    def init_from_config(config: Coqpit):
        """Init the model and all its attributes from the given config.

        Override this depending on your model.
        """
        ...

    @abstractmethod
    def inference(self, input: torch.Tensor, aux_input={}) -> Dict:
        """Forward pass for inference.

        It must return a dictionary with the main model output and all the auxiliary outputs. The key ```model_outputs```
        is considered to be the main output and you can add any other auxiliary outputs as you want.

        We don't use `*kwargs` since it is problematic with the TorchScript API.

        Args:
            input (torch.Tensor): [description]
            aux_input (Dict): Auxiliary inputs like speaker embeddings, durations etc.

        Returns:
            Dict: [description]
        """
        outputs_dict = {"model_outputs": None}
        ...
        return outputs_dict

    @abstractmethod
    def load_checkpoint(
        self, config: Coqpit, checkpoint_path: str, eval: bool = False, strict: bool = True, cache=False
    ) -> None:
        """Load a model checkpoint gile and get ready for training or inference.

        Args:
            config (Coqpit): Model configuration.
            checkpoint_path (str): Path to the model checkpoint file.
            eval (bool, optional): If true, init model for inference else for training. Defaults to False.
            strict (bool, optional): Match all checkpoint keys to model's keys. Defaults to True.
            cache (bool, optional): If True, cache the file locally for subsequent calls. It is cached under `get_user_data_dir()/tts_cache`. Defaults to False.
        """
        ...


================================================
FILE: TTS/server/README.md
================================================
# :frog: TTS demo server
Before you use the server, make sure you [install](https://github.com/coqui-ai/TTS/tree/dev#install-tts)) :frog: TTS properly. Then, you can follow the steps below.

**Note:** If you install :frog:TTS using ```pip```, you can also use the ```tts-server``` end point on the terminal.

Examples runs:

List officially released models.
```python TTS/server/server.py  --list_models ```

Run the server with the official models.
```python TTS/server/server.py  --model_name tts_models/en/ljspeech/tacotron2-DCA --vocoder_name vocoder_models/en/ljspeech/multiband-melgan```

Run the server with the official models on a GPU.
```CUDA_VISIBLE_DEVICES="0" python TTS/server/server.py  --model_name tts_models/en/ljspeech/tacotron2-DCA --vocoder_name vocoder_models/en/ljspeech/multiband-melgan --use_cuda True```

Run the server with a custom models.
```python TTS/server/server.py  --tts_checkpoint /path/to/tts/model.pth --tts_config /path/to/tts/config.json --vocoder_checkpoint /path/to/vocoder/model.pth --vocoder_config /path/to/vocoder/config.json```


================================================
FILE: TTS/server/__init__.py
================================================


================================================
FILE: TTS/server/conf.json
================================================
{
    "tts_path":"/media/erogol/data_ssd/Models/libri_tts/5049/",  // tts model root folder
    "tts_file":"best_model.pth",     // tts checkpoint file
    "tts_config":"config.json",     // tts config.json file
    "tts_speakers": null,           // json file listing speaker ids. null if no speaker embedding.
    "vocoder_config":null,
    "vocoder_file": null,
    "is_wavernn_batched":true,
    "port": 5002,
    "use_cuda": true,
    "debug": true
}


================================================
FILE: TTS/server/server.py
================================================
#!flask/bin/python
import argparse
import io
import json
import os
import sys
from pathlib import Path
from threading import Lock
from typing import Union
from urllib.parse import parse_qs

from flask import Flask, render_template, render_template_string, request, send_file

from TTS.config import load_config
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer


def create_argparser():
    def convert_boolean(x):
        return x.lower() in ["true", "1", "yes"]

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--list_models",
        type=convert_boolean,
        nargs="?",
        const=True,
        default=False,
        help="list available pre-trained tts and vocoder models.",
    )
    parser.add_argument(
        "--model_name",
        type=str,
        default="tts_models/en/ljspeech/tacotron2-DDC",
        help="Name of one of the pre-trained tts models in format <language>/<dataset>/<model_name>",
    )
    parser.add_argument("--vocoder_name", type=str, default=None, help="name of one of the released vocoder models.")

    # Args for running custom models
    parser.add_argument("--config_path", default=None, type=str, help="Path to model config file.")
    parser.add_argument(
        "--model_path",
        type=str,
        default=None,
        help="Path to model file.",
    )
    parser.add_argument(
        "--vocoder_path",
        type=str,
        help="Path to vocoder model file. If it is not defined, model uses GL as vocoder. Please make sure that you installed vocoder library before (WaveRNN).",
        default=None,
    )
    parser.add_argument("--vocoder_config_path", type=str, help="Path to vocoder model config file.", default=None)
    parser.add_argument("--speakers_file_path", type=str, help="JSON file for multi-speaker model.", default=None)
    parser.add_argument("--port", type=int, default=5002, help="port to listen on.")
    parser.add_argument("--use_cuda", type=convert_boolean, default=False, help="true to use CUDA.")
    parser.add_argument("--debug", type=convert_boolean, default=False, help="true to enable Flask debug mode.")
    parser.add_argument("--show_details", type=convert_boolean, default=False, help="Generate model detail page.")
    return parser


# parse the args
args = create_argparser().parse_args()

path = Path(__file__).parent / "../.models.json"
manager = ModelManager(path)

if args.list_models:
    manager.list_models()
    sys.exit()

# update in-use models to the specified released models.
model_path = None
config_path = None
speakers_file_path = None
vocoder_path = None
vocoder_config_path = None

# CASE1: list pre-trained TTS models
if args.list_models:
    manager.list_models()
    sys.exit()

# CASE2: load pre-trained model paths
if args.model_name is not None and not args.model_path:
    model_path, config_path, model_item = manager.download_model(args.model_name)
    args.vocoder_name = model_item["default_vocoder"] if args.vocoder_name is None else args.vocoder_name

if args.vocoder_name is not None and not args.vocoder_path:
    vocoder_path, vocoder_config_path, _ = manager.download_model(args.vocoder_name)

# CASE3: set custom model paths
if args.model_path is not None:
    model_path = args.model_path
    config_path = args.config_path
    speakers_file_path = args.speakers_file_path

if args.vocoder_path is not None:
    vocoder_path = args.vocoder_path
    vocoder_config_path = args.vocoder_config_path

# load models
synthesizer = Synthesizer(
    tts_checkpoint=model_path,
    tts_config_path=config_path,
    tts_speakers_file=speakers_file_path,
    tts_languages_file=None,
    vocoder_checkpoint=vocoder_path,
    vocoder_config=vocoder_config_path,
    encoder_checkpoint="",
    encoder_config="",
    use_cuda=args.use_cuda,
)

use_multi_speaker = hasattr(synthesizer.tts_model, "num_speakers") and (
    synthesizer.tts_model.num_speakers > 1 or synthesizer.tts_speakers_file is not None
)
speaker_manager = getattr(synthesizer.tts_model, "speaker_manager", None)

use_multi_language = hasattr(synthesizer.tts_model, "num_languages") and (
    synthesizer.tts_model.num_languages > 1 or synthesizer.tts_languages_file is not None
)
language_manager = getattr(synthesizer.tts_model, "language_manager", None)

# TODO: set this from SpeakerManager
use_gst = synthesizer.tts_config.get("use_gst", False)
app = Flask(__name__)


def style_wav_uri_to_dict(style_wav: str) -> Union[str, dict]:
    """Transform an uri style_wav, in either a string (path to wav file to be use for style transfer)
    or a dict (gst tokens/values to be use for styling)

    Args:
        style_wav (str): uri

    Returns:
        Union[str, dict]: path to file (str) or gst style (dict)
    """
    if style_wav:
        if os.path.isfile(style_wav) and style_wav.endswith(".wav"):
            return style_wav  # style_wav is a .wav file located on the server

        style_wav = json.loads(style_wav)
        return style_wav  # style_wav is a gst dictionary with {token1_id : token1_weigth, ...}
    return None


@app.route("/")
def index():
    return render_template(
        "index.html",
        show_details=args.show_details,
        use_multi_speaker=use_multi_speaker,
        use_multi_language=use_multi_language,
        speaker_ids=speaker_manager.name_to_id if speaker_manager is not None else None,
        language_ids=language_manager.name_to_id if language_manager is not None else None,
        use_gst=use_gst,
    )


@app.route("/details")
def details():
    model_config = load_config(args.tts_config)
    if args.vocoder_config is not None and os.path.isfile(args.vocoder_config):
        vocoder_config = load_config(args.vocoder_config)
    else:
        vocoder_config = None

    return render_template(
        "details.html",
        show_details=args.show_details,
        model_config=model_config,
        vocoder_config=vocoder_config,
        args=args.__dict__,
    )


lock = Lock()


@app.route("/api/tts", methods=["GET"])
def tts():
    with lock:
        text = request.args.get("text")
        speaker_idx = request.args.get("speaker_id", "")
        language_idx = request.args.get("language_id", "")
        style_wav = request.args.get("style_wav", "")
        style_wav = style_wav_uri_to_dict(style_wav)
        print(f" > Model input: {text}")
        print(f" > Speaker Idx: {speaker_idx}")
        print(f" > Language Idx: {language_idx}")
        wavs = synthesizer.tts(text, speaker_name=speaker_idx, language_name=language_idx, style_wav=style_wav)
        out = io.BytesIO()
        synthesizer.save_wav(wavs, out)
    return send_file(out, mimetype="audio/wav")


# Basic MaryTTS compatibility layer


@app.route("/locales", methods=["GET"])
def mary_tts_api_locales():
    """MaryTTS-compatible /locales endpoint"""
    # NOTE: We currently assume there is only one model active at the same time
    if args.model_name is not None:
        model_details = args.model_name.split("/")
    else:
        model_details = ["", "en", "", "default"]
    return render_template_string("{{ locale }}\n", locale=model_details[1])


@app.route("/voices", methods=["GET"])
def mary_tts_api_voices():
    """MaryTTS-compatible /voices endpoint"""
    # NOTE: We currently assume there is only one model active at the same time
    if args.model_name is not None:
        model_details = args.model_name.split("/")
    else:
        model_details = ["", "en", "", "default"]
    return render_template_string(
        "{{ name }} {{ locale }} {{ gender }}\n", name=model_details[3], locale=model_details[1], gender="u"
    )


@app.route("/process", methods=["GET", "POST"])
def mary_tts_api_process():
    """MaryTTS-compatible /process endpoint"""
    with lock:
        if request.method == "POST":
            data = parse_qs(request.get_data(as_text=True))
            # NOTE: we ignore param. LOCALE and VOICE for now since we have only one active model
            text = data.get("INPUT_TEXT", [""])[0]
        else:
            text = request.args.get("INPUT_TEXT", "")
        print(f" > Model input: {text}")
        wavs = synthesizer.tts(text)
        out = io.BytesIO()
        synthesizer.save_wav(wavs, out)
    return send_file(out, mimetype="audio/wav")


def main():
    app.run(debug=args.debug, host="::", port=args.port)


if __name__ == "__main__":
    main()


================================================
FILE: TTS/server/templates/details.html
================================================
<!DOCTYPE html>
<html lang="en">

<head>

  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
  <meta name="description" content="">
  <meta name="author" content="">

  <title>TTS engine</title>

  <!-- Bootstrap core CSS -->
  <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.1/css/bootstrap.min.css"
    integrity="sha384-WskhaSGFgHYWDcbwN70/dfYBj47jz9qbsMId/iRN3ewGhXQFZCSftd1LZCfmhktB" crossorigin="anonymous"
    rel="stylesheet">

  <!-- Custom styles for this template -->
  <style>
    body {
      padding-top: 54px;
    }

    @media (min-width: 992px) {
      body {
        padding-top: 56px;
      }
    }
  </style>
</head>

<body>
  <a href="https://github.com/mozilla/TTS"><img style="position: absolute; z-index:1000; top: 0; left: 0; border: 0;"
      src="https://s3.amazonaws.com/github/ribbons/forkme_left_darkblue_121621.png" alt="Fork me on GitHub"></a>

  {% if show_details == true %}

  <div class="container">
    <b>Model details</b>
  </div>

  <div class="container">
    <details>
      <summary>CLI arguments:</summary>
      <table border="1" align="center" width="75%">
        <tr>
          <td> CLI key </td>
          <td> Value </td>
        </tr>

        {% for key, value in args.items() %}

        <tr>
          <td>{{ key }}</td>
          <td>{{ value }}</td>
        </tr>

        {% endfor %}
      </table>
    </details>
  </div></br>

  <div class="container">

    {% if model_config != None %}

    <details>
      <summary>Model config:</summary>

      <table border="1" align="center" width="75%">
        <tr>
          <td> Key </td>
          <td> Value </td>
        </tr>


        {% for key, value in model_config.items() %}

        <tr>
          <td>{{ key }}</td>
          <td>{{ value }}</td>
        </tr>

        {% endfor %}

      </table>
    </details>

    {% endif %}

  </div></br>


  <div class="container">
    {% if vocoder_config != None %}
    <details>
      <summary>Vocoder model config:</summary>

      <table border="1" align="center" width="75%">
        <tr>
          <td> Key </td>
          <td> Value </td>
        </tr>


        {% for key, value in vocoder_config.items() %}

        <tr>
          <td>{{ key }}</td>
          <td>{{ value }}</td>
        </tr>

        {% endfor %}


      </table>
    </details>
    {% endif %}
  </div></br>

  {% else %}
  <div class="container">
    <b>Please start server with --show_details=true to see details.</b>
  </div>

  {% endif %}

</body>

</html>

================================================
FILE: TTS/server/templates/index.html
================================================
<!DOCTYPE html>
<html lang="en">

<head>

    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <meta name="description" content="🐸Coqui AI TTS demo server.">
    <meta name="author" content="🐸Coqui AI TTS">

    <title>TTS engine</title>

    <!-- Bootstrap core CSS -->
    <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.1/css/bootstrap.min.css"
        integrity="sha384-WskhaSGFgHYWDcbwN70/dfYBj47jz9qbsMId/iRN3ewGhXQFZCSftd1LZCfmhktB" crossorigin="anonymous"
        rel="stylesheet">

    <!-- Custom styles for this template -->
    <style>
        body {
            padding-top: 54px;
        }

        @media (min-width: 992px) {
            body {
                padding-top: 56px;
            }
        }
    </style>
</head>

<body>
    <a href="https://github.com/coqui-ai/TTS"><img style="position: absolute; z-index:1000; top: 0; left: 0; border: 0;"
            src="https://s3.amazonaws.com/github/ribbons/forkme_left_darkblue_121621.png" alt="Fork me on GitHub"></a>

    <!-- Navigation -->
    <!--
    <nav class="navbar navbar-expand-lg navbar-dark bg-dark fixed-top">
      <div class="container">
        <a class="navbar-brand" href="#">Coqui TTS</a>
        <button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarResponsive" aria-controls="navbarResponsive" aria-expanded="false" aria-label="Toggle navigation">
          <span class="navbar-toggler-icon"></span>
        </button>
        <div class="collapse navbar-collapse" id="navbarResponsive">
          <ul class="navbar-nav ml-auto">
            <li class="nav-item active">
              <a class="nav-link" href="#">Home
                <span class="sr-only">(current)</span>
              </a>
            </li>
          </ul>
        </div>
      </div>
    </nav>
    -->

    <!-- Page Content -->
    <div class="container">
        <div class="row">
            <div class="col-lg-12 text-center">
                <img class="mt-5" src="{{url_for('static', filename='coqui-log-green-TTS.png')}}" align="middle"
                    width="512" />

                <ul class="list-unstyled">
                </ul>

                {%if use_gst%}
                <input value='{"0": 0.1}' id="style_wav" placeholder="style wav (dict or path to wav).." size=45
                    type="text" name="style_wav">
                {%endif%}

                <input id="text" placeholder="Type here..." size=45 type="text" name="text">
                <button id="speak-button" name="speak">Speak</button><br /><br />

                {%if use_multi_speaker%}
                Choose a speaker:
                <select id="speaker_id" name=speaker_id method="GET" action="/">
                    {% for speaker_id in speaker_ids %}
                    <option value="{{speaker_id}}" SELECTED>{{speaker_id}}</option>"
                    {% endfor %}
                </select><br /><br />
                {%endif%}

                {%if use_multi_language%}
                Choose a language:
                <select id="language_id" name=language_id method="GET" action="/">
                    {% for language_id in language_ids %}
                    <option value="{{language_id}}" SELECTED>{{language_id}}</option>"
                    {% endfor %}
                </select><br /><br />
                {%endif%}


                {%if show_details%}
                <button id="details-button" onclick="location.href = 'details'" name="model-details">Model
                    Details</button><br /><br />
                {%endif%}
                <audio id="audio" controls autoplay hidden></audio>
                <p id="message"></p>
            </div>
        </div>
    </div>

    <!-- Bootstrap core JavaScript -->
    <script>
        function getTextValue(textId) {
            const container = q(textId)
            if (container) {
                return container.value
            }
            return ""
        }
        function q(selector) { return document.querySelector(selector) }
        q('#text').focus()
        function do_tts(e) {
            const text = q('#text').value
            const speaker_id = getTextValue('#speaker_id')
            const style_wav = getTextValue('#style_wav')
            const language_id = getTextValue('#language_id')
            if (text) {
                q('#message').textContent = 'Synthesizing...'
                q('#speak-button').disabled = true
                q('#audio').hidden = true
                synthesize(text, speaker_id, style_wav, language_id)
            }
            e.preventDefault()
            return false
        }
        q('#speak-button').addEventListener('click', do_tts)
        q('#text').addEventListener('keyup', function (e) {
            if (e.keyCode == 13) { // enter
                do_tts(e)
            }
        })
        function synthesize(text, speaker_id = "", style_wav = "", language_id = "") {
            fetch(`/api/tts?text=${encodeURIComponent(text)}&speaker_id=${encodeURIComponent(speaker_id)}&style_wav=${encodeURIComponent(style_wav)}&language_id=${encodeURIComponent(language_id)}`, { cache: 'no-cache' })
                .then(function (res) {
                    if (!res.ok) throw Error(res.statusText)
                    return res.blob()
                }).then(function (blob) {
                    q('#message').textContent = ''
                    q('#speak-button').disabled = false
                    q('#audio').src = URL.createObjectURL(blob)
                    q('#audio').hidden = false
                }).catch(function (err) {
                    q('#message').textContent = 'Error: ' + err.message
                    q('#speak-button').disabled = false
                })
        }
    </script>

</body>

</html>

================================================
FILE: TTS/tts/__init__.py
================================================


================================================
FILE: TTS/tts/configs/__init__.py
================================================
import importlib
import os
from inspect import isclass

# import all files under configs/
# configs_dir = os.path.dirname(__file__)
# for file in os.listdir(configs_dir):
#     path = os.path.join(configs_dir, file)
#     if not file.startswith("_") and not file.startswith(".") and (file.endswith(".py") or os.path.isdir(path)):
#         config_name = file[: file.find(".py")] if file.endswith(".py") else file
#         module = importlib.import_module("TTS.tts.configs." + config_name)
#         for attribute_name in dir(module):
#             attribute = getattr(module, attribute_name)

#             if isclass(attribute):
#                 # Add the class to this package's variables
#                 globals()[attribute_name] = attribute


================================================
FILE: TTS/tts/configs/align_tts_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig
from TTS.tts.models.align_tts import AlignTTSArgs


@dataclass
class AlignTTSConfig(BaseTTSConfig):
    """Defines parameters for AlignTTS model.
    Example:

        >>> from TTS.tts.configs.align_tts_config import AlignTTSConfig
        >>> config = AlignTTSConfig()

    Args:
        model(str):
            Model name used for selecting the right model at initialization. Defaults to `align_tts`.
        positional_encoding (bool):
            enable / disable positional encoding applied to the encoder output. Defaults to True.
        hidden_channels (int):
            Base number of hidden channels. Defines all the layers expect ones defined by the specific encoder or decoder
            parameters. Defaults to 256.
        hidden_channels_dp (int):
            Number of hidden channels of the duration predictor's layers. Defaults to 256.
        encoder_type (str):
            Type of the encoder used by the model. Look at `TTS.tts.layers.feed_forward.encoder` for more details.
            Defaults to `fftransformer`.
        encoder_params (dict):
            Parameters used to define the encoder network. Look at `TTS.tts.layers.feed_forward.encoder` for more details.
            Defaults to `{"hidden_channels_ffn": 1024, "num_heads": 2, "num_layers": 6, "dropout_p": 0.1}`.
        decoder_type (str):
            Type of the decoder used by the model. Look at `TTS.tts.layers.feed_forward.decoder` for more details.
            Defaults to `fftransformer`.
        decoder_params (dict):
            Parameters used to define the decoder network. Look at `TTS.tts.layers.feed_forward.decoder` for more details.
            Defaults to `{"hidden_channels_ffn": 1024, "num_heads": 2, "num_layers": 6, "dropout_p": 0.1}`.
        phase_start_steps (List[int]):
            A list of number of steps required to start the next training phase. AlignTTS has 4 different training
            phases. Thus you need to define 4 different values to enable phase based training. If None, it
            trains the whole model together. Defaults to None.
        ssim_alpha (float):
            Weight for the SSIM loss. If set <= 0, disables the SSIM loss. Defaults to 1.0.
        duration_loss_alpha (float):
            Weight for the duration predictor's loss. Defaults to 1.0.
        mdn_alpha (float):
            Weight for the MDN loss. Defaults to 1.0.
        spec_loss_alpha (float):
            Weight for the MSE spectrogram loss. If set <= 0, disables the L1 loss. Defaults to 1.0.
        use_speaker_embedding (bool):
            enable / disable using speaker embeddings for multi-speaker models. If set True, the model is
            in the multi-speaker mode. Defaults to False.
        use_d_vector_file (bool):
            enable /disable using external speaker embeddings in place of the learned embeddings. Defaults to False.
        d_vector_file (str):
            Path to the file including pre-computed speaker embeddings. Defaults to None.
        noam_schedule (bool):
            enable / disable the use of Noam LR scheduler. Defaults to False.
        warmup_steps (int):
            Number of warm-up steps for the Noam scheduler. Defaults 4000.
        lr (float):
            Initial learning rate. Defaults to `1e-3`.
        wd (float):
            Weight decay coefficient. Defaults to `1e-7`.
        min_seq_len (int):
            Minimum input sequence length to be used at training.
        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage."""

    model: str = "align_tts"
    # model specific params
    model_args: AlignTTSArgs = field(default_factory=AlignTTSArgs)
    phase_start_steps: List[int] = None

    ssim_alpha: float = 1.0
    spec_loss_alpha: float = 1.0
    dur_loss_alpha: float = 1.0
    mdn_alpha: float = 1.0

    # multi-speaker settings
    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_file: str = False

    # optimizer parameters
    optimizer: str = "Adam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = None
    lr_scheduler_params: dict = None
    lr: float = 1e-4
    grad_clip: float = 5.0

    # overrides
    min_seq_len: int = 13
    max_seq_len: int = 200
    r: int = 1

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )


================================================
FILE: TTS/tts/configs/fast_pitch_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig
from TTS.tts.models.forward_tts import ForwardTTSArgs


@dataclass
class FastPitchConfig(BaseTTSConfig):
    """Configure `ForwardTTS` as FastPitch model.

    Example:

        >>> from TTS.tts.configs.fast_pitch_config import FastPitchConfig
        >>> config = FastPitchConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `fast_pitch`.

        base_model (str):
            Name of the base model being configured as this model so that 🐸 TTS knows it needs to initiate
            the base model rather than searching for the `model` implementation. Defaults to `forward_tts`.

        model_args (Coqpit):
            Model class arguments. Check `FastPitchArgs` for more details. Defaults to `FastPitchArgs()`.

        data_dep_init_steps (int):
            Number of steps used for computing normalization parameters at the beginning of the training. GlowTTS uses
            Activation Normalization that pre-computes normalization stats at the beginning and use the same values
            for the rest. Defaults to 10.

        speakers_file (str):
            Path to the file containing the list of speakers. Needed at inference for loading matching speaker ids to
            speaker names. Defaults to `None`.

        use_speaker_embedding (bool):
            enable / disable using speaker embeddings for multi-speaker models. If set True, the model is
            in the multi-speaker mode. Defaults to False.

        use_d_vector_file (bool):
            enable /disable using external speaker embeddings in place of the learned embeddings. Defaults to False.

        d_vector_file (str):
            Path to the file including pre-computed speaker embeddings. Defaults to None.

        d_vector_dim (int):
            Dimension of the external speaker embeddings. Defaults to 0.

        optimizer (str):
            Name of the model optimizer. Defaults to `Adam`.

        optimizer_params (dict):
            Arguments of the model optimizer. Defaults to `{"betas": [0.9, 0.998], "weight_decay": 1e-6}`.

        lr_scheduler (str):
            Name of the learning rate scheduler. Defaults to `Noam`.

        lr_scheduler_params (dict):
            Arguments of the learning rate scheduler. Defaults to `{"warmup_steps": 4000}`.

        lr (float):
            Initial learning rate. Defaults to `1e-3`.

        grad_clip (float):
            Gradient norm clipping value. Defaults to `5.0`.

        spec_loss_type (str):
            Type of the spectrogram loss. Check `ForwardTTSLoss` for possible values. Defaults to `mse`.

        duration_loss_type (str):
            Type of the duration loss. Check `ForwardTTSLoss` for possible values. Defaults to `mse`.

        use_ssim_loss (bool):
            Enable/disable the use of SSIM (Structural Similarity) loss. Defaults to True.

        wd (float):
            Weight decay coefficient. Defaults to `1e-7`.

        ssim_loss_alpha (float):
            Weight for the SSIM loss. If set 0, disables the SSIM loss. Defaults to 1.0.

        dur_loss_alpha (float):
            Weight for the duration predictor's loss. If set 0, disables the huber loss. Defaults to 1.0.

        spec_loss_alpha (float):
            Weight for the L1 spectrogram loss. If set 0, disables the L1 loss. Defaults to 1.0.

        pitch_loss_alpha (float):
            Weight for the pitch predictor's loss. If set 0, disables the pitch predictor. Defaults to 1.0.

        binary_align_loss_alpha (float):
            Weight for the binary loss. If set 0, disables the binary loss. Defaults to 1.0.

        binary_loss_warmup_epochs (float):
            Number of epochs to gradually increase the binary loss impact. Defaults to 150.

        min_seq_len (int):
            Minimum input sequence length to be used at training.

        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage.

        # dataset configs
        compute_f0(bool):
            Compute pitch. defaults to True

        f0_cache_path(str):
            pith cache path. defaults to None
    """

    model: str = "fast_pitch"
    base_model: str = "forward_tts"

    # model specific params
    model_args: ForwardTTSArgs = ForwardTTSArgs()

    # multi-speaker settings
    num_speakers: int = 0
    speakers_file: str = None
    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_file: str = False
    d_vector_dim: int = 0

    # optimizer parameters
    optimizer: str = "Adam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = "NoamLR"
    lr_scheduler_params: dict = field(default_factory=lambda: {"warmup_steps": 4000})
    lr: float = 1e-4
    grad_clip: float = 5.0

    # loss params
    spec_loss_type: str = "mse"
    duration_loss_type: str = "mse"
    use_ssim_loss: bool = True
    ssim_loss_alpha: float = 1.0
    spec_loss_alpha: float = 1.0
    aligner_loss_alpha: float = 1.0
    pitch_loss_alpha: float = 0.1
    dur_loss_alpha: float = 0.1
    binary_align_loss_alpha: float = 0.1
    binary_loss_warmup_epochs: int = 150

    # overrides
    min_seq_len: int = 13
    max_seq_len: int = 200
    r: int = 1  # DO NOT CHANGE

    # dataset configs
    compute_f0: bool = True
    f0_cache_path: str = None

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )

    def __post_init__(self):
        # Pass multi-speaker parameters to the model args as `model.init_multispeaker()` looks for it there.
        if self.num_speakers > 0:
            self.model_args.num_speakers = self.num_speakers

        # speaker embedding settings
        if self.use_speaker_embedding:
            self.model_args.use_speaker_embedding = True
        if self.speakers_file:
            self.model_args.speakers_file = self.speakers_file

        # d-vector settings
        if self.use_d_vector_file:
            self.model_args.use_d_vector_file = True
        if self.d_vector_dim is not None and self.d_vector_dim > 0:
            self.model_args.d_vector_dim = self.d_vector_dim
        if self.d_vector_file:
            self.model_args.d_vector_file = self.d_vector_file


================================================
FILE: TTS/tts/configs/fast_speech_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig
from TTS.tts.models.forward_tts import ForwardTTSArgs


@dataclass
class FastSpeechConfig(BaseTTSConfig):
    """Configure `ForwardTTS` as FastSpeech model.

    Example:

        >>> from TTS.tts.configs.fast_speech_config import FastSpeechConfig
        >>> config = FastSpeechConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `fast_pitch`.

        base_model (str):
            Name of the base model being configured as this model so that 🐸 TTS knows it needs to initiate
            the base model rather than searching for the `model` implementation. Defaults to `forward_tts`.

        model_args (Coqpit):
            Model class arguments. Check `FastSpeechArgs` for more details. Defaults to `FastSpeechArgs()`.

        data_dep_init_steps (int):
            Number of steps used for computing normalization parameters at the beginning of the training. GlowTTS uses
            Activation Normalization that pre-computes normalization stats at the beginning and use the same values
            for the rest. Defaults to 10.

        speakers_file (str):
            Path to the file containing the list of speakers. Needed at inference for loading matching speaker ids to
            speaker names. Defaults to `None`.


        use_speaker_embedding (bool):
            enable / disable using speaker embeddings for multi-speaker models. If set True, the model is
            in the multi-speaker mode. Defaults to False.

        use_d_vector_file (bool):
            enable /disable using external speaker embeddings in place of the learned embeddings. Defaults to False.

        d_vector_file (str):
            Path to the file including pre-computed speaker embeddings. Defaults to None.

        d_vector_dim (int):
            Dimension of the external speaker embeddings. Defaults to 0.

        optimizer (str):
            Name of the model optimizer. Defaults to `Adam`.

        optimizer_params (dict):
            Arguments of the model optimizer. Defaults to `{"betas": [0.9, 0.998], "weight_decay": 1e-6}`.

        lr_scheduler (str):
            Name of the learning rate scheduler. Defaults to `Noam`.

        lr_scheduler_params (dict):
            Arguments of the learning rate scheduler. Defaults to `{"warmup_steps": 4000}`.

        lr (float):
            Initial learning rate. Defaults to `1e-3`.

        grad_clip (float):
            Gradient norm clipping value. Defaults to `5.0`.

        spec_loss_type (str):
            Type of the spectrogram loss. Check `ForwardTTSLoss` for possible values. Defaults to `mse`.

        duration_loss_type (str):
            Type of the duration loss. Check `ForwardTTSLoss` for possible values. Defaults to `mse`.

        use_ssim_loss (bool):
            Enable/disable the use of SSIM (Structural Similarity) loss. Defaults to True.

        wd (float):
            Weight decay coefficient. Defaults to `1e-7`.

        ssim_loss_alpha (float):
            Weight for the SSIM loss. If set 0, disables the SSIM loss. Defaults to 1.0.

        dur_loss_alpha (float):
            Weight for the duration predictor's loss. If set 0, disables the huber loss. Defaults to 1.0.

        spec_loss_alpha (float):
            Weight for the L1 spectrogram loss. If set 0, disables the L1 loss. Defaults to 1.0.

        pitch_loss_alpha (float):
            Weight for the pitch predictor's loss. If set 0, disables the pitch predictor. Defaults to 1.0.

        binary_loss_alpha (float):
            Weight for the binary loss. If set 0, disables the binary loss. Defaults to 1.0.

        binary_loss_warmup_epochs (float):
            Number of epochs to gradually increase the binary loss impact. Defaults to 150.

        min_seq_len (int):
            Minimum input sequence length to be used at training.

        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage.
    """

    model: str = "fast_speech"
    base_model: str = "forward_tts"

    # model specific params
    model_args: ForwardTTSArgs = ForwardTTSArgs(use_pitch=False)

    # multi-speaker settings
    num_speakers: int = 0
    speakers_file: str = None
    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_file: str = False
    d_vector_dim: int = 0

    # optimizer parameters
    optimizer: str = "Adam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = "NoamLR"
    lr_scheduler_params: dict = field(default_factory=lambda: {"warmup_steps": 4000})
    lr: float = 1e-4
    grad_clip: float = 5.0

    # loss params
    spec_loss_type: str = "mse"
    duration_loss_type: str = "mse"
    use_ssim_loss: bool = True
    ssim_loss_alpha: float = 1.0
    dur_loss_alpha: float = 1.0
    spec_loss_alpha: float = 1.0
    pitch_loss_alpha: float = 0.0
    aligner_loss_alpha: float = 1.0
    binary_align_loss_alpha: float = 1.0
    binary_loss_warmup_epochs: int = 150

    # overrides
    min_seq_len: int = 13
    max_seq_len: int = 200
    r: int = 1  # DO NOT CHANGE

    # dataset configs
    compute_f0: bool = False
    f0_cache_path: str = None

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )

    def __post_init__(self):
        # Pass multi-speaker parameters to the model args as `model.init_multispeaker()` looks for it there.
        if self.num_speakers > 0:
            self.model_args.num_speakers = self.num_speakers

        # speaker embedding settings
        if self.use_speaker_embedding:
            self.model_args.use_speaker_embedding = True
        if self.speakers_file:
            self.model_args.speakers_file = self.speakers_file

        # d-vector settings
        if self.use_d_vector_file:
            self.model_args.use_d_vector_file = True
        if self.d_vector_dim is not None and self.d_vector_dim > 0:
            self.model_args.d_vector_dim = self.d_vector_dim
        if self.d_vector_file:
            self.model_args.d_vector_file = self.d_vector_file


================================================
FILE: TTS/tts/configs/fastspeech2_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig
from TTS.tts.models.forward_tts import ForwardTTSArgs


@dataclass
class Fastspeech2Config(BaseTTSConfig):
    """Configure `ForwardTTS` as FastPitch model.

    Example:

        >>> from TTS.tts.configs.fastspeech2_config import FastSpeech2Config
        >>> config = FastSpeech2Config()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `fast_pitch`.

        base_model (str):
            Name of the base model being configured as this model so that 🐸 TTS knows it needs to initiate
            the base model rather than searching for the `model` implementation. Defaults to `forward_tts`.

        model_args (Coqpit):
            Model class arguments. Check `FastPitchArgs` for more details. Defaults to `FastPitchArgs()`.

        data_dep_init_steps (int):
            Number of steps used for computing normalization parameters at the beginning of the training. GlowTTS uses
            Activation Normalization that pre-computes normalization stats at the beginning and use the same values
            for the rest. Defaults to 10.

        speakers_file (str):
            Path to the file containing the list of speakers. Needed at inference for loading matching speaker ids to
            speaker names. Defaults to `None`.

        use_speaker_embedding (bool):
            enable / disable using speaker embeddings for multi-speaker models. If set True, the model is
            in the multi-speaker mode. Defaults to False.

        use_d_vector_file (bool):
            enable /disable using external speaker embeddings in place of the learned embeddings. Defaults to False.

        d_vector_file (str):
            Path to the file including pre-computed speaker embeddings. Defaults to None.

        d_vector_dim (int):
            Dimension of the external speaker embeddings. Defaults to 0.

        optimizer (str):
            Name of the model optimizer. Defaults to `Adam`.

        optimizer_params (dict):
            Arguments of the model optimizer. Defaults to `{"betas": [0.9, 0.998], "weight_decay": 1e-6}`.

        lr_scheduler (str):
            Name of the learning rate scheduler. Defaults to `Noam`.

        lr_scheduler_params (dict):
            Arguments of the learning rate scheduler. Defaults to `{"warmup_steps": 4000}`.

        lr (float):
            Initial learning rate. Defaults to `1e-3`.

        grad_clip (float):
            Gradient norm clipping value. Defaults to `5.0`.

        spec_loss_type (str):
            Type of the spectrogram loss. Check `ForwardTTSLoss` for possible values. Defaults to `mse`.

        duration_loss_type (str):
            Type of the duration loss. Check `ForwardTTSLoss` for possible values. Defaults to `mse`.

        use_ssim_loss (bool):
            Enable/disable the use of SSIM (Structural Similarity) loss. Defaults to True.

        wd (float):
            Weight decay coefficient. Defaults to `1e-7`.

        ssim_loss_alpha (float):
            Weight for the SSIM loss. If set 0, disables the SSIM loss. Defaults to 1.0.

        dur_loss_alpha (float):
            Weight for the duration predictor's loss. If set 0, disables the huber loss. Defaults to 1.0.

        spec_loss_alpha (float):
            Weight for the L1 spectrogram loss. If set 0, disables the L1 loss. Defaults to 1.0.

        pitch_loss_alpha (float):
            Weight for the pitch predictor's loss. If set 0, disables the pitch predictor. Defaults to 1.0.

        energy_loss_alpha (float):
            Weight for the energy predictor's loss. If set 0, disables the energy predictor. Defaults to 1.0.

        binary_align_loss_alpha (float):
            Weight for the binary loss. If set 0, disables the binary loss. Defaults to 1.0.

        binary_loss_warmup_epochs (float):
            Number of epochs to gradually increase the binary loss impact. Defaults to 150.

        min_seq_len (int):
            Minimum input sequence length to be used at training.

        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage.

        # dataset configs
        compute_f0(bool):
            Compute pitch. defaults to True

        f0_cache_path(str):
            pith cache path. defaults to None

        # dataset configs
        compute_energy(bool):
            Compute energy. defaults to True

        energy_cache_path(str):
            energy cache path. defaults to None
    """

    model: str = "fastspeech2"
    base_model: str = "forward_tts"

    # model specific params
    model_args: ForwardTTSArgs = ForwardTTSArgs(use_pitch=True, use_energy=True)

    # multi-speaker settings
    num_speakers: int = 0
    speakers_file: str = None
    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_file: str = False
    d_vector_dim: int = 0

    # optimizer parameters
    optimizer: str = "Adam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = "NoamLR"
    lr_scheduler_params: dict = field(default_factory=lambda: {"warmup_steps": 4000})
    lr: float = 1e-4
    grad_clip: float = 5.0

    # loss params
    spec_loss_type: str = "mse"
    duration_loss_type: str = "mse"
    use_ssim_loss: bool = True
    ssim_loss_alpha: float = 1.0
    spec_loss_alpha: float = 1.0
    aligner_loss_alpha: float = 1.0
    pitch_loss_alpha: float = 0.1
    energy_loss_alpha: float = 0.1
    dur_loss_alpha: float = 0.1
    binary_align_loss_alpha: float = 0.1
    binary_loss_warmup_epochs: int = 150

    # overrides
    min_seq_len: int = 13
    max_seq_len: int = 200
    r: int = 1  # DO NOT CHANGE

    # dataset configs
    compute_f0: bool = True
    f0_cache_path: str = None

    # dataset configs
    compute_energy: bool = True
    energy_cache_path: str = None

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )

    def __post_init__(self):
        # Pass multi-speaker parameters to the model args as `model.init_multispeaker()` looks for it there.
        if self.num_speakers > 0:
            self.model_args.num_speakers = self.num_speakers

        # speaker embedding settings
        if self.use_speaker_embedding:
            self.model_args.use_speaker_embedding = True
        if self.speakers_file:
            self.model_args.speakers_file = self.speakers_file

        # d-vector settings
        if self.use_d_vector_file:
            self.model_args.use_d_vector_file = True
        if self.d_vector_dim is not None and self.d_vector_dim > 0:
            self.model_args.d_vector_dim = self.d_vector_dim
        if self.d_vector_file:
            self.model_args.d_vector_file = self.d_vector_file


================================================
FILE: TTS/tts/configs/glow_tts_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig


@dataclass
class GlowTTSConfig(BaseTTSConfig):
    """Defines parameters for GlowTTS model.

    Example:

        >>> from TTS.tts.configs.glow_tts_config import GlowTTSConfig
        >>> config = GlowTTSConfig()

    Args:
        model(str):
            Model name used for selecting the right model at initialization. Defaults to `glow_tts`.
        encoder_type (str):
            Type of the encoder used by the model. Look at `TTS.tts.layers.glow_tts.encoder` for more details.
            Defaults to `rel_pos_transformers`.
        encoder_params (dict):
            Parameters used to define the encoder network. Look at `TTS.tts.layers.glow_tts.encoder` for more details.
            Defaults to `{"kernel_size": 3, "dropout_p": 0.1, "num_layers": 6, "num_heads": 2, "hidden_channels_ffn": 768}`
        use_encoder_prenet (bool):
            enable / disable the use of a prenet for the encoder. Defaults to True.
        hidden_channels_enc (int):
            Number of base hidden channels used by the encoder network. It defines the input and the output channel sizes,
            and for some encoder types internal hidden channels sizes too. Defaults to 192.
        hidden_channels_dec (int):
            Number of base hidden channels used by the decoder WaveNet network. Defaults to 192 as in the original work.
        hidden_channels_dp (int):
            Number of layer channels of the duration predictor network. Defaults to 256 as in the original work.
        mean_only (bool):
            If true predict only the mean values by the decoder flow. Defaults to True.
        out_channels (int):
            Number of channels of the model output tensor. Defaults to 80.
        num_flow_blocks_dec (int):
            Number of decoder blocks. Defaults to 12.
        inference_noise_scale (float):
            Noise scale used at inference. Defaults to 0.33.
        kernel_size_dec (int):
            Decoder kernel size. Defaults to 5
        dilation_rate (int):
            Rate to increase dilation by each layer in a decoder block. Defaults to 1.
        num_block_layers (int):
            Number of decoder layers in each decoder block.  Defaults to 4.
        dropout_p_dec (float):
            Dropout rate for decoder. Defaults to 0.1.
        num_speaker (int):
            Number of speaker to define the size of speaker embedding layer. Defaults to 0.
        c_in_channels (int):
            Number of speaker embedding channels. It is set to 512 if embeddings are learned. Defaults to 0.
        num_splits (int):
            Number of split levels in inversible conv1x1 operation. Defaults to 4.
        num_squeeze (int):
            Number of squeeze levels. When squeezing channels increases and time steps reduces by the factor
            'num_squeeze'. Defaults to 2.
        sigmoid_scale (bool):
            enable/disable sigmoid scaling in decoder. Defaults to False.
        mean_only (bool):
            If True, encoder only computes mean value and uses constant variance for each time step. Defaults to true.
        encoder_type (str):
            Encoder module type. Possible values are`["rel_pos_transformer", "gated_conv", "residual_conv_bn", "time_depth_separable"]`
            Check `TTS.tts.layers.glow_tts.encoder` for more details. Defaults to `rel_pos_transformers` as in the original paper.
        encoder_params (dict):
            Encoder module parameters. Defaults to None.
        d_vector_dim (int):
            Channels of external speaker embedding vectors. Defaults to 0.
        data_dep_init_steps (int):
            Number of steps used for computing normalization parameters at the beginning of the training. GlowTTS uses
            Activation Normalization that pre-computes normalization stats at the beginning and use the same values
            for the rest. Defaults to 10.
        style_wav_for_test (str):
            Path to the wav file used for changing the style of the speech. Defaults to None.
        inference_noise_scale (float):
            Variance used for sampling the random noise added to the decoder's input at inference. Defaults to 0.0.
        length_scale (float):
            Multiply the predicted durations with this value to change the speech speed. Defaults to 1.
        use_speaker_embedding (bool):
            enable / disable using speaker embeddings for multi-speaker models. If set True, the model is
            in the multi-speaker mode. Defaults to False.
        use_d_vector_file (bool):
            enable /disable using external speaker embeddings in place of the learned embeddings. Defaults to False.
        d_vector_file (str):
            Path to the file including pre-computed speaker embeddings. Defaults to None.
        noam_schedule (bool):
            enable / disable the use of Noam LR scheduler. Defaults to False.
        warmup_steps (int):
            Number of warm-up steps for the Noam scheduler. Defaults 4000.
        lr (float):
            Initial learning rate. Defaults to `1e-3`.
        wd (float):
            Weight decay coefficient. Defaults to `1e-7`.
        min_seq_len (int):
            Minimum input sequence length to be used at training.
        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage.
    """

    model: str = "glow_tts"

    # model params
    num_chars: int = None
    encoder_type: str = "rel_pos_transformer"
    encoder_params: dict = field(
        default_factory=lambda: {
            "kernel_size": 3,
            "dropout_p": 0.1,
            "num_layers": 6,
            "num_heads": 2,
            "hidden_channels_ffn": 768,
        }
    )
    use_encoder_prenet: bool = True
    hidden_channels_enc: int = 192
    hidden_channels_dec: int = 192
    hidden_channels_dp: int = 256
    dropout_p_dp: float = 0.1
    dropout_p_dec: float = 0.05
    mean_only: bool = True
    out_channels: int = 80
    num_flow_blocks_dec: int = 12
    inference_noise_scale: float = 0.33
    kernel_size_dec: int = 5
    dilation_rate: int = 1
    num_block_layers: int = 4
    num_speakers: int = 0
    c_in_channels: int = 0
    num_splits: int = 4
    num_squeeze: int = 2
    sigmoid_scale: bool = False
    encoder_type: str = "rel_pos_transformer"
    encoder_params: dict = field(
        default_factory=lambda: {
            "kernel_size": 3,
            "dropout_p": 0.1,
            "num_layers": 6,
            "num_heads": 2,
            "hidden_channels_ffn": 768,
            "input_length": None,
        }
    )
    d_vector_dim: int = 0

    # training params
    data_dep_init_steps: int = 10

    # inference params
    style_wav_for_test: str = None
    inference_noise_scale: float = 0.0
    length_scale: float = 1.0

    # multi-speaker settings
    use_speaker_embedding: bool = False
    speakers_file: str = None
    use_d_vector_file: bool = False
    d_vector_file: str = False

    # optimizer parameters
    optimizer: str = "RAdam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = "NoamLR"
    lr_scheduler_params: dict = field(default_factory=lambda: {"warmup_steps": 4000})
    grad_clip: float = 5.0
    lr: float = 1e-3

    # overrides
    min_seq_len: int = 3
    max_seq_len: int = 500
    r: int = 1  # DO NOT CHANGE - TODO: make this immutable once coqpit implements it.

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )


================================================
FILE: TTS/tts/configs/neuralhmm_tts_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig


@dataclass
class NeuralhmmTTSConfig(BaseTTSConfig):
    """
    Define parameters for Neural HMM TTS model.

    Example:

        >>> from TTS.tts.configs.overflow_config import OverflowConfig
        >>> config = OverflowConfig()

    Args:
        model (str):
            Model name used to select the right model class to initilize. Defaults to `Overflow`.
        run_eval_steps (int):
            Run evalulation epoch after N steps. If None, waits until training epoch is completed. Defaults to None.
        save_step (int):
            Save local checkpoint every save_step steps. Defaults to 500.
        plot_step (int):
            Plot training stats on the logger every plot_step steps. Defaults to 1.
        model_param_stats (bool):
            Log model parameters stats on the logger dashboard. Defaults to False.
        force_generate_statistics (bool):
            Force generate mel normalization statistics. Defaults to False.
        mel_statistics_parameter_path (str):
            Path to the mel normalization statistics.If the model doesn't finds a file there it will generate statistics.
            Defaults to None.
        num_chars (int):
            Number of characters used by the model. It must be defined before initializing the model. Defaults to None.
        state_per_phone (int):
            Generates N states per phone. Similar, to `add_blank` parameter in GlowTTS but in Overflow it is upsampled by model's encoder. Defaults to 2.
        encoder_in_out_features (int):
            Channels of encoder input and character embedding tensors. Defaults to 512.
        encoder_n_convolutions (int):
            Number of convolution layers in the encoder. Defaults to 3.
        out_channels (int):
            Channels of the final model output. It must match the spectragram size. Defaults to 80.
        ar_order (int):
            Autoregressive order of the model. Defaults to 1. In ablations of Neural HMM it was found that more autoregression while giving more variation hurts naturalness of the synthesised audio.
        sampling_temp (float):
            Variation added to the sample from the latent space of neural HMM. Defaults to 0.334.
        deterministic_transition (bool):
            deterministic duration generation based on duration quantiles as defiend in "S. Ronanki, O. Watts, S. King, and G. E. Henter, “Medianbased generation of synthetic speech durations using a nonparametric approach,” in Proc. SLT, 2016.". Defaults to True.
        duration_threshold (float):
            Threshold for duration quantiles. Defaults to 0.55. Tune this to change the speaking rate of the synthesis, where lower values defines a slower speaking rate and higher values defines a faster speaking rate.
        use_grad_checkpointing (bool):
            Use gradient checkpointing to save memory. In a multi-GPU setting currently pytorch does not supports gradient checkpoint inside a loop so we will have to turn it off then.Adjust depending on whatever get more batch size either by using a single GPU or multi-GPU. Defaults to True.
        max_sampling_time (int):
            Maximum sampling time while synthesising latents from neural HMM. Defaults to 1000.
        prenet_type (str):
            `original` or `bn`. `original` sets the default Prenet and `bn` uses Batch Normalization version of the
            Prenet. Defaults to `original`.
        prenet_dim (int):
            Dimension of the Prenet. Defaults to 256.
        prenet_n_layers (int):
            Number of layers in the Prenet. Defaults to 2.
        prenet_dropout (float):
            Dropout rate of the Prenet. Defaults to 0.5.
        prenet_dropout_at_inference (bool):
            Use dropout at inference time. Defaults to False.
        memory_rnn_dim (int):
            Dimension of the memory LSTM to process the prenet output. Defaults to 1024.
        outputnet_size (list[int]):
            Size of the output network inside the neural HMM. Defaults to [1024].
        flat_start_params (dict):
            Parameters for the flat start initialization of the neural HMM. Defaults to `{"mean": 0.0, "std": 1.0, "transition_p": 0.14}`.
            It will be recomputed when you pass the dataset.
        std_floor (float):
            Floor value for the standard deviation of the neural HMM. Prevents model cheating by putting point mass and getting infinite likelihood at any datapoint. Defaults to 0.01.
            It is called `variance flooring` in standard HMM literature.
        optimizer (str):
            Optimizer to use for training. Defaults to `adam`.
        optimizer_params (dict):
            Parameters for the optimizer. Defaults to `{"weight_decay": 1e-6}`.
        grad_clip (float):
            Gradient clipping threshold. Defaults to 40_000.
        lr (float):
            Learning rate. Defaults to 1e-3.
        lr_scheduler (str):
            Learning rate scheduler for the training. Use one from `torch.optim.Scheduler` schedulers or
            `TTS.utils.training`. Defaults to `None`.
        min_seq_len (int):
            Minimum input sequence length to be used at training.
        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage.
    """

    model: str = "NeuralHMM_TTS"

    # Training and Checkpoint configs
    run_eval_steps: int = 100
    save_step: int = 500
    plot_step: int = 1
    model_param_stats: bool = False

    # data parameters
    force_generate_statistics: bool = False
    mel_statistics_parameter_path: str = None

    # Encoder parameters
    num_chars: int = None
    state_per_phone: int = 2
    encoder_in_out_features: int = 512
    encoder_n_convolutions: int = 3

    # HMM parameters
    out_channels: int = 80
    ar_order: int = 1
    sampling_temp: float = 0
    deterministic_transition: bool = True
    duration_threshold: float = 0.43
    use_grad_checkpointing: bool = True
    max_sampling_time: int = 1000

    ## Prenet parameters
    prenet_type: str = "original"
    prenet_dim: int = 256
    prenet_n_layers: int = 2
    prenet_dropout: float = 0.5
    prenet_dropout_at_inference: bool = True
    memory_rnn_dim: int = 1024

    ## Outputnet parameters
    outputnet_size: List[int] = field(default_factory=lambda: [1024])
    flat_start_params: dict = field(default_factory=lambda: {"mean": 0.0, "std": 1.0, "transition_p": 0.14})
    std_floor: float = 0.001

    # optimizer parameters
    optimizer: str = "Adam"
    optimizer_params: dict = field(default_factory=lambda: {"weight_decay": 1e-6})
    grad_clip: float = 40000.0
    lr: float = 1e-3
    lr_scheduler: str = None

    # overrides
    min_text_len: int = 10
    max_text_len: int = 500
    min_audio_len: int = 512

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "Be a voice, not an echo.",
        ]
    )

    # Extra needed config
    r: int = 1
    use_d_vector_file: bool = False
    use_speaker_embedding: bool = False

    def check_values(self):
        """Validate the hyperparameters.

        Raises:
            AssertionError: when the parameters network is not defined
            AssertionError: transition probability is not between 0 and 1
        """
        assert self.ar_order > 0, "AR order must be greater than 0 it is an autoregressive model."
        assert (
            len(self.outputnet_size) >= 1
        ), f"Parameter Network must have atleast one layer check the config file for parameter network. Provided: {self.parameternetwork}"
        assert (
            0 < self.flat_start_params["transition_p"] < 1
        ), f"Transition probability must be between 0 and 1. Provided: {self.flat_start_params['transition_p']}"


================================================
FILE: TTS/tts/configs/overflow_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig


@dataclass
class OverflowConfig(BaseTTSConfig):  # The classname has to be camel case
    """
    Define parameters for OverFlow model.

    Example:

        >>> from TTS.tts.configs.overflow_config import OverflowConfig
        >>> config = OverflowConfig()

    Args:
        model (str):
            Model name used to select the right model class to initilize. Defaults to `Overflow`.
        run_eval_steps (int):
            Run evalulation epoch after N steps. If None, waits until training epoch is completed. Defaults to None.
        save_step (int):
            Save local checkpoint every save_step steps. Defaults to 500.
        plot_step (int):
            Plot training stats on the logger every plot_step steps. Defaults to 1.
        model_param_stats (bool):
            Log model parameters stats on the logger dashboard. Defaults to False.
        force_generate_statistics (bool):
            Force generate mel normalization statistics. Defaults to False.
        mel_statistics_parameter_path (str):
            Path to the mel normalization statistics.If the model doesn't finds a file there it will generate statistics.
            Defaults to None.
        num_chars (int):
            Number of characters used by the model. It must be defined before initializing the model. Defaults to None.
        state_per_phone (int):
            Generates N states per phone. Similar, to `add_blank` parameter in GlowTTS but in Overflow it is upsampled by model's encoder. Defaults to 2.
        encoder_in_out_features (int):
            Channels of encoder input and character embedding tensors. Defaults to 512.
        encoder_n_convolutions (int):
            Number of convolution layers in the encoder. Defaults to 3.
        out_channels (int):
            Channels of the final model output. It must match the spectragram size. Defaults to 80.
        ar_order (int):
            Autoregressive order of the model. Defaults to 1. In ablations of Neural HMM it was found that more autoregression while giving more variation hurts naturalness of the synthesised audio.
        sampling_temp (float):
            Variation added to the sample from the latent space of neural HMM. Defaults to 0.334.
        deterministic_transition (bool):
            deterministic duration generation based on duration quantiles as defiend in "S. Ronanki, O. Watts, S. King, and G. E. Henter, “Medianbased generation of synthetic speech durations using a nonparametric approach,” in Proc. SLT, 2016.". Defaults to True.
        duration_threshold (float):
            Threshold for duration quantiles. Defaults to 0.55. Tune this to change the speaking rate of the synthesis, where lower values defines a slower speaking rate and higher values defines a faster speaking rate.
        use_grad_checkpointing (bool):
            Use gradient checkpointing to save memory. In a multi-GPU setting currently pytorch does not supports gradient checkpoint inside a loop so we will have to turn it off then.Adjust depending on whatever get more batch size either by using a single GPU or multi-GPU. Defaults to True.
        max_sampling_time (int):
            Maximum sampling time while synthesising latents from neural HMM. Defaults to 1000.
        prenet_type (str):
            `original` or `bn`. `original` sets the default Prenet and `bn` uses Batch Normalization version of the
            Prenet. Defaults to `original`.
        prenet_dim (int):
            Dimension of the Prenet. Defaults to 256.
        prenet_n_layers (int):
            Number of layers in the Prenet. Defaults to 2.
        prenet_dropout (float):
            Dropout rate of the Prenet. Defaults to 0.5.
        prenet_dropout_at_inference (bool):
            Use dropout at inference time. Defaults to False.
        memory_rnn_dim (int):
            Dimension of the memory LSTM to process the prenet output. Defaults to 1024.
        outputnet_size (list[int]):
            Size of the output network inside the neural HMM. Defaults to [1024].
        flat_start_params (dict):
            Parameters for the flat start initialization of the neural HMM. Defaults to `{"mean": 0.0, "std": 1.0, "transition_p": 0.14}`.
            It will be recomputed when you pass the dataset.
        std_floor (float):
            Floor value for the standard deviation of the neural HMM. Prevents model cheating by putting point mass and getting infinite likelihood at any datapoint. Defaults to 0.01.
            It is called `variance flooring` in standard HMM literature.
        hidden_channels_dec (int):
            Number of base hidden channels used by the decoder WaveNet network. Defaults to 150.
        kernel_size_dec (int):
            Decoder kernel size. Defaults to 5
        dilation_rate (int):
            Rate to increase dilation by each layer in a decoder block. Defaults to 1.
        num_flow_blocks_dec (int):
            Number of decoder layers in each decoder block.  Defaults to 4.
        dropout_p_dec (float):
            Dropout rate of the decoder. Defaults to 0.05.
        num_splits (int):
            Number of split levels in inversible conv1x1 operation. Defaults to 4.
        num_squeeze (int):
            Number of squeeze levels. When squeezing channels increases and time steps reduces by the factor
            'num_squeeze'. Defaults to 2.
        sigmoid_scale (bool):
            enable/disable sigmoid scaling in decoder. Defaults to False.
        c_in_channels (int):
            Unused parameter from GlowTTS's decoder. Defaults to 0.
        optimizer (str):
            Optimizer to use for training. Defaults to `adam`.
        optimizer_params (dict):
            Parameters for the optimizer. Defaults to `{"weight_decay": 1e-6}`.
        grad_clip (float):
            Gradient clipping threshold. Defaults to 40_000.
        lr (float):
            Learning rate. Defaults to 1e-3.
        lr_scheduler (str):
            Learning rate scheduler for the training. Use one from `torch.optim.Scheduler` schedulers or
            `TTS.utils.training`. Defaults to `None`.
        min_seq_len (int):
            Minimum input sequence length to be used at training.
        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage.
    """

    model: str = "Overflow"

    # Training and Checkpoint configs
    run_eval_steps: int = 100
    save_step: int = 500
    plot_step: int = 1
    model_param_stats: bool = False

    # data parameters
    force_generate_statistics: bool = False
    mel_statistics_parameter_path: str = None

    # Encoder parameters
    num_chars: int = None
    state_per_phone: int = 2
    encoder_in_out_features: int = 512
    encoder_n_convolutions: int = 3

    # HMM parameters
    out_channels: int = 80
    ar_order: int = 1
    sampling_temp: float = 0.334
    deterministic_transition: bool = True
    duration_threshold: float = 0.55
    use_grad_checkpointing: bool = True
    max_sampling_time: int = 1000

    ## Prenet parameters
    prenet_type: str = "original"
    prenet_dim: int = 256
    prenet_n_layers: int = 2
    prenet_dropout: float = 0.5
    prenet_dropout_at_inference: bool = False
    memory_rnn_dim: int = 1024

    ## Outputnet parameters
    outputnet_size: List[int] = field(default_factory=lambda: [1024])
    flat_start_params: dict = field(default_factory=lambda: {"mean": 0.0, "std": 1.0, "transition_p": 0.14})
    std_floor: float = 0.01

    # Decoder parameters
    hidden_channels_dec: int = 150
    kernel_size_dec: int = 5
    dilation_rate: int = 1
    num_flow_blocks_dec: int = 12
    num_block_layers: int = 4
    dropout_p_dec: float = 0.05
    num_splits: int = 4
    num_squeeze: int = 2
    sigmoid_scale: bool = False
    c_in_channels: int = 0

    # optimizer parameters
    optimizer: str = "Adam"
    optimizer_params: dict = field(default_factory=lambda: {"weight_decay": 1e-6})
    grad_clip: float = 40000.0
    lr: float = 1e-3
    lr_scheduler: str = None

    # overrides
    min_text_len: int = 10
    max_text_len: int = 500
    min_audio_len: int = 512

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "Be a voice, not an echo.",
        ]
    )

    # Extra needed config
    r: int = 1
    use_d_vector_file: bool = False
    use_speaker_embedding: bool = False

    def check_values(self):
        """Validate the hyperparameters.

        Raises:
            AssertionError: when the parameters network is not defined
            AssertionError: transition probability is not between 0 and 1
        """
        assert self.ar_order > 0, "AR order must be greater than 0 it is an autoregressive model."
        assert (
            len(self.outputnet_size) >= 1
        ), f"Parameter Network must have atleast one layer check the config file for parameter network. Provided: {self.parameternetwork}"
        assert (
            0 < self.flat_start_params["transition_p"] < 1
        ), f"Transition probability must be between 0 and 1. Provided: {self.flat_start_params['transition_p']}"


================================================
FILE: TTS/tts/configs/shared_configs.py
================================================
from dataclasses import asdict, dataclass, field
from typing import Dict, List

from coqpit import Coqpit, check_argument

from TTS.config import BaseAudioConfig, BaseDatasetConfig, BaseTrainingConfig


@dataclass
class GSTConfig(Coqpit):
    """Defines the Global Style Token Module

    Args:
        gst_style_input_wav (str):
            Path to the wav file used to define the style of the output speech at inference. Defaults to None.

        gst_style_input_weights (dict):
            Defines the weights for each style token used at inference. Defaults to None.

        gst_embedding_dim (int):
            Defines the size of the GST embedding vector dimensions. Defaults to 256.

        gst_num_heads (int):
            Number of attention heads used by the multi-head attention. Defaults to 4.

        gst_num_style_tokens (int):
            Number of style token vectors. Defaults to 10.
    """

    gst_style_input_wav: str = None
    gst_style_input_weights: dict = None
    gst_embedding_dim: int = 256
    gst_use_speaker_embedding: bool = False
    gst_num_heads: int = 4
    gst_num_style_tokens: int = 10

    def check_values(
        self,
    ):
        """Check config fields"""
        c = asdict(self)
        super().check_values()
        check_argument("gst_style_input_weights", c, restricted=False)
        check_argument("gst_style_input_wav", c, restricted=False)
        check_argument("gst_embedding_dim", c, restricted=True, min_val=0, max_val=1000)
        check_argument("gst_use_speaker_embedding", c, restricted=False)
        check_argument("gst_num_heads", c, restricted=True, min_val=2, max_val=10)
        check_argument("gst_num_style_tokens", c, restricted=True, min_val=1, max_val=1000)


@dataclass
class CapacitronVAEConfig(Coqpit):
    """Defines the capacitron VAE Module
    Args:
        capacitron_capacity (int):
            Defines the variational capacity limit of the prosody embeddings. Defaults to 150.
        capacitron_VAE_embedding_dim (int):
            Defines the size of the Capacitron embedding vector dimension. Defaults to 128.
        capacitron_use_text_summary_embeddings (bool):
            If True, use a text summary embedding in Capacitron. Defaults to True.
        capacitron_text_summary_embedding_dim (int):
            Defines the size of the capacitron text embedding vector dimension. Defaults to 128.
        capacitron_use_speaker_embedding (bool):
            if True use speaker embeddings in Capacitron. Defaults to False.
        capacitron_VAE_loss_alpha (float):
            Weight for the VAE loss of the Tacotron model. If set less than or equal to zero, it disables the
            corresponding loss function. Defaults to 0.25
        capacitron_grad_clip (float):
            Gradient clipping value for all gradients except beta. Defaults to 5.0
    """

    capacitron_loss_alpha: int = 1
    capacitron_capacity: int = 150
    capacitron_VAE_embedding_dim: int = 128
    capacitron_use_text_summary_embeddings: bool = True
    capacitron_text_summary_embedding_dim: int = 128
    capacitron_use_speaker_embedding: bool = False
    capacitron_VAE_loss_alpha: float = 0.25
    capacitron_grad_clip: float = 5.0

    def check_values(
        self,
    ):
        """Check config fields"""
        c = asdict(self)
        super().check_values()
        check_argument("capacitron_capacity", c, restricted=True, min_val=10, max_val=500)
        check_argument("capacitron_VAE_embedding_dim", c, restricted=True, min_val=16, max_val=1024)
        check_argument("capacitron_use_speaker_embedding", c, restricted=False)
        check_argument("capacitron_text_summary_embedding_dim", c, restricted=False, min_val=16, max_val=512)
        check_argument("capacitron_VAE_loss_alpha", c, restricted=False)
        check_argument("capacitron_grad_clip", c, restricted=False)


@dataclass
class CharactersConfig(Coqpit):
    """Defines arguments for the `BaseCharacters` or `BaseVocabulary` and their subclasses.

    Args:
        characters_class (str):
            Defines the class of the characters used. If None, we pick ```Phonemes``` or ```Graphemes``` based on
            the configuration. Defaults to None.

        vocab_dict (dict):
            Defines the vocabulary dictionary used to encode the characters. Defaults to None.

        pad (str):
            characters in place of empty padding. Defaults to None.

        eos (str):
            characters showing the end of a sentence. Defaults to None.

        bos (str):
            characters showing the beginning of a sentence. Defaults to None.

        blank (str):
            Optional character used between characters by some models for better prosody. Defaults to `_blank`.

        characters (str):
            character set used by the model. Characters not in this list are ignored when converting input text to
            a list of sequence IDs. Defaults to None.

        punctuations (str):
            characters considered as punctuation as parsing the input sentence. Defaults to None.

        phonemes (str):
            characters considered as parsing phonemes. This is only for backwards compat. Use `characters` for new
            models. Defaults to None.

        is_unique (bool):
            remove any duplicate characters in the character lists. It is a bandaid for compatibility with the old
            models trained with character lists with duplicates. Defaults to True.

        is_sorted (bool):
            Sort the characters in alphabetical order. Defaults to True.
    """

    characters_class: str = None

    # using BaseVocabulary
    vocab_dict: Dict = None

    # using on BaseCharacters
    pad: str = None
    eos: str = None
    bos: str = None
    blank: str = None
    characters: str = None
    punctuations: str = None
    phonemes: str = None
    is_unique: bool = True  # for backwards compatibility of models trained with char sets with duplicates
    is_sorted: bool = True


@dataclass
class BaseTTSConfig(BaseTrainingConfig):
    """Shared parameters among all the tts models.

    Args:

        audio (BaseAudioConfig):
            Audio processor config object instance.

        use_phonemes (bool):
            enable / disable phoneme use.

        phonemizer (str):
            Name of the phonemizer to use. If set None, the phonemizer will be selected by `phoneme_language`.
            Defaults to None.

        phoneme_language (str):
            Language code for the phonemizer. You can check the list of supported languages by running
            `python TTS/tts/utils/text/phonemizers/__init__.py`. Defaults to None.

        compute_input_seq_cache (bool):
            enable / disable precomputation of the phoneme sequences. At the expense of some delay at the beginning of
            the training, It allows faster data loader time and precise limitation with `max_seq_len` and
            `min_seq_len`.

        text_cleaner (str):
            Name of the text cleaner used for cleaning and formatting transcripts.

        enable_eos_bos_chars (bool):
            enable / disable the use of eos and bos characters.

        test_senteces_file (str):
            Path to a txt file that has sentences used at test time. The file must have a sentence per line.

        phoneme_cache_path (str):
            Path to the output folder caching the computed phonemes for each sample.

        characters (CharactersConfig):
            Instance of a CharactersConfig class.

        batch_group_size (int):
            Size of the batch groups used for bucketing. By default, the dataloader orders samples by the sequence
            length for a more efficient and stable training. If `batch_group_size > 1` then it performs bucketing to
            prevent using the same batches for each epoch.

        loss_masking (bool):
            enable / disable masking loss values against padded segments of samples in a batch.

        min_text_len (int):
            Minimum length of input text to be used. All shorter samples will be ignored. Defaults to 0.

        max_text_len (int):
            Maximum length of input text to be used. All longer samples will be ignored. Defaults to float("inf").

        min_audio_len (int):
            Minimum length of input audio to be used. All shorter samples will be ignored. Defaults to 0.

        max_audio_len (int):
            Maximum length of input audio to be used. All longer samples will be ignored. The maximum length in the
            dataset defines the VRAM used in the training. Hence, pay attention to this value if you encounter an
            OOM error in training. Defaults to float("inf").

        compute_f0 (int):
            (Not in use yet).

        compute_energy (int):
            (Not in use yet).

        compute_linear_spec (bool):
            If True data loader computes and returns linear spectrograms alongside the other data.

        precompute_num_workers (int):
            Number of workers to precompute features. Defaults to 0.

        use_noise_augment (bool):
            Augment the input audio with random noise.

        start_by_longest (bool):
            If True, the data loader will start loading the longest batch first. It is useful for checking OOM issues.
            Defaults to False.

        shuffle (bool):
            If True, the data loader will shuffle the dataset when there is not sampler defined. Defaults to True.

        drop_last (bool):
            If True, the data loader will drop the last batch if it is not complete. It helps to prevent
            issues that emerge from the partial batch statistics. Defaults to True.

        add_blank (bool):
            Add blank characters between each other two characters. It improves performance for some models at expense
            of slower run-time due to the longer input sequence.

        datasets (List[BaseDatasetConfig]):
            List of datasets used for training. If multiple datasets are provided, they are merged and used together
            for training.

        optimizer (str):
            Optimizer used for the training. Set one from `torch.optim.Optimizer` or `TTS.utils.training`.
            Defaults to ``.

        optimizer_params (dict):
            Optimizer kwargs. Defaults to `{"betas": [0.8, 0.99], "weight_decay": 0.0}`

        lr_scheduler (str):
            Learning rate scheduler for the training. Use one from `torch.optim.Scheduler` schedulers or
            `TTS.utils.training`. Defaults to ``.

        lr_scheduler_params (dict):
            Parameters for the generator learning rate scheduler. Defaults to `{"warmup": 4000}`.

        test_sentences (List[str]):
            List of sentences to be used at testing. Defaults to '[]'

        eval_split_max_size (int):
            Number maximum of samples to be used for evaluation in proportion split. Defaults to None (Disabled).

        eval_split_size (float):
            If between 0.0 and 1.0 represents the proportion of the dataset to include in the evaluation set.
            If > 1, represents the absolute number of evaluation samples. Defaults to 0.01 (1%).

        use_speaker_weighted_sampler (bool):
            Enable / Disable the batch balancer by speaker. Defaults to ```False```.

        speaker_weighted_sampler_alpha (float):
            Number that control the influence of the speaker sampler weights. Defaults to ```1.0```.

        use_language_weighted_sampler (bool):
            Enable / Disable the batch balancer by language. Defaults to ```False```.

        language_weighted_sampler_alpha (float):
            Number that control the influence of the language sampler weights. Defaults to ```1.0```.

        use_length_weighted_sampler (bool):
            Enable / Disable the batch balancer by audio length. If enabled the dataset will be divided
            into 10 buckets considering the min and max audio of the dataset. The sampler weights will be
            computed forcing to have the same quantity of data for each bucket in each training batch. Defaults to ```False```.

        length_weighted_sampler_alpha (float):
            Number that control the influence of the length sampler weights. Defaults to ```1.0```.
    """

    audio: BaseAudioConfig = field(default_factory=BaseAudioConfig)
    # phoneme settings
    use_phonemes: bool = False
    phonemizer: str = None
    phoneme_language: str = None
    compute_input_seq_cache: bool = False
    text_cleaner: str = None
    enable_eos_bos_chars: bool = False
    test_sentences_file: str = ""
    phoneme_cache_path: str = None
    # vocabulary parameters
    characters: CharactersConfig = None
    add_blank: bool = False
    # training params
    batch_group_size: int = 0
    loss_masking: bool = None
    # dataloading
    min_audio_len: int = 1
    max_audio_len: int = float("inf")
    min_text_len: int = 1
    max_text_len: int = float("inf")
    compute_f0: bool = False
    compute_energy: bool = False
    compute_linear_spec: bool = False
    precompute_num_workers: int = 0
    use_noise_augment: bool = False
    start_by_longest: bool = False
    shuffle: bool = False
    drop_last: bool = False
    # dataset
    datasets: List[BaseDatasetConfig] = field(default_factory=lambda: [BaseDatasetConfig()])
    # optimizer
    optimizer: str = "radam"
    optimizer_params: dict = None
    # scheduler
    lr_scheduler: str = None
    lr_scheduler_params: dict = field(default_factory=lambda: {})
    # testing
    test_sentences: List[str] = field(default_factory=lambda: [])
    # evaluation
    eval_split_max_size: int = None
    eval_split_size: float = 0.01
    # weighted samplers
    use_speaker_weighted_sampler: bool = False
    speaker_weighted_sampler_alpha: float = 1.0
    use_language_weighted_sampler: bool = False
    language_weighted_sampler_alpha: float = 1.0
    use_length_weighted_sampler: bool = False
    length_weighted_sampler_alpha: float = 1.0


================================================
FILE: TTS/tts/configs/speedy_speech_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig
from TTS.tts.models.forward_tts import ForwardTTSArgs


@dataclass
class SpeedySpeechConfig(BaseTTSConfig):
    """Configure `ForwardTTS` as SpeedySpeech model.

    Example:

        >>> from TTS.tts.configs.speedy_speech_config import SpeedySpeechConfig
        >>> config = SpeedySpeechConfig()

     Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `speedy_speech`.

        base_model (str):
            Name of the base model being configured as this model so that 🐸 TTS knows it needs to initiate
            the base model rather than searching for the `model` implementation. Defaults to `forward_tts`.

        model_args (Coqpit):
            Model class arguments. Check `FastPitchArgs` for more details. Defaults to `FastPitchArgs()`.

        data_dep_init_steps (int):
            Number of steps used for computing normalization parameters at the beginning of the training. GlowTTS uses
            Activation Normalization that pre-computes normalization stats at the beginning and use the same values
            for the rest. Defaults to 10.

        speakers_file (str):
            Path to the file containing the list of speakers. Needed at inference for loading matching speaker ids to
            speaker names. Defaults to `None`.

        use_speaker_embedding (bool):
            enable / disable using speaker embeddings for multi-speaker models. If set True, the model is
            in the multi-speaker mode. Defaults to False.

        use_d_vector_file (bool):
            enable /disable using external speaker embeddings in place of the learned embeddings. Defaults to False.

        d_vector_file (str):
            Path to the file including pre-computed speaker embeddings. Defaults to None.

        d_vector_dim (int):
            Dimension of the external speaker embeddings. Defaults to 0.

        optimizer (str):
            Name of the model optimizer. Defaults to `RAdam`.

        optimizer_params (dict):
            Arguments of the model optimizer. Defaults to `{"betas": [0.9, 0.998], "weight_decay": 1e-6}`.

        lr_scheduler (str):
            Name of the learning rate scheduler. Defaults to `Noam`.

        lr_scheduler_params (dict):
            Arguments of the learning rate scheduler. Defaults to `{"warmup_steps": 4000}`.

        lr (float):
            Initial learning rate. Defaults to `1e-3`.

        grad_clip (float):
            Gradient norm clipping value. Defaults to `5.0`.

        spec_loss_type (str):
            Type of the spectrogram loss. Check `ForwardTTSLoss` for possible values. Defaults to `l1`.

        duration_loss_type (str):
            Type of the duration loss. Check `ForwardTTSLoss` for possible values. Defaults to `huber`.

        use_ssim_loss (bool):
            Enable/disable the use of SSIM (Structural Similarity) loss. Defaults to True.

        wd (float):
            Weight decay coefficient. Defaults to `1e-7`.

        ssim_loss_alpha (float):
            Weight for the SSIM loss. If set 0, disables the SSIM loss. Defaults to 1.0.

        dur_loss_alpha (float):
            Weight for the duration predictor's loss. If set 0, disables the huber loss. Defaults to 1.0.

        spec_loss_alpha (float):
            Weight for the L1 spectrogram loss. If set 0, disables the L1 loss. Defaults to 1.0.

        binary_loss_alpha (float):
            Weight for the binary loss. If set 0, disables the binary loss. Defaults to 1.0.

        binary_loss_warmup_epochs (float):
            Number of epochs to gradually increase the binary loss impact. Defaults to 150.

        min_seq_len (int):
            Minimum input sequence length to be used at training.

        max_seq_len (int):
            Maximum input sequence length to be used at training. Larger values result in more VRAM usage.
    """

    model: str = "speedy_speech"
    base_model: str = "forward_tts"

    # set model args as SpeedySpeech
    model_args: ForwardTTSArgs = ForwardTTSArgs(
        use_pitch=False,
        encoder_type="residual_conv_bn",
        encoder_params={
            "kernel_size": 4,
            "dilations": 4 * [1, 2, 4] + [1],
            "num_conv_blocks": 2,
            "num_res_blocks": 13,
        },
        decoder_type="residual_conv_bn",
        decoder_params={
            "kernel_size": 4,
            "dilations": 4 * [1, 2, 4, 8] + [1],
            "num_conv_blocks": 2,
            "num_res_blocks": 17,
        },
        out_channels=80,
        hidden_channels=128,
        positional_encoding=True,
        detach_duration_predictor=True,
    )

    # multi-speaker settings
    num_speakers: int = 0
    speakers_file: str = None
    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_file: str = False
    d_vector_dim: int = 0

    # optimizer parameters
    optimizer: str = "Adam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = "NoamLR"
    lr_scheduler_params: dict = field(default_factory=lambda: {"warmup_steps": 4000})
    lr: float = 1e-4
    grad_clip: float = 5.0

    # loss params
    spec_loss_type: str = "l1"
    duration_loss_type: str = "huber"
    use_ssim_loss: bool = False
    ssim_loss_alpha: float = 1.0
    dur_loss_alpha: float = 1.0
    spec_loss_alpha: float = 1.0
    aligner_loss_alpha: float = 1.0
    binary_align_loss_alpha: float = 0.3
    binary_loss_warmup_epochs: int = 150

    # overrides
    min_seq_len: int = 13
    max_seq_len: int = 200
    r: int = 1  # DO NOT CHANGE

    # dataset configs
    compute_f0: bool = False
    f0_cache_path: str = None

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )

    def __post_init__(self):
        # Pass multi-speaker parameters to the model args as `model.init_multispeaker()` looks for it there.
        if self.num_speakers > 0:
            self.model_args.num_speakers = self.num_speakers

        # speaker embedding settings
        if self.use_speaker_embedding:
            self.model_args.use_speaker_embedding = True
        if self.speakers_file:
            self.model_args.speakers_file = self.speakers_file

        # d-vector settings
        if self.use_d_vector_file:
            self.model_args.use_d_vector_file = True
        if self.d_vector_dim is not None and self.d_vector_dim > 0:
            self.model_args.d_vector_dim = self.d_vector_dim
        if self.d_vector_file:
            self.model_args.d_vector_file = self.d_vector_file


================================================
FILE: TTS/tts/configs/tacotron2_config.py
================================================
from dataclasses import dataclass

from TTS.tts.configs.tacotron_config import TacotronConfig


@dataclass
class Tacotron2Config(TacotronConfig):
    """Defines parameters for Tacotron2 based models.

    Example:

        >>> from TTS.tts.configs.tacotron2_config import Tacotron2Config
        >>> config = Tacotron2Config()

    Check `TacotronConfig` for argument descriptions.
    """

    model: str = "tacotron2"
    out_channels: int = 80
    encoder_in_features: int = 512
    decoder_in_features: int = 512


================================================
FILE: TTS/tts/configs/tacotron_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig, CapacitronVAEConfig, GSTConfig


@dataclass
class TacotronConfig(BaseTTSConfig):
    """Defines parameters for Tacotron based models.

    Example:

        >>> from TTS.tts.configs.tacotron_config import TacotronConfig
        >>> config = TacotronConfig()

    Args:
        model (str):
            Model name used to select the right model class to initilize. Defaults to `Tacotron`.
        use_gst (bool):
            enable / disable the use of Global Style Token modules. Defaults to False.
        gst (GSTConfig):
            Instance of `GSTConfig` class.
        gst_style_input (str):
            Path to the wav file used at inference to set the speech style through GST. If `GST` is enabled and
            this is not defined, the model uses a zero vector as an input. Defaults to None.
        use_capacitron_vae (bool):
            enable / disable the use of Capacitron modules. Defaults to False.
        capacitron_vae (CapacitronConfig):
            Instance of `CapacitronConfig` class.
        num_chars (int):
            Number of characters used by the model. It must be defined before initializing the model. Defaults to None.
        num_speakers (int):
            Number of speakers for multi-speaker models. Defaults to 1.
        r (int):
            Initial number of output frames that the decoder computed per iteration. Larger values makes training and inference
            faster but reduces the quality of the output frames. This must be equal to the largest `r` value used in
            `gradual_training` schedule. Defaults to 1.
        gradual_training (List[List]):
            Parameters for the gradual training schedule. It is in the form `[[a, b, c], [d ,e ,f] ..]` where `a` is
            the step number to start using the rest of the values, `b` is the `r` value and `c` is the batch size.
            If sets None, no gradual training is used. Defaults to None.
        memory_size (int):
            Defines the number of previous frames used by the Prenet. If set to < 0, then it uses only the last frame.
            Defaults to -1.
        prenet_type (str):
            `original` or `bn`. `original` sets the default Prenet and `bn` uses Batch Normalization version of the
            Prenet. Defaults to `original`.
        prenet_dropout (bool):
            enables / disables the use of dropout in the Prenet. Defaults to True.
        prenet_dropout_at_inference (bool):
            enable / disable the use of dropout in the Prenet at the inference time. Defaults to False.
        stopnet (bool):
            enable /disable the Stopnet that predicts the end of the decoder sequence. Defaults to True.
        stopnet_pos_weight (float):
            Weight that is applied to over-weight positive instances in the Stopnet loss. Use larger values with
            datasets with longer sentences. Defaults to 0.2.
        max_decoder_steps (int):
            Max number of steps allowed for the decoder. Defaults to 50.
        encoder_in_features (int):
            Channels of encoder input and character embedding tensors. Defaults to 256.
        decoder_in_features (int):
            Channels of decoder input and encoder output tensors. Defaults to 256.
        out_channels (int):
            Channels of the final model output. It must match the spectragram size. Defaults to 80.
        separate_stopnet (bool):
            Use a distinct Stopnet which is trained separately from the rest of the model. Defaults to True.
        attention_type (str):
            attention type. Check ```TTS.tts.layers.attentions.init_attn```. Defaults to 'original'.
        attention_heads (int):
            Number of attention heads for GMM attention. Defaults to 5.
        windowing (bool):
            It especially useful at inference to keep attention alignment diagonal. Defaults to False.
        use_forward_attn (bool):
            It is only valid if ```attn_type``` is ```original```.  Defaults to False.
        forward_attn_mask (bool):
            enable/disable extra masking over forward attention. It is useful at inference to prevent
            possible attention failures. Defaults to False.
        transition_agent (bool):
            enable/disable transition agent in forward attention. Defaults to False.
        location_attn (bool):
            enable/disable location sensitive attention as in the original Tacotron2 paper.
            It is only valid if ```attn_type``` is ```original```. Defaults to True.
        bidirectional_decoder (bool):
            enable/disable bidirectional decoding. Defaults to False.
        double_decoder_consistency (bool):
            enable/disable double decoder consistency. Defaults to False.
        ddc_r (int):
            reduction rate used by the coarse decoder when `double_decoder_consistency` is in use. Set this
            as a multiple of the `r` value. Defaults to 6.
        speakers_file (str):
            Path to the speaker mapping file for the Speaker Manager. Defaults to None.
        use_speaker_embedding (bool):
            enable / disable using speaker embeddings for multi-speaker models. If set True, the model is
            in the multi-speaker mode. Defaults to False.
        use_d_vector_file (bool):
            enable /disable using external speaker embeddings in place of the learned embeddings. Defaults to False.
        d_vector_file (str):
            Path to the file including pre-computed speaker embeddings. Defaults to None.
        optimizer (str):
            Optimizer used for the training. Set one from `torch.optim.Optimizer` or `TTS.utils.training`.
            Defaults to `RAdam`.
        optimizer_params (dict):
            Optimizer kwargs. Defaults to `{"betas": [0.8, 0.99], "weight_decay": 0.0}`
        lr_scheduler (str):
            Learning rate scheduler for the training. Use one from `torch.optim.Scheduler` schedulers or
            `TTS.utils.training`. Defaults to `NoamLR`.
        lr_scheduler_params (dict):
            Parameters for the generator learning rate scheduler. Defaults to `{"warmup": 4000}`.
        lr (float):
            Initial learning rate. Defaults to `1e-4`.
        wd (float):
            Weight decay coefficient. Defaults to `1e-6`.
        grad_clip (float):
            Gradient clipping threshold. Defaults to `5`.
        seq_len_norm (bool):
            enable / disable the sequnce length normalization in the loss functions. If set True, loss of a sample
            is divided by the sequence length. Defaults to False.
        loss_masking (bool):
            enable / disable masking the paddings of the samples in loss computation. Defaults to True.
        decoder_loss_alpha (float):
            Weight for the decoder loss of the Tacotron model. If set less than or equal to zero, it disables the
            corresponding loss function. Defaults to 0.25
        postnet_loss_alpha (float):
            Weight for the postnet loss of the Tacotron model. If set less than or equal to zero, it disables the
            corresponding loss function. Defaults to 0.25
        postnet_diff_spec_alpha (float):
            Weight for the postnet differential loss of the Tacotron model. If set less than or equal to zero, it disables the
            corresponding loss function. Defaults to 0.25
        decoder_diff_spec_alpha (float):

            Weight for the decoder differential loss of the Tacotron model. If set less than or equal to zero, it disables the
            corresponding loss function. Defaults to 0.25
        decoder_ssim_alpha (float):
            Weight for the decoder SSIM loss of the Tacotron model. If set less than or equal to zero, it disables the
            corresponding loss function. Defaults to 0.25
        postnet_ssim_alpha (float):
            Weight for the postnet SSIM loss of the Tacotron model. If set less than or equal to zero, it disables the
            corresponding loss function. Defaults to 0.25
        ga_alpha (float):
            Weight for the guided attention loss. If set less than or equal to zero, it disables the corresponding loss
            function. Defaults to 5.
    """

    model: str = "tacotron"
    # model_params: TacotronArgs = field(default_factory=lambda: TacotronArgs())
    use_gst: bool = False
    gst: GSTConfig = None
    gst_style_input: str = None

    use_capacitron_vae: bool = False
    capacitron_vae: CapacitronVAEConfig = None

    # model specific params
    num_speakers: int = 1
    num_chars: int = 0
    r: int = 2
    gradual_training: List[List[int]] = None
    memory_size: int = -1
    prenet_type: str = "original"
    prenet_dropout: bool = True
    prenet_dropout_at_inference: bool = False
    stopnet: bool = True
    separate_stopnet: bool = True
    stopnet_pos_weight: float = 0.2
    max_decoder_steps: int = 10000
    encoder_in_features: int = 256
    decoder_in_features: int = 256
    decoder_output_dim: int = 80
    out_channels: int = 513

    # attention layers
    attention_type: str = "original"
    attention_heads: int = None
    attention_norm: str = "sigmoid"
    attention_win: bool = False
    windowing: bool = False
    use_forward_attn: bool = False
    forward_attn_mask: bool = False
    transition_agent: bool = False
    location_attn: bool = True

    # advance methods
    bidirectional_decoder: bool = False
    double_decoder_consistency: bool = False
    ddc_r: int = 6

    # multi-speaker settings
    speakers_file: str = None
    use_speaker_embedding: bool = False
    speaker_embedding_dim: int = 512
    use_d_vector_file: bool = False
    d_vector_file: str = False
    d_vector_dim: int = None

    # optimizer parameters
    optimizer: str = "RAdam"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.9, 0.998], "weight_decay": 1e-6})
    lr_scheduler: str = "NoamLR"
    lr_scheduler_params: dict = field(default_factory=lambda: {"warmup_steps": 4000})
    lr: float = 1e-4
    grad_clip: float = 5.0
    seq_len_norm: bool = False
    loss_masking: bool = True

    # loss params
    decoder_loss_alpha: float = 0.25
    postnet_loss_alpha: float = 0.25
    postnet_diff_spec_alpha: float = 0.25
    decoder_diff_spec_alpha: float = 0.25
    decoder_ssim_alpha: float = 0.25
    postnet_ssim_alpha: float = 0.25
    ga_alpha: float = 5.0

    # testing
    test_sentences: List[str] = field(
        default_factory=lambda: [
            "It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent.",
            "Be a voice, not an echo.",
            "I'm sorry Dave. I'm afraid I can't do that.",
            "This cake is great. It's so delicious and moist.",
            "Prior to November 22, 1963.",
        ]
    )

    def check_values(self):
        if self.gradual_training:
            assert (
                self.gradual_training[0][1] == self.r
            ), f"[!] the first scheduled gradual training `r` must be equal to the model's `r` value. {self.gradual_training[0][1]} vs {self.r}"
        if self.model == "tacotron" and self.audio is not None:
            assert self.out_channels == (
                self.audio.fft_size // 2 + 1
            ), f"{self.out_channels} vs {self.audio.fft_size // 2 + 1}"
        if self.model == "tacotron2" and self.audio is not None:
            assert self.out_channels == self.audio.num_mels


================================================
FILE: TTS/tts/configs/vits_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.tts.configs.shared_configs import BaseTTSConfig
from TTS.tts.models.vits import VitsArgs, VitsAudioConfig


@dataclass
class VitsConfig(BaseTTSConfig):
    """Defines parameters for VITS End2End TTS model.

    Args:
        model (str):
            Model name. Do not change unless you know what you are doing.

        model_args (VitsArgs):
            Model architecture arguments. Defaults to `VitsArgs()`.

        audio (VitsAudioConfig):
            Audio processing configuration. Defaults to `VitsAudioConfig()`.

        grad_clip (List):
            Gradient clipping thresholds for each optimizer. Defaults to `[1000.0, 1000.0]`.

        lr_gen (float):
            Initial learning rate for the generator. Defaults to 0.0002.

        lr_disc (float):
            Initial learning rate for the discriminator. Defaults to 0.0002.

        lr_scheduler_gen (str):
            Name of the learning rate scheduler for the generator. One of the `torch.optim.lr_scheduler.*`. Defaults to
            `ExponentialLR`.

        lr_scheduler_gen_params (dict):
            Parameters for the learning rate scheduler of the generator. Defaults to `{'gamma': 0.999875, "last_epoch":-1}`.

        lr_scheduler_disc (str):
            Name of the learning rate scheduler for the discriminator. One of the `torch.optim.lr_scheduler.*`. Defaults to
            `ExponentialLR`.

        lr_scheduler_disc_params (dict):
            Parameters for the learning rate scheduler of the discriminator. Defaults to `{'gamma': 0.999875, "last_epoch":-1}`.

        scheduler_after_epoch (bool):
            If true, step the schedulers after each epoch else after each step. Defaults to `False`.

        optimizer (str):
            Name of the optimizer to use with both the generator and the discriminator networks. One of the
            `torch.optim.*`. Defaults to `AdamW`.

        kl_loss_alpha (float):
            Loss weight for KL loss. Defaults to 1.0.

        disc_loss_alpha (float):
            Loss weight for the discriminator loss. Defaults to 1.0.

        gen_loss_alpha (float):
            Loss weight for the generator loss. Defaults to 1.0.

        feat_loss_alpha (float):
            Loss weight for the feature matching loss. Defaults to 1.0.

        mel_loss_alpha (float):
            Loss weight for the mel loss. Defaults to 45.0.

        return_wav (bool):
            If true, data loader returns the waveform as well as the other outputs. Do not change. Defaults to `True`.

        compute_linear_spec (bool):
            If true, the linear spectrogram is computed and returned alongside the mel output. Do not change. Defaults to `True`.

        use_weighted_sampler (bool):
            If true, use weighted sampler with bucketing for balancing samples between datasets used in training. Defaults to `False`.

        weighted_sampler_attrs (dict):
            Key retuned by the formatter to be used for weighted sampler. For example `{"root_path": 2.0, "speaker_name": 1.0}` sets sample probabilities
            by overweighting `root_path` by 2.0. Defaults to `{}`.

        weighted_sampler_multipliers (dict):
            Weight each unique value of a key returned by the formatter for weighted sampling.
            For example `{"root_path":{"/raid/datasets/libritts-clean-16khz-bwe-coqui_44khz/LibriTTS/train-clean-100/":1.0, "/raid/datasets/libritts-clean-16khz-bwe-coqui_44khz/LibriTTS/train-clean-360/": 0.5}`.
            It will sample instances from `train-clean-100` 2 times more than `train-clean-360`. Defaults to `{}`.

        r (int):
            Number of spectrogram frames to be generated at a time. Do not change. Defaults to `1`.

        add_blank (bool):
            If true, a blank token is added in between every character. Defaults to `True`.

        test_sentences (List[List]):
            List of sentences with speaker and language information to be used for testing.

        language_ids_file (str):
            Path to the language ids file.

        use_language_embedding (bool):
            If true, language embedding is used. Defaults to `False`.

    Note:
        Check :class:`TTS.tts.configs.shared_configs.BaseTTSConfig` for the inherited parameters.

    Example:

        >>> from TTS.tts.configs.vits_config import VitsConfig
        >>> config = VitsConfig()
    """

    model: str = "vits"
    # model specific params
    model_args: VitsArgs = field(default_factory=VitsArgs)
    audio: VitsAudioConfig = VitsAudioConfig()

    # optimizer
    grad_clip: List[float] = field(default_factory=lambda: [1000, 1000])
    lr_gen: float = 0.0002
    lr_disc: float = 0.0002
    lr_scheduler_gen: str = "ExponentialLR"
    lr_scheduler_gen_params: dict = field(default_factory=lambda: {"gamma": 0.999875, "last_epoch": -1})
    lr_scheduler_disc: str = "ExponentialLR"
    lr_scheduler_disc_params: dict = field(default_factory=lambda: {"gamma": 0.999875, "last_epoch": -1})
    scheduler_after_epoch: bool = True
    optimizer: str = "AdamW"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.8, 0.99], "eps": 1e-9, "weight_decay": 0.01})

    # loss params
    kl_loss_alpha: float = 1.0
    disc_loss_alpha: float = 1.0
    gen_loss_alpha: float = 1.0
    feat_loss_alpha: float = 1.0
    mel_loss_alpha: float = 45.0
    dur_loss_alpha: float = 1.0
    speaker_encoder_loss_alpha: float = 1.0

    # data loader params
    return_wav: bool = True
    compute_linear_spec: bool = True

    # sampler params
    use_weighted_sampler: bool = False  # TODO: move it to the base config
    weighted_sampler_attrs: dict = field(default_factory=lambda: {})
    weighted_sampler_multipliers: dict = field(default_factory=lambda: {})

    # overrides
    r: int = 1  # DO NOT CHANGE
    add_blank: bool = True

    # testing
    test_sentences: List[List] = field(
        default_factory=lambda: [
            ["It took me quite a long time to develop a voice, and now that I have it I'm not going to be silent."],
            ["Be a voice, not an echo."],
            ["I'm sorry Dave. I'm afraid I can't do that."],
            ["This cake is great. It's so delicious and moist."],
            ["Prior to November 22, 1963."],
        ]
    )

    # multi-speaker settings
    # use speaker embedding layer
    num_speakers: int = 0
    use_speaker_embedding: bool = False
    speakers_file: str = None
    speaker_embedding_channels: int = 256
    language_ids_file: str = None
    use_language_embedding: bool = False

    # use d-vectors
    use_d_vector_file: bool = False
    d_vector_file: List[str] = None
    d_vector_dim: int = None

    def __post_init__(self):
        for key, val in self.model_args.items():
            if hasattr(self, key):
                self[key] = val


================================================
FILE: TTS/tts/datasets/__init__.py
================================================
import os
import sys
from collections import Counter
from pathlib import Path
from typing import Callable, Dict, List, Tuple, Union

import numpy as np

from TTS.tts.datasets.dataset import *
from TTS.tts.datasets.formatters import *


def split_dataset(items, eval_split_max_size=None, eval_split_size=0.01):
    """Split a dataset into train and eval. Consider speaker distribution in multi-speaker training.

    Args:
        items (List[List]):
            A list of samples. Each sample is a list of `[audio_path, text, speaker_id]`.

        eval_split_max_size (int):
            Number maximum of samples to be used for evaluation in proportion split. Defaults to None (Disabled).

        eval_split_size (float):
            If between 0.0 and 1.0 represents the proportion of the dataset to include in the evaluation set.
            If > 1, represents the absolute number of evaluation samples. Defaults to 0.01 (1%).
    """
    speakers = [item["speaker_name"] for item in items]
    is_multi_speaker = len(set(speakers)) > 1
    if eval_split_size > 1:
        eval_split_size = int(eval_split_size)
    else:
        if eval_split_max_size:
            eval_split_size = min(eval_split_max_size, int(len(items) * eval_split_size))
        else:
            eval_split_size = int(len(items) * eval_split_size)

    assert (
        eval_split_size > 0
    ), " [!] You do not have enough samples for the evaluation set. You can work around this setting the 'eval_split_size' parameter to a minimum of {}".format(
        1 / len(items)
    )
    np.random.seed(0)
    np.random.shuffle(items)
    if is_multi_speaker:
        items_eval = []
        speakers = [item["speaker_name"] for item in items]
        speaker_counter = Counter(speakers)
        while len(items_eval) < eval_split_size:
            item_idx = np.random.randint(0, len(items))
            speaker_to_be_removed = items[item_idx]["speaker_name"]
            if speaker_counter[speaker_to_be_removed] > 1:
                items_eval.append(items[item_idx])
                speaker_counter[speaker_to_be_removed] -= 1
                del items[item_idx]
        return items_eval, items
    return items[:eval_split_size], items[eval_split_size:]


def add_extra_keys(metadata, language, dataset_name):
    for item in metadata:
        # add language name
        item["language"] = language
        # add unique audio name
        relfilepath = os.path.splitext(os.path.relpath(item["audio_file"], item["root_path"]))[0]
        audio_unique_name = f"{dataset_name}#{relfilepath}"
        item["audio_unique_name"] = audio_unique_name
    return metadata


def load_tts_samples(
    datasets: Union[List[Dict], Dict],
    eval_split=True,
    formatter: Callable = None,
    eval_split_max_size=None,
    eval_split_size=0.01,
) -> Tuple[List[List], List[List]]:
    """Parse the dataset from the datasets config, load the samples as a List and load the attention alignments if provided.
    If `formatter` is not None, apply the formatter to the samples else pick the formatter from the available ones based
    on the dataset name.

    Args:
        datasets (List[Dict], Dict): A list of datasets or a single dataset dictionary. If multiple datasets are
            in the list, they are all merged.

        eval_split (bool, optional): If true, create a evaluation split. If an eval split provided explicitly, generate
            an eval split automatically. Defaults to True.

        formatter (Callable, optional): The preprocessing function to be applied to create the list of samples. It
            must take the root_path and the meta_file name and return a list of samples in the format of
            `[[text, audio_path, speaker_id], ...]]`. See the available formatters in `TTS.tts.dataset.formatter` as
            example. Defaults to None.

        eval_split_max_size (int):
            Number maximum of samples to be used for evaluation in proportion split. Defaults to None (Disabled).

        eval_split_size (float):
            If between 0.0 and 1.0 represents the proportion of the dataset to include in the evaluation set.
            If > 1, represents the absolute number of evaluation samples. Defaults to 0.01 (1%).

    Returns:
        Tuple[List[List], List[List]: training and evaluation splits of the dataset.
    """
    meta_data_train_all = []
    meta_data_eval_all = [] if eval_split else None
    if not isinstance(datasets, list):
        datasets = [datasets]
    for dataset in datasets:
        formatter_name = dataset["formatter"]
        dataset_name = dataset["dataset_name"]
        root_path = dataset["path"]
        meta_file_train = dataset["meta_file_train"]
        meta_file_val = dataset["meta_file_val"]
        ignored_speakers = dataset["ignored_speakers"]
        language = dataset["language"]

        # setup the right data processor
        if formatter is None:
            formatter = _get_formatter_by_name(formatter_name)
        # load train set
        meta_data_train = formatter(root_path, meta_file_train, ignored_speakers=ignored_speakers)
        assert len(meta_data_train) > 0, f" [!] No training samples found in {root_path}/{meta_file_train}"

        meta_data_train = add_extra_keys(meta_data_train, language, dataset_name)

        print(f" | > Found {len(meta_data_train)} files in {Path(root_path).resolve()}")
        # load evaluation split if set
        if eval_split:
            if meta_file_val:
                meta_data_eval = formatter(root_path, meta_file_val, ignored_speakers=ignored_speakers)
                meta_data_eval = add_extra_keys(meta_data_eval, language, dataset_name)
            else:
                eval_size_per_dataset = eval_split_max_size // len(datasets) if eval_split_max_size else None
                meta_data_eval, meta_data_train = split_dataset(meta_data_train, eval_size_per_dataset, eval_split_size)
            meta_data_eval_all += meta_data_eval
        meta_data_train_all += meta_data_train
        # load attention masks for the duration predictor training
        if dataset.meta_file_attn_mask:
            meta_data = dict(load_attention_mask_meta_data(dataset["meta_file_attn_mask"]))
            for idx, ins in enumerate(meta_data_train_all):
                attn_file = meta_data[ins["audio_file"]].strip()
                meta_data_train_all[idx].update({"alignment_file": attn_file})
            if meta_data_eval_all:
                for idx, ins in enumerate(meta_data_eval_all):
                    attn_file = meta_data[ins["audio_file"]].strip()
                    meta_data_eval_all[idx].update({"alignment_file": attn_file})
        # set none for the next iter
        formatter = None
    return meta_data_train_all, meta_data_eval_all


def load_attention_mask_meta_data(metafile_path):
    """Load meta data file created by compute_attention_masks.py"""
    with open(metafile_path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    meta_data = []
    for line in lines:
        wav_file, attn_file = line.split("|")
        meta_data.append([wav_file, attn_file])
    return meta_data


def _get_formatter_by_name(name):
    """Returns the respective preprocessing function."""
    thismodule = sys.modules[__name__]
    return getattr(thismodule, name.lower())


def find_unique_chars(data_samples, verbose=True):
    texts = "".join(item[0] for item in data_samples)
    chars = set(texts)
    lower_chars = filter(lambda c: c.islower(), chars)
    chars_force_lower = [c.lower() for c in chars]
    chars_force_lower = set(chars_force_lower)

    if verbose:
        print(f" > Number of unique characters: {len(chars)}")
        print(f" > Unique characters: {''.join(sorted(chars))}")
        print(f" > Unique lower characters: {''.join(sorted(lower_chars))}")
        print(f" > Unique all forced to lower characters: {''.join(sorted(chars_force_lower))}")
    return chars_force_lower


================================================
FILE: TTS/tts/datasets/dataset.py
================================================
import base64
import collections
import os
import random
from typing import Dict, List, Union

import numpy as np
import torch
import tqdm
from torch.utils.data import Dataset

from TTS.tts.utils.data import prepare_data, prepare_stop_target, prepare_tensor
from TTS.utils.audio import AudioProcessor
from TTS.utils.audio.numpy_transforms import compute_energy as calculate_energy

# to prevent too many open files error as suggested here
# https://github.com/pytorch/pytorch/issues/11201#issuecomment-421146936
torch.multiprocessing.set_sharing_strategy("file_system")


def _parse_sample(item):
    language_name = None
    attn_file = None
    if len(item) == 5:
        text, wav_file, speaker_name, language_name, attn_file = item
    elif len(item) == 4:
        text, wav_file, speaker_name, language_name = item
    elif len(item) == 3:
        text, wav_file, speaker_name = item
    else:
        raise ValueError(" [!] Dataset cannot parse the sample.")
    return text, wav_file, speaker_name, language_name, attn_file


def noise_augment_audio(wav):
    return wav + (1.0 / 32768.0) * np.random.rand(*wav.shape)


def string2filename(string):
    # generate a safe and reversible filename based on a string
    filename = base64.urlsafe_b64encode(string.encode("utf-8")).decode("utf-8", "ignore")
    return filename


class TTSDataset(Dataset):
    def __init__(
        self,
        outputs_per_step: int = 1,
        compute_linear_spec: bool = False,
        ap: AudioProcessor = None,
        samples: List[Dict] = None,
        tokenizer: "TTSTokenizer" = None,
        compute_f0: bool = False,
        compute_energy: bool = False,
        f0_cache_path: str = None,
        energy_cache_path: str = None,
        return_wav: bool = False,
        batch_group_size: int = 0,
        min_text_len: int = 0,
        max_text_len: int = float("inf"),
        min_audio_len: int = 0,
        max_audio_len: int = float("inf"),
        phoneme_cache_path: str = None,
        precompute_num_workers: int = 0,
        speaker_id_mapping: Dict = None,
        d_vector_mapping: Dict = None,
        language_id_mapping: Dict = None,
        use_noise_augment: bool = False,
        start_by_longest: bool = False,
        verbose: bool = False,
    ):
        """Generic 📂 data loader for `tts` models. It is configurable for different outputs and needs.

        If you need something different, you can subclass and override.

        Args:
            outputs_per_step (int): Number of time frames predicted per step.

            compute_linear_spec (bool): compute linear spectrogram if True.

            ap (TTS.tts.utils.AudioProcessor): Audio processor object.

            samples (list): List of dataset samples.

            tokenizer (TTSTokenizer): tokenizer to convert text to sequence IDs. If None init internally else
                use the given. Defaults to None.

            compute_f0 (bool): compute f0 if True. Defaults to False.

            compute_energy (bool): compute energy if True. Defaults to False.

            f0_cache_path (str): Path to store f0 cache. Defaults to None.

            energy_cache_path (str): Path to store energy cache. Defaults to None.

            return_wav (bool): Return the waveform of the sample. Defaults to False.

            batch_group_size (int): Range of batch randomization after sorting
                sequences by length. It shuffles each batch with bucketing to gather similar lenght sequences in a
                batch. Set 0 to disable. Defaults to 0.

            min_text_len (int): Minimum length of input text to be used. All shorter samples will be ignored.
                Defaults to 0.

            max_text_len (int): Maximum length of input text to be used. All longer samples will be ignored.
                Defaults to float("inf").

            min_audio_len (int): Minimum length of input audio to be used. All shorter samples will be ignored.
                Defaults to 0.

            max_audio_len (int): Maximum length of input audio to be used. All longer samples will be ignored.
                The maximum length in the dataset defines the VRAM used in the training. Hence, pay attention to
                this value if you encounter an OOM error in training. Defaults to float("inf").

            phoneme_cache_path (str): Path to cache computed phonemes. It writes phonemes of each sample to a
                separate file. Defaults to None.

            precompute_num_workers (int): Number of workers to precompute features. Defaults to 0.

            speaker_id_mapping (dict): Mapping of speaker names to IDs used to compute embedding vectors by the
                embedding layer. Defaults to None.

            d_vector_mapping (dict): Mapping of wav files to computed d-vectors. Defaults to None.

            use_noise_augment (bool): Enable adding random noise to wav for augmentation. Defaults to False.

            start_by_longest (bool): Start by longest sequence. It is especially useful to check OOM. Defaults to False.

            verbose (bool): Print diagnostic information. Defaults to false.
        """
        super().__init__()
        self.batch_group_size = batch_group_size
        self._samples = samples
        self.outputs_per_step = outputs_per_step
        self.compute_linear_spec = compute_linear_spec
        self.return_wav = return_wav
        self.compute_f0 = compute_f0
        self.compute_energy = compute_energy
        self.f0_cache_path = f0_cache_path
        self.energy_cache_path = energy_cache_path
        self.min_audio_len = min_audio_len
        self.max_audio_len = max_audio_len
        self.min_text_len = min_text_len
        self.max_text_len = max_text_len
        self.ap = ap
        self.phoneme_cache_path = phoneme_cache_path
        self.speaker_id_mapping = speaker_id_mapping
        self.d_vector_mapping = d_vector_mapping
        self.language_id_mapping = language_id_mapping
        self.use_noise_augment = use_noise_augment
        self.start_by_longest = start_by_longest

        self.verbose = verbose
        self.rescue_item_idx = 1
        self.pitch_computed = False
        self.tokenizer = tokenizer

        if self.tokenizer.use_phonemes:
            self.phoneme_dataset = PhonemeDataset(
                self.samples, self.tokenizer, phoneme_cache_path, precompute_num_workers=precompute_num_workers
            )

        if compute_f0:
            self.f0_dataset = F0Dataset(
                self.samples, self.ap, cache_path=f0_cache_path, precompute_num_workers=precompute_num_workers
            )
        if compute_energy:
            self.energy_dataset = EnergyDataset(
                self.samples, self.ap, cache_path=energy_cache_path, precompute_num_workers=precompute_num_workers
            )
        if self.verbose:
            self.print_logs()

    @property
    def lengths(self):
        lens = []
        for item in self.samples:
            _, wav_file, *_ = _parse_sample(item)
            audio_len = os.path.getsize(wav_file) / 16 * 8  # assuming 16bit audio
            lens.append(audio_len)
        return lens

    @property
    def samples(self):
        return self._samples

    @samples.setter
    def samples(self, new_samples):
        self._samples = new_samples
        if hasattr(self, "f0_dataset"):
            self.f0_dataset.samples = new_samples
        if hasattr(self, "energy_dataset"):
            self.energy_dataset.samples = new_samples
        if hasattr(self, "phoneme_dataset"):
            self.phoneme_dataset.samples = new_samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.load_data(idx)

    def print_logs(self, level: int = 0) -> None:
        indent = "\t" * level
        print("\n")
        print(f"{indent}> DataLoader initialization")
        print(f"{indent}| > Tokenizer:")
        self.tokenizer.print_logs(level + 1)
        print(f"{indent}| > Number of instances : {len(self.samples)}")

    def load_wav(self, filename):
        waveform = self.ap.load_wav(filename)
        assert waveform.size > 0
        return waveform

    def get_phonemes(self, idx, text):
        out_dict = self.phoneme_dataset[idx]
        assert text == out_dict["text"], f"{text} != {out_dict['text']}"
        assert len(out_dict["token_ids"]) > 0
        return out_dict

    def get_f0(self, idx):
        out_dict = self.f0_dataset[idx]
        item = self.samples[idx]
        assert item["audio_unique_name"] == out_dict["audio_unique_name"]
        return out_dict

    def get_energy(self, idx):
        out_dict = self.energy_dataset[idx]
        item = self.samples[idx]
        assert item["audio_unique_name"] == out_dict["audio_unique_name"]
        return out_dict

    @staticmethod
    def get_attn_mask(attn_file):
        return np.load(attn_file)

    def get_token_ids(self, idx, text):
        if self.tokenizer.use_phonemes:
            token_ids = self.get_phonemes(idx, text)["token_ids"]
        else:
            token_ids = self.tokenizer.text_to_ids(text)
        return np.array(token_ids, dtype=np.int32)

    def load_data(self, idx):
        item = self.samples[idx]

        raw_text = item["text"]

        wav = np.asarray(self.load_wav(item["audio_file"]), dtype=np.float32)

        # apply noise for augmentation
        if self.use_noise_augment:
            wav = noise_augment_audio(wav)

        # get token ids
        token_ids = self.get_token_ids(idx, item["text"])

        # get pre-computed attention maps
        attn = None
        if "alignment_file" in item:
            attn = self.get_attn_mask(item["alignment_file"])

        # after phonemization the text length may change
        # this is a shareful 🤭 hack to prevent longer phonemes
        # TODO: find a better fix
        if len(token_ids) > self.max_text_len or len(wav) < self.min_audio_len:
            self.rescue_item_idx += 1
            return self.load_data(self.rescue_item_idx)

        # get f0 values
        f0 = None
        if self.compute_f0:
            f0 = self.get_f0(idx)["f0"]
        energy = None
        if self.compute_energy:
            energy = self.get_energy(idx)["energy"]

        sample = {
            "raw_text": raw_text,
            "token_ids": token_ids,
            "wav": wav,
            "pitch": f0,
            "energy": energy,
            "attn": attn,
            "item_idx": item["audio_file"],
            "speaker_name": item["speaker_name"],
            "language_name": item["language"],
            "wav_file_name": os.path.basename(item["audio_file"]),
            "audio_unique_name": item["audio_unique_name"],
        }
        return sample

    @staticmethod
    def _compute_lengths(samples):
        new_samples = []
        for item in samples:
            audio_length = os.path.getsize(item["audio_file"]) / 16 * 8  # assuming 16bit audio
            text_lenght = len(item["text"])
            item["audio_length"] = audio_length
            item["text_length"] = text_lenght
            new_samples += [item]
        return new_samples

    @staticmethod
    def filter_by_length(lengths: List[int], min_len: int, max_len: int):
        idxs = np.argsort(lengths)  # ascending order
        ignore_idx = []
        keep_idx = []
        for idx in idxs:
            length = lengths[idx]
            if length < min_len or length > max_len:
                ignore_idx.append(idx)
            else:
                keep_idx.append(idx)
        return ignore_idx, keep_idx

    @staticmethod
    def sort_by_length(samples: List[List]):
        audio_lengths = [s["audio_length"] for s in samples]
        idxs = np.argsort(audio_lengths)  # ascending order
        return idxs

    @staticmethod
    def create_buckets(samples, batch_group_size: int):
        assert batch_group_size > 0
        for i in range(len(samples) // batch_group_size):
            offset = i * batch_group_size
            end_offset = offset + batch_group_size
            temp_items = samples[offset:end_offset]
            random.shuffle(temp_items)
            samples[offset:end_offset] = temp_items
        return samples

    @staticmethod
    def _select_samples_by_idx(idxs, samples):
        samples_new = []
        for idx in idxs:
            samples_new.append(samples[idx])
        return samples_new

    def preprocess_samples(self):
        r"""Sort `items` based on text length or audio length in ascending order. Filter out samples out or the length
        range.
        """
        samples = self._compute_lengths(self.samples)

        # sort items based on the sequence length in ascending order
        text_lengths = [i["text_length"] for i in samples]
        audio_lengths = [i["audio_length"] for i in samples]
        text_ignore_idx, text_keep_idx = self.filter_by_length(text_lengths, self.min_text_len, self.max_text_len)
        audio_ignore_idx, audio_keep_idx = self.filter_by_length(audio_lengths, self.min_audio_len, self.max_audio_len)
        keep_idx = list(set(audio_keep_idx) & set(text_keep_idx))
        ignore_idx = list(set(audio_ignore_idx) | set(text_ignore_idx))

        samples = self._select_samples_by_idx(keep_idx, samples)

        sorted_idxs = self.sort_by_length(samples)

        if self.start_by_longest:
            longest_idxs = sorted_idxs[-1]
            sorted_idxs[-1] = sorted_idxs[0]
            sorted_idxs[0] = longest_idxs

        samples = self._select_samples_by_idx(sorted_idxs, samples)

        if len(samples) == 0:
            raise RuntimeError(" [!] No samples left")

        # shuffle batch groups
        # create batches with similar length items
        # the larger the `batch_group_size`, the higher the length variety in a batch.
        if self.batch_group_size > 0:
            samples = self.create_buckets(samples, self.batch_group_size)

        # update items to the new sorted items
        audio_lengths = [s["audio_length"] for s in samples]
        text_lengths = [s["text_length"] for s in samples]
        self.samples = samples

        if self.verbose:
            print(" | > Preprocessing samples")
            print(" | > Max text length: {}".format(np.max(text_lengths)))
            print(" | > Min text length: {}".format(np.min(text_lengths)))
            print(" | > Avg text length: {}".format(np.mean(text_lengths)))
            print(" | ")
            print(" | > Max audio length: {}".format(np.max(audio_lengths)))
            print(" | > Min audio length: {}".format(np.min(audio_lengths)))
            print(" | > Avg audio length: {}".format(np.mean(audio_lengths)))
            print(f" | > Num. instances discarded samples: {len(ignore_idx)}")
            print(" | > Batch group size: {}.".format(self.batch_group_size))

    @staticmethod
    def _sort_batch(batch, text_lengths):
        """Sort the batch by the input text length for RNN efficiency.

        Args:
            batch (Dict): Batch returned by `__getitem__`.
            text_lengths (List[int]): Lengths of the input character sequences.
        """
        text_lengths, ids_sorted_decreasing = torch.sort(torch.LongTensor(text_lengths), dim=0, descending=True)
        batch = [batch[idx] for idx in ids_sorted_decreasing]
        return batch, text_lengths, ids_sorted_decreasing

    def collate_fn(self, batch):
        r"""
        Perform preprocessing and create a final data batch:
        1. Sort batch instances by text-length
        2. Convert Audio signal to features.
        3. PAD sequences wrt r.
        4. Load to Torch.
        """

        # Puts each data field into a tensor with outer dimension batch size
        if isinstance(batch[0], collections.abc.Mapping):
            token_ids_lengths = np.array([len(d["token_ids"]) for d in batch])

            # sort items with text input length for RNN efficiency
            batch, token_ids_lengths, ids_sorted_decreasing = self._sort_batch(batch, token_ids_lengths)

            # convert list of dicts to dict of lists
            batch = {k: [dic[k] for dic in batch] for k in batch[0]}

            # get language ids from language names
            if self.language_id_mapping is not None:
                language_ids = [self.language_id_mapping[ln] for ln in batch["language_name"]]
            else:
                language_ids = None
            # get pre-computed d-vectors
            if self.d_vector_mapping is not None:
                embedding_keys = list(batch["audio_unique_name"])
                d_vectors = [self.d_vector_mapping[w]["embedding"] for w in embedding_keys]
            else:
                d_vectors = None

            # get numerical speaker ids from speaker names
            if self.speaker_id_mapping:
                speaker_ids = [self.speaker_id_mapping[sn] for sn in batch["speaker_name"]]
            else:
                speaker_ids = None
            # compute features
            mel = [self.ap.melspectrogram(w).astype("float32") for w in batch["wav"]]

            mel_lengths = [m.shape[1] for m in mel]

            # lengths adjusted by the reduction factor
            mel_lengths_adjusted = [
                m.shape[1] + (self.outputs_per_step - (m.shape[1] % self.outputs_per_step))
                if m.shape[1] % self.outputs_per_step
                else m.shape[1]
                for m in mel
            ]

            # compute 'stop token' targets
            stop_targets = [np.array([0.0] * (mel_len - 1) + [1.0]) for mel_len in mel_lengths]

            # PAD stop targets
            stop_targets = prepare_stop_target(stop_targets, self.outputs_per_step)

            # PAD sequences with longest instance in the batch
            token_ids = prepare_data(batch["token_ids"]).astype(np.int32)

            # PAD features with longest instance
            mel = prepare_tensor(mel, self.outputs_per_step)

            # B x D x T --> B x T x D
            mel = mel.transpose(0, 2, 1)

            # convert things to pytorch
            token_ids_lengths = torch.LongTensor(token_ids_lengths)
            token_ids = torch.LongTensor(token_ids)
            mel = torch.FloatTensor(mel).contiguous()
            mel_lengths = torch.LongTensor(mel_lengths)
            stop_targets = torch.FloatTensor(stop_targets)

            # speaker vectors
            if d_vectors is not None:
                d_vectors = torch.FloatTensor(d_vectors)

            if speaker_ids is not None:
                speaker_ids = torch.LongTensor(speaker_ids)

            if language_ids is not None:
                language_ids = torch.LongTensor(language_ids)

            # compute linear spectrogram
            linear = None
            if self.compute_linear_spec:
                linear = [self.ap.spectrogram(w).astype("float32") for w in batch["wav"]]
                linear = prepare_tensor(linear, self.outputs_per_step)
                linear = linear.transpose(0, 2, 1)
                assert mel.shape[1] == linear.shape[1]
                linear = torch.FloatTensor(linear).contiguous()

            # format waveforms
            wav_padded = None
            if self.return_wav:
                wav_lengths = [w.shape[0] for w in batch["wav"]]
                max_wav_len = max(mel_lengths_adjusted) * self.ap.hop_length
                wav_lengths = torch.LongTensor(wav_lengths)
                wav_padded = torch.zeros(len(batch["wav"]), 1, max_wav_len)
                for i, w in enumerate(batch["wav"]):
                    mel_length = mel_lengths_adjusted[i]
                    w = np.pad(w, (0, self.ap.hop_length * self.outputs_per_step), mode="edge")
                    w = w[: mel_length * self.ap.hop_length]
                    wav_padded[i, :, : w.shape[0]] = torch.from_numpy(w)
                wav_padded.transpose_(1, 2)

            # format F0
            if self.compute_f0:
                pitch = prepare_data(batch["pitch"])
                assert mel.shape[1] == pitch.shape[1], f"[!] {mel.shape} vs {pitch.shape}"
                pitch = torch.FloatTensor(pitch)[:, None, :].contiguous()  # B x 1 xT
            else:
                pitch = None
            # format energy
            if self.compute_energy:
                energy = prepare_data(batch["energy"])
                assert mel.shape[1] == energy.shape[1], f"[!] {mel.shape} vs {energy.shape}"
                energy = torch.FloatTensor(energy)[:, None, :].contiguous()  # B x 1 xT
            else:
                energy = None
            # format attention masks
            attns = None
            if batch["attn"][0] is not None:
                attns = [batch["attn"][idx].T for idx in ids_sorted_decreasing]
                for idx, attn in enumerate(attns):
                    pad2 = mel.shape[1] - attn.shape[1]
                    pad1 = token_ids.shape[1] - attn.shape[0]
                    assert pad1 >= 0 and pad2 >= 0, f"[!] Negative padding - {pad1} and {pad2}"
                    attn = np.pad(attn, [[0, pad1], [0, pad2]])
                    attns[idx] = attn
                attns = prepare_tensor(attns, self.outputs_per_step)
                attns = torch.FloatTensor(attns).unsqueeze(1)

            return {
                "token_id": token_ids,
                "token_id_lengths": token_ids_lengths,
                "speaker_names": batch["speaker_name"],
                "linear": linear,
                "mel": mel,
                "mel_lengths": mel_lengths,
                "stop_targets": stop_targets,
                "item_idxs": batch["item_idx"],
                "d_vectors": d_vectors,
                "speaker_ids": speaker_ids,
                "attns": attns,
                "waveform": wav_padded,
                "raw_text": batch["raw_text"],
                "pitch": pitch,
                "energy": energy,
                "language_ids": language_ids,
                "audio_unique_names": batch["audio_unique_name"],
            }

        raise TypeError(
            (
                "batch must contain tensors, numbers, dicts or lists;\
                         found {}".format(
                    type(batch[0])
                )
            )
        )


class PhonemeDataset(Dataset):
    """Phoneme Dataset for converting input text to phonemes and then token IDs

    At initialization, it pre-computes the phonemes under `cache_path` and loads them in training to reduce data
    loading latency. If `cache_path` is already present, it skips the pre-computation.

    Args:
        samples (Union[List[List], List[Dict]]):
            List of samples. Each sample is a list or a dict.

        tokenizer (TTSTokenizer):
            Tokenizer to convert input text to phonemes.

        cache_path (str):
            Path to cache phonemes. If `cache_path` is already present or None, it skips the pre-computation.

        precompute_num_workers (int):
            Number of workers used for pre-computing the phonemes. Defaults to 0.
    """

    def __init__(
        self,
        samples: Union[List[Dict], List[List]],
        tokenizer: "TTSTokenizer",
        cache_path: str,
        precompute_num_workers=0,
    ):
        self.samples = samples
        self.tokenizer = tokenizer
        self.cache_path = cache_path
        if cache_path is not None and not os.path.exists(cache_path):
            os.makedirs(cache_path)
            self.precompute(precompute_num_workers)

    def __getitem__(self, index):
        item = self.samples[index]
        ids = self.compute_or_load(string2filename(item["audio_unique_name"]), item["text"], item["language"])
        ph_hat = self.tokenizer.ids_to_text(ids)
        return {"text": item["text"], "ph_hat": ph_hat, "token_ids": ids, "token_ids_len": len(ids)}

    def __len__(self):
        return len(self.samples)

    def compute_or_load(self, file_name, text, language):
        """Compute phonemes for the given text.

        If the phonemes are already cached, load them from cache.
        """
        file_ext = "_phoneme.npy"
        cache_path = os.path.join(self.cache_path, file_name + file_ext)
        try:
            ids = np.load(cache_path)
        except FileNotFoundError:
            ids = self.tokenizer.text_to_ids(text, language=language)
            np.save(cache_path, ids)
        return ids

    def get_pad_id(self):
        """Get pad token ID for sequence padding"""
        return self.tokenizer.pad_id

    def precompute(self, num_workers=1):
        """Precompute phonemes for all samples.

        We use pytorch dataloader because we are lazy.
        """
        print("[*] Pre-computing phonemes...")
        with tqdm.tqdm(total=len(self)) as pbar:
            batch_size = num_workers if num_workers > 0 else 1
            dataloder = torch.utils.data.DataLoader(
                batch_size=batch_size, dataset=self, shuffle=False, num_workers=num_workers, collate_fn=self.collate_fn
            )
            for _ in dataloder:
                pbar.update(batch_size)

    def collate_fn(self, batch):
        ids = [item["token_ids"] for item in batch]
        ids_lens = [item["token_ids_len"] for item in batch]
        texts = [item["text"] for item in batch]
        texts_hat = [item["ph_hat"] for item in batch]
        ids_lens_max = max(ids_lens)
        ids_torch = torch.LongTensor(len(ids), ids_lens_max).fill_(self.get_pad_id())
        for i, ids_len in enumerate(ids_lens):
            ids_torch[i, :ids_len] = torch.LongTensor(ids[i])
        return {"text": texts, "ph_hat": texts_hat, "token_ids": ids_torch}

    def print_logs(self, level: int = 0) -> None:
        indent = "\t" * level
        print("\n")
        print(f"{indent}> PhonemeDataset ")
        print(f"{indent}| > Tokenizer:")
        self.tokenizer.print_logs(level + 1)
        print(f"{indent}| > Number of instances : {len(self.samples)}")


class F0Dataset:
    """F0 Dataset for computing F0 from wav files in CPU

    Pre-compute F0 values for all the samples at initialization if `cache_path` is not None or already present. It
    also computes the mean and std of F0 values if `normalize_f0` is True.

    Args:
        samples (Union[List[List], List[Dict]]):
            List of samples. Each sample is a list or a dict.

        ap (AudioProcessor):
            AudioProcessor to compute F0 from wav files.

        cache_path (str):
            Path to cache F0 values. If `cache_path` is already present or None, it skips the pre-computation.
            Defaults to None.

        precompute_num_workers (int):
            Number of workers used for pre-computing the F0 values. Defaults to 0.

        normalize_f0 (bool):
            Whether to normalize F0 values by mean and std. Defaults to True.
    """

    def __init__(
        self,
        samples: Union[List[List], List[Dict]],
        ap: "AudioProcessor",
        verbose=False,
        cache_path: str = None,
        precompute_num_workers=0,
        normalize_f0=True,
    ):
        self.samples = samples
        self.ap = ap
        self.verbose = verbose
        self.cache_path = cache_path
        self.normalize_f0 = normalize_f0
        self.pad_id = 0.0
        self.mean = None
        self.std = None
        if cache_path is not None and not os.path.exists(cache_path):
            os.makedirs(cache_path)
            self.precompute(precompute_num_workers)
        if normalize_f0:
            self.load_stats(cache_path)

    def __getitem__(self, idx):
        item = self.samples[idx]
        f0 = self.compute_or_load(item["audio_file"], string2filename(item["audio_unique_name"]))
        if self.normalize_f0:
            assert self.mean is not None and self.std is not None, " [!] Mean and STD is not available"
            f0 = self.normalize(f0)
        return {"audio_unique_name": item["audio_unique_name"], "f0": f0}

    def __len__(self):
        return len(self.samples)

    def precompute(self, num_workers=0):
        print("[*] Pre-computing F0s...")
        with tqdm.tqdm(total=len(self)) as pbar:
            batch_size = num_workers if num_workers > 0 else 1
            # we do not normalize at preproessing
            normalize_f0 = self.normalize_f0
            self.normalize_f0 = False
            dataloder = torch.utils.data.DataLoader(
                batch_size=batch_size, dataset=self, shuffle=False, num_workers=num_workers, collate_fn=self.collate_fn
            )
            computed_data = []
            for batch in dataloder:
                f0 = batch["f0"]
                computed_data.append(f for f in f0)
                pbar.update(batch_size)
            self.normalize_f0 = normalize_f0

        if self.normalize_f0:
            computed_data = [tensor for batch in computed_data for tensor in batch]  # flatten
            pitch_mean, pitch_std = self.compute_pitch_stats(computed_data)
            pitch_stats = {"mean": pitch_mean, "std": pitch_std}
            np.save(os.path.join(self.cache_path, "pitch_stats"), pitch_stats, allow_pickle=True)

    def get_pad_id(self):
        return self.pad_id

    @staticmethod
    def create_pitch_file_path(file_name, cache_path):
        pitch_file = os.path.join(cache_path, file_name + "_pitch.npy")
        return pitch_file

    @staticmethod
    def _compute_and_save_pitch(ap, wav_file, pitch_file=None):
        wav = ap.load_wav(wav_file)
        pitch = ap.compute_f0(wav)
        if pitch_file:
            np.save(pitch_file, pitch)
        return pitch

    @staticmethod
    def compute_pitch_stats(pitch_vecs):
        nonzeros = np.concatenate([v[np.where(v != 0.0)[0]] for v in pitch_vecs])
        mean, std = np.mean(nonzeros), np.std(nonzeros)
        return mean, std

    def load_stats(self, cache_path):
        stats_path = os.path.join(cache_path, "pitch_stats.npy")
        stats = np.load(stats_path, allow_pickle=True).item()
        self.mean = stats["mean"].astype(np.float32)
        self.std = stats["std"].astype(np.float32)

    def normalize(self, pitch):
        zero_idxs = np.where(pitch == 0.0)[0]
        pitch = pitch - self.mean
        pitch = pitch / self.std
        pitch[zero_idxs] = 0.0
        return pitch

    def denormalize(self, pitch):
        zero_idxs = np.where(pitch == 0.0)[0]
        pitch *= self.std
        pitch += self.mean
        pitch[zero_idxs] = 0.0
        return pitch

    def compute_or_load(self, wav_file, audio_unique_name):
        """
        compute pitch and return a numpy array of pitch values
        """
        pitch_file = self.create_pitch_file_path(audio_unique_name, self.cache_path)
        if not os.path.exists(pitch_file):
            pitch = self._compute_and_save_pitch(self.ap, wav_file, pitch_file)
        else:
            pitch = np.load(pitch_file)
        return pitch.astype(np.float32)

    def collate_fn(self, batch):
        audio_unique_name = [item["audio_unique_name"] for item in batch]
        f0s = [item["f0"] for item in batch]
        f0_lens = [len(item["f0"]) for item in batch]
        f0_lens_max = max(f0_lens)
        f0s_torch = torch.LongTensor(len(f0s), f0_lens_max).fill_(self.get_pad_id())
        for i, f0_len in enumerate(f0_lens):
            f0s_torch[i, :f0_len] = torch.LongTensor(f0s[i])
        return {"audio_unique_name": audio_unique_name, "f0": f0s_torch, "f0_lens": f0_lens}

    def print_logs(self, level: int = 0) -> None:
        indent = "\t" * level
        print("\n")
        print(f"{indent}> F0Dataset ")
        print(f"{indent}| > Number of instances : {len(self.samples)}")


class EnergyDataset:
    """Energy Dataset for computing Energy from wav files in CPU

    Pre-compute Energy values for all the samples at initialization if `cache_path` is not None or already present. It
    also computes the mean and std of Energy values if `normalize_Energy` is True.

    Args:
        samples (Union[List[List], List[Dict]]):
            List of samples. Each sample is a list or a dict.

        ap (AudioProcessor):
            AudioProcessor to compute Energy from wav files.

        cache_path (str):
            Path to cache Energy values. If `cache_path` is already present or None, it skips the pre-computation.
            Defaults to None.

        precompute_num_workers (int):
            Number of workers used for pre-computing the Energy values. Defaults to 0.

        normalize_Energy (bool):
            Whether to normalize Energy values by mean and std. Defaults to True.
    """

    def __init__(
        self,
        samples: Union[List[List], List[Dict]],
        ap: "AudioProcessor",
        verbose=False,
        cache_path: str = None,
        precompute_num_workers=0,
        normalize_energy=True,
    ):
        self.samples = samples
        self.ap = ap
        self.verbose = verbose
        self.cache_path = cache_path
        self.normalize_energy = normalize_energy
        self.pad_id = 0.0
        self.mean = None
        self.std = None
        if cache_path is not None and not os.path.exists(cache_path):
            os.makedirs(cache_path)
            self.precompute(precompute_num_workers)
        if normalize_energy:
            self.load_stats(cache_path)

    def __getitem__(self, idx):
        item = self.samples[idx]
        energy = self.compute_or_load(item["audio_file"], string2filename(item["audio_unique_name"]))
        if self.normalize_energy:
            assert self.mean is not None and self.std is not None, " [!] Mean and STD is not available"
            energy = self.normalize(energy)
        return {"audio_unique_name": item["audio_unique_name"], "energy": energy}

    def __len__(self):
        return len(self.samples)

    def precompute(self, num_workers=0):
        print("[*] Pre-computing energys...")
        with tqdm.tqdm(total=len(self)) as pbar:
            batch_size = num_workers if num_workers > 0 else 1
            # we do not normalize at preproessing
            normalize_energy = self.normalize_energy
            self.normalize_energy = False
            dataloder = torch.utils.data.DataLoader(
                batch_size=batch_size, dataset=self, shuffle=False, num_workers=num_workers, collate_fn=self.collate_fn
            )
            computed_data = []
            for batch in dataloder:
                energy = batch["energy"]
                computed_data.append(e for e in energy)
                pbar.update(batch_size)
            self.normalize_energy = normalize_energy

        if self.normalize_energy:
            computed_data = [tensor for batch in computed_data for tensor in batch]  # flatten
            energy_mean, energy_std = self.compute_energy_stats(computed_data)
            energy_stats = {"mean": energy_mean, "std": energy_std}
            np.save(os.path.join(self.cache_path, "energy_stats"), energy_stats, allow_pickle=True)

    def get_pad_id(self):
        return self.pad_id

    @staticmethod
    def create_energy_file_path(wav_file, cache_path):
        file_name = os.path.splitext(os.path.basename(wav_file))[0]
        energy_file = os.path.join(cache_path, file_name + "_energy.npy")
        return energy_file

    @staticmethod
    def _compute_and_save_energy(ap, wav_file, energy_file=None):
        wav = ap.load_wav(wav_file)
        energy = calculate_energy(wav, fft_size=ap.fft_size, hop_length=ap.hop_length, win_length=ap.win_length)
        if energy_file:
            np.save(energy_file, energy)
        return energy

    @staticmethod
    def compute_energy_stats(energy_vecs):
        nonzeros = np.concatenate([v[np.where(v != 0.0)[0]] for v in energy_vecs])
        mean, std = np.mean(nonzeros), np.std(nonzeros)
        return mean, std

    def load_stats(self, cache_path):
        stats_path = os.path.join(cache_path, "energy_stats.npy")
        stats = np.load(stats_path, allow_pickle=True).item()
        self.mean = stats["mean"].astype(np.float32)
        self.std = stats["std"].astype(np.float32)

    def normalize(self, energy):
        zero_idxs = np.where(energy == 0.0)[0]
        energy = energy - self.mean
        energy = energy / self.std
        energy[zero_idxs] = 0.0
        return energy

    def denormalize(self, energy):
        zero_idxs = np.where(energy == 0.0)[0]
        energy *= self.std
        energy += self.mean
        energy[zero_idxs] = 0.0
        return energy

    def compute_or_load(self, wav_file, audio_unique_name):
        """
        compute energy and return a numpy array of energy values
        """
        energy_file = self.create_energy_file_path(audio_unique_name, self.cache_path)
        if not os.path.exists(energy_file):
            energy = self._compute_and_save_energy(self.ap, wav_file, energy_file)
        else:
            energy = np.load(energy_file)
        return energy.astype(np.float32)

    def collate_fn(self, batch):
        audio_unique_name = [item["audio_unique_name"] for item in batch]
        energys = [item["energy"] for item in batch]
        energy_lens = [len(item["energy"]) for item in batch]
        energy_lens_max = max(energy_lens)
        energys_torch = torch.LongTensor(len(energys), energy_lens_max).fill_(self.get_pad_id())
        for i, energy_len in enumerate(energy_lens):
            energys_torch[i, :energy_len] = torch.LongTensor(energys[i])
        return {"audio_unique_name": audio_unique_name, "energy": energys_torch, "energy_lens": energy_lens}

    def print_logs(self, level: int = 0) -> None:
        indent = "\t" * level
        print("\n")
        print(f"{indent}> energyDataset ")
        print(f"{indent}| > Number of instances : {len(self.samples)}")


================================================
FILE: TTS/tts/datasets/formatters.py
================================================
import os
import re
import xml.etree.ElementTree as ET
from glob import glob
from pathlib import Path
from typing import List

import pandas as pd
from tqdm import tqdm

########################
# DATASETS
########################


def coqui(root_path, meta_file, ignored_speakers=None):
    """Interal dataset formatter."""
    filepath = os.path.join(root_path, meta_file)
    # ensure there are 4 columns for every line
    with open(filepath, "r", encoding="utf8") as f:
        lines = f.readlines()
    num_cols = len(lines[0].split("|"))  # take the first row as reference
    for idx, line in enumerate(lines[1:]):
        if len(line.split("|")) != num_cols:
            print(f" > Missing column in line {idx + 1} -> {line.strip()}")
    # load metadata
    metadata = pd.read_csv(os.path.join(root_path, meta_file), sep="|")
    assert all(x in metadata.columns for x in ["audio_file", "text"])
    speaker_name = None if "speaker_name" in metadata.columns else "coqui"
    emotion_name = None if "emotion_name" in metadata.columns else "neutral"
    items = []
    not_found_counter = 0
    for row in metadata.itertuples():
        if speaker_name is None and ignored_speakers is not None and row.speaker_name in ignored_speakers:
            continue
        audio_path = os.path.join(root_path, row.audio_file)
        if not os.path.exists(audio_path):
            not_found_counter += 1
            continue
        items.append(
            {
                "text": row.text,
                "audio_file": audio_path,
                "speaker_name": speaker_name if speaker_name is not None else row.speaker_name,
                "emotion_name": emotion_name if emotion_name is not None else row.emotion_name,
                "root_path": root_path,
            }
        )
    if not_found_counter > 0:
        print(f" | > [!] {not_found_counter} files not found")
    return items


def tweb(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalize TWEB dataset.
    https://www.kaggle.com/bryanpark/the-world-english-bible-speech-dataset
    """
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "tweb"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("\t")
            wav_file = os.path.join(root_path, cols[0] + ".wav")
            text = cols[1]
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def mozilla(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes Mozilla meta data files to TTS format"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "mozilla"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = cols[1].strip()
            text = cols[0].strip()
            wav_file = os.path.join(root_path, "wavs", wav_file)
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def mozilla_de(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes Mozilla meta data files to TTS format"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "mozilla"
    with open(txt_file, "r", encoding="ISO 8859-1") as ttf:
        for line in ttf:
            cols = line.strip().split("|")
            wav_file = cols[0].strip()
            text = cols[1].strip()
            folder_name = f"BATCH_{wav_file.split('_')[0]}_FINAL"
            wav_file = os.path.join(root_path, folder_name, wav_file)
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def mailabs(root_path, meta_files=None, ignored_speakers=None):
    """Normalizes M-AI-Labs meta data files to TTS format

    Args:
        root_path (str): root folder of the MAILAB language folder.
        meta_files (str):  list of meta files to be used in the training. If None, finds all the csv files
            recursively. Defaults to None
    """
    speaker_regex = re.compile(f"by_book{os.sep}(male|female){os.sep}(?P<speaker_name>[^{os.sep}]+){os.sep}")
    if not meta_files:
        csv_files = glob(root_path + f"{os.sep}**{os.sep}metadata.csv", recursive=True)
    else:
        csv_files = meta_files

    # meta_files = [f.strip() for f in meta_files.split(",")]
    items = []
    for csv_file in csv_files:
        if os.path.isfile(csv_file):
            txt_file = csv_file
        else:
            txt_file = os.path.join(root_path, csv_file)

        folder = os.path.dirname(txt_file)
        # determine speaker based on folder structure...
        speaker_name_match = speaker_regex.search(txt_file)
        if speaker_name_match is None:
            continue
        speaker_name = speaker_name_match.group("speaker_name")
        # ignore speakers
        if isinstance(ignored_speakers, list):
            if speaker_name in ignored_speakers:
                continue
        print(" | > {}".format(csv_file))
        with open(txt_file, "r", encoding="utf-8") as ttf:
            for line in ttf:
                cols = line.split("|")
                if not meta_files:
                    wav_file = os.path.join(folder, "wavs", cols[0] + ".wav")
                else:
                    wav_file = os.path.join(root_path, folder.replace("metadata.csv", ""), "wavs", cols[0] + ".wav")
                if os.path.isfile(wav_file):
                    text = cols[1].strip()
                    items.append(
                        {"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path}
                    )
                else:
                    # M-AI-Labs have some missing samples, so just print the warning
                    print("> File %s does not exist!" % (wav_file))
    return items


def ljspeech(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes the LJSpeech meta data file to TTS format
    https://keithito.com/LJ-Speech-Dataset/"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "ljspeech"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            text = cols[2]
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def ljspeech_test(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes the LJSpeech meta data file for TTS testing
    https://keithito.com/LJ-Speech-Dataset/"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    with open(txt_file, "r", encoding="utf-8") as ttf:
        speaker_id = 0
        for idx, line in enumerate(ttf):
            # 2 samples per speaker to avoid eval split issues
            if idx % 2 == 0:
                speaker_id += 1
            cols = line.split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            text = cols[2]
            items.append(
                {"text": text, "audio_file": wav_file, "speaker_name": f"ljspeech-{speaker_id}", "root_path": root_path}
            )
    return items


def thorsten(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes the thorsten meta data file to TTS format
    https://github.com/thorstenMueller/deep-learning-german-tts/"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "thorsten"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            text = cols[1]
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def sam_accenture(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes the sam-accenture meta data file to TTS format
    https://github.com/Sam-Accenture-Non-Binary-Voice/non-binary-voice-files"""
    xml_file = os.path.join(root_path, "voice_over_recordings", meta_file)
    xml_root = ET.parse(xml_file).getroot()
    items = []
    speaker_name = "sam_accenture"
    for item in xml_root.findall("./fileid"):
        text = item.text
        wav_file = os.path.join(root_path, "vo_voice_quality_transformation", item.get("id") + ".wav")
        if not os.path.exists(wav_file):
            print(f" [!] {wav_file} in metafile does not exist. Skipping...")
            continue
        items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def ruslan(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes the RUSLAN meta data file to TTS format
    https://ruslan-corpus.github.io/"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "ruslan"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, "RUSLAN", cols[0] + ".wav")
            text = cols[1]
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def css10(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes the CSS10 dataset file to TTS format"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "css10"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, cols[0])
            text = cols[1]
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name})
    return items


def nancy(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Normalizes the Nancy meta data file to TTS format"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "nancy"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            utt_id = line.split()[1]
            text = line[line.find('"') + 1 : line.rfind('"') - 1]
            wav_file = os.path.join(root_path, "wavn", utt_id + ".wav")
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name})
    return items


def common_voice(root_path, meta_file, ignored_speakers=None):
    """Normalize the common voice meta data file to TTS format."""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            if line.startswith("client_id"):
                continue
            cols = line.split("\t")
            text = cols[2]
            speaker_name = cols[0]
            # ignore speakers
            if isinstance(ignored_speakers, list):
                if speaker_name in ignored_speakers:
                    continue
            wav_file = os.path.join(root_path, "clips", cols[1].replace(".mp3", ".wav"))
            items.append(
                {"text": text, "audio_file": wav_file, "speaker_name": "MCV_" + speaker_name, "root_path": root_path}
            )
    return items


def libri_tts(root_path, meta_files=None, ignored_speakers=None):
    """https://ai.google/tools/datasets/libri-tts/"""
    items = []
    if not meta_files:
        meta_files = glob(f"{root_path}/**/*trans.tsv", recursive=True)
    else:
        if isinstance(meta_files, str):
            meta_files = [os.path.join(root_path, meta_files)]

    for meta_file in meta_files:
        _meta_file = os.path.basename(meta_file).split(".")[0]
        with open(meta_file, "r", encoding="utf-8") as ttf:
            for line in ttf:
                cols = line.split("\t")
                file_name = cols[0]
                speaker_name, chapter_id, *_ = cols[0].split("_")
                _root_path = os.path.join(root_path, f"{speaker_name}/{chapter_id}")
                wav_file = os.path.join(_root_path, file_name + ".wav")
                text = cols[2]
                # ignore speakers
                if isinstance(ignored_speakers, list):
                    if speaker_name in ignored_speakers:
                        continue
                items.append(
                    {
                        "text": text,
                        "audio_file": wav_file,
                        "speaker_name": f"LTTS_{speaker_name}",
                        "root_path": root_path,
                    }
                )
    for item in items:
        assert os.path.exists(item["audio_file"]), f" [!] wav files don't exist - {item['audio_file']}"
    return items


def custom_turkish(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "turkish-female"
    skipped_files = []
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0].strip() + ".wav")
            if not os.path.exists(wav_file):
                skipped_files.append(wav_file)
                continue
            text = cols[1].strip()
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    print(f" [!] {len(skipped_files)} files skipped. They don't exist...")
    return items


# ToDo: add the dataset link when the dataset is released publicly
def brspeech(root_path, meta_file, ignored_speakers=None):
    """BRSpeech 3.0 beta"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            if line.startswith("wav_filename"):
                continue
            cols = line.split("|")
            wav_file = os.path.join(root_path, cols[0])
            text = cols[2]
            speaker_id = cols[3]
            # ignore speakers
            if isinstance(ignored_speakers, list):
                if speaker_id in ignored_speakers:
                    continue
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_id, "root_path": root_path})
    return items


def vctk(root_path, meta_files=None, wavs_path="wav48_silence_trimmed", mic="mic1", ignored_speakers=None):
    """VCTK dataset v0.92.

    URL:
        https://datashare.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip

    This dataset has 2 recordings per speaker that are annotated with ```mic1``` and ```mic2```.
    It is believed that (😄 ) ```mic1``` files are the same as the previous version of the dataset.

    mic1:
        Audio recorded using an omni-directional microphone (DPA 4035).
        Contains very low frequency noises.
        This is the same audio released in previous versions of VCTK:
        https://doi.org/10.7488/ds/1994

    mic2:
        Audio recorded using a small diaphragm condenser microphone with
        very wide bandwidth (Sennheiser MKH 800).
        Two speakers, p280 and p315 had technical issues of the audio
        recordings using MKH 800.
    """
    file_ext = "flac"
    items = []
    meta_files = glob(f"{os.path.join(root_path,'txt')}/**/*.txt", recursive=True)
    for meta_file in meta_files:
        _, speaker_id, txt_file = os.path.relpath(meta_file, root_path).split(os.sep)
        file_id = txt_file.split(".")[0]
        # ignore speakers
        if isinstance(ignored_speakers, list):
            if speaker_id in ignored_speakers:
                continue
        with open(meta_file, "r", encoding="utf-8") as file_text:
            text = file_text.readlines()[0]
        # p280 has no mic2 recordings
        if speaker_id == "p280":
            wav_file = os.path.join(root_path, wavs_path, speaker_id, file_id + f"_mic1.{file_ext}")
        else:
            wav_file = os.path.join(root_path, wavs_path, speaker_id, file_id + f"_{mic}.{file_ext}")
        if os.path.exists(wav_file):
            items.append(
                {"text": text, "audio_file": wav_file, "speaker_name": "VCTK_" + speaker_id, "root_path": root_path}
            )
        else:
            print(f" [!] wav files don't exist - {wav_file}")
    return items


def vctk_old(root_path, meta_files=None, wavs_path="wav48", ignored_speakers=None):
    """homepages.inf.ed.ac.uk/jyamagis/release/VCTK-Corpus.tar.gz"""
    items = []
    meta_files = glob(f"{os.path.join(root_path,'txt')}/**/*.txt", recursive=True)
    for meta_file in meta_files:
        _, speaker_id, txt_file = os.path.relpath(meta_file, root_path).split(os.sep)
        file_id = txt_file.split(".")[0]
        # ignore speakers
        if isinstance(ignored_speakers, list):
            if speaker_id in ignored_speakers:
                continue
        with open(meta_file, "r", encoding="utf-8") as file_text:
            text = file_text.readlines()[0]
        wav_file = os.path.join(root_path, wavs_path, speaker_id, file_id + ".wav")
        items.append(
            {"text": text, "audio_file": wav_file, "speaker_name": "VCTK_old_" + speaker_id, "root_path": root_path}
        )
    return items


def synpaflex(root_path, metafiles=None, **kwargs):  # pylint: disable=unused-argument
    items = []
    speaker_name = "synpaflex"
    root_path = os.path.join(root_path, "")
    wav_files = glob(f"{root_path}**/*.wav", recursive=True)
    for wav_file in wav_files:
        if os.sep + "wav" + os.sep in wav_file:
            txt_file = wav_file.replace("wav", "txt")
        else:
            txt_file = os.path.join(
                os.path.dirname(wav_file), "txt", os.path.basename(wav_file).replace(".wav", ".txt")
            )
        if os.path.exists(txt_file) and os.path.exists(wav_file):
            with open(txt_file, "r", encoding="utf-8") as file_text:
                text = file_text.readlines()[0]
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def open_bible(root_path, meta_files="train", ignore_digits_sentences=True, ignored_speakers=None):
    """ToDo: Refer the paper when available"""
    items = []
    split_dir = meta_files
    meta_files = glob(f"{os.path.join(root_path, split_dir)}/**/*.txt", recursive=True)
    for meta_file in meta_files:
        _, speaker_id, txt_file = os.path.relpath(meta_file, root_path).split(os.sep)
        file_id = txt_file.split(".")[0]
        # ignore speakers
        if isinstance(ignored_speakers, list):
            if speaker_id in ignored_speakers:
                continue
        with open(meta_file, "r", encoding="utf-8") as file_text:
            text = file_text.readline().replace("\n", "")
        # ignore sentences that contains digits
        if ignore_digits_sentences and any(map(str.isdigit, text)):
            continue
        wav_file = os.path.join(root_path, split_dir, speaker_id, file_id + ".flac")
        items.append({"text": text, "audio_file": wav_file, "speaker_name": "OB_" + speaker_id, "root_path": root_path})
    return items


def mls(root_path, meta_files=None, ignored_speakers=None):
    """http://www.openslr.org/94/"""
    items = []
    with open(os.path.join(root_path, meta_files), "r", encoding="utf-8") as meta:
        for line in meta:
            file, text = line.split("\t")
            text = text[:-1]
            speaker, book, *_ = file.split("_")
            wav_file = os.path.join(root_path, os.path.dirname(meta_files), "audio", speaker, book, file + ".wav")
            # ignore speakers
            if isinstance(ignored_speakers, list):
                if speaker in ignored_speakers:
                    continue
            items.append(
                {"text": text, "audio_file": wav_file, "speaker_name": "MLS_" + speaker, "root_path": root_path}
            )
    return items


# ======================================== VOX CELEB ===========================================
def voxceleb2(root_path, meta_file=None, **kwargs):  # pylint: disable=unused-argument
    """
    :param meta_file   Used only for consistency with load_tts_samples api
    """
    return _voxcel_x(root_path, meta_file, voxcel_idx="2")


def voxceleb1(root_path, meta_file=None, **kwargs):  # pylint: disable=unused-argument
    """
    :param meta_file   Used only for consistency with load_tts_samples api
    """
    return _voxcel_x(root_path, meta_file, voxcel_idx="1")


def _voxcel_x(root_path, meta_file, voxcel_idx):
    assert voxcel_idx in ["1", "2"]
    expected_count = 148_000 if voxcel_idx == "1" else 1_000_000
    voxceleb_path = Path(root_path)
    cache_to = voxceleb_path / f"metafile_voxceleb{voxcel_idx}.csv"
    cache_to.parent.mkdir(exist_ok=True)

    # if not exists meta file, crawl recursively for 'wav' files
    if meta_file is not None:
        with open(str(meta_file), "r", encoding="utf-8") as f:
            return [x.strip().split("|") for x in f.readlines()]

    elif not cache_to.exists():
        cnt = 0
        meta_data = []
        wav_files = voxceleb_path.rglob("**/*.wav")
        for path in tqdm(
            wav_files,
            desc=f"Building VoxCeleb {voxcel_idx} Meta file ... this needs to be done only once.",
            total=expected_count,
        ):
            speaker_id = str(Path(path).parent.parent.stem)
            assert speaker_id.startswith("id")
            text = None  # VoxCel does not provide transciptions, and they are not needed for training the SE
            meta_data.append(f"{text}|{path}|voxcel{voxcel_idx}_{speaker_id}\n")
            cnt += 1
        with open(str(cache_to), "w", encoding="utf-8") as f:
            f.write("".join(meta_data))
        if cnt < expected_count:
            raise ValueError(f"Found too few instances for Voxceleb. Should be around {expected_count}, is: {cnt}")

    with open(str(cache_to), "r", encoding="utf-8") as f:
        return [x.strip().split("|") for x in f.readlines()]


def emotion(root_path, meta_file, ignored_speakers=None):
    """Generic emotion dataset"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            if line.startswith("file_path"):
                continue
            cols = line.split(",")
            wav_file = os.path.join(root_path, cols[0])
            speaker_id = cols[1]
            emotion_id = cols[2].replace("\n", "")
            # ignore speakers
            if isinstance(ignored_speakers, list):
                if speaker_id in ignored_speakers:
                    continue
            items.append(
                {"audio_file": wav_file, "speaker_name": speaker_id, "emotion_name": emotion_id, "root_path": root_path}
            )
    return items


def baker(root_path: str, meta_file: str, **kwargs) -> List[List[str]]:  # pylint: disable=unused-argument
    """Normalizes the Baker meta data file to TTS format

    Args:
        root_path (str): path to the baker dataset
        meta_file (str): name of the meta dataset containing names of wav to select and the transcript of the sentence
    Returns:
        List[List[str]]: List of (text, wav_path, speaker_name) associated with each sentences
    """
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "baker"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            wav_name, text = line.rstrip("\n").split("|")
            wav_path = os.path.join(root_path, "clips_22", wav_name)
            items.append({"text": text, "audio_file": wav_path, "speaker_name": speaker_name, "root_path": root_path})
    return items


def kokoro(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Japanese single-speaker dataset from https://github.com/kaiidams/Kokoro-Speech-Dataset"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "kokoro"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, "wavs", cols[0] + ".wav")
            text = cols[2].replace(" ", "")
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


def kss(root_path, meta_file, **kwargs):  # pylint: disable=unused-argument
    """Korean single-speaker dataset from https://www.kaggle.com/datasets/bryanpark/korean-single-speaker-speech-dataset"""
    txt_file = os.path.join(root_path, meta_file)
    items = []
    speaker_name = "kss"
    with open(txt_file, "r", encoding="utf-8") as ttf:
        for line in ttf:
            cols = line.split("|")
            wav_file = os.path.join(root_path, cols[0])
            text = cols[2]  # cols[1] => 6월, cols[2] => 유월
            items.append({"text": text, "audio_file": wav_file, "speaker_name": speaker_name, "root_path": root_path})
    return items


================================================
FILE: TTS/tts/layers/__init__.py
================================================
from TTS.tts.layers.losses import *


================================================
FILE: TTS/tts/layers/align_tts/__init__.py
================================================


================================================
FILE: TTS/tts/layers/align_tts/duration_predictor.py
================================================
from torch import nn

from TTS.tts.layers.generic.pos_encoding import PositionalEncoding
from TTS.tts.layers.generic.transformer import FFTransformerBlock


class DurationPredictor(nn.Module):
    def __init__(self, num_chars, hidden_channels, hidden_channels_ffn, num_heads):
        super().__init__()
        self.embed = nn.Embedding(num_chars, hidden_channels)
        self.pos_enc = PositionalEncoding(hidden_channels, dropout_p=0.1)
        self.FFT = FFTransformerBlock(hidden_channels, num_heads, hidden_channels_ffn, 2, 0.1)
        self.out_layer = nn.Conv1d(hidden_channels, 1, 1)

    def forward(self, text, text_lengths):
        # B, L -> B, L
        emb = self.embed(text)
        emb = self.pos_enc(emb.transpose(1, 2))
        x = self.FFT(emb, text_lengths)
        x = self.out_layer(x).squeeze(-1)
        return x


================================================
FILE: TTS/tts/layers/align_tts/mdn.py
================================================
from torch import nn


class MDNBlock(nn.Module):
    """Mixture of Density Network implementation
    https://arxiv.org/pdf/2003.01950.pdf
    """

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.out_channels = out_channels
        self.conv1 = nn.Conv1d(in_channels, in_channels, 1)
        self.norm = nn.LayerNorm(in_channels)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.1)
        self.conv2 = nn.Conv1d(in_channels, out_channels, 1)

    def forward(self, x):
        o = self.conv1(x)
        o = o.transpose(1, 2)
        o = self.norm(o)
        o = o.transpose(1, 2)
        o = self.relu(o)
        o = self.dropout(o)
        mu_sigma = self.conv2(o)
        # TODO: check this sigmoid
        # mu = torch.sigmoid(mu_sigma[:, :self.out_channels//2, :])
        mu = mu_sigma[:, : self.out_channels // 2, :]
        log_sigma = mu_sigma[:, self.out_channels // 2 :, :]
        return mu, log_sigma


================================================
FILE: TTS/tts/layers/feed_forward/__init__.py
================================================


================================================
FILE: TTS/tts/layers/feed_forward/decoder.py
================================================
import torch
from torch import nn

from TTS.tts.layers.generic.res_conv_bn import Conv1dBN, Conv1dBNBlock, ResidualConv1dBNBlock
from TTS.tts.layers.generic.transformer import FFTransformerBlock
from TTS.tts.layers.generic.wavenet import WNBlocks
from TTS.tts.layers.glow_tts.transformer import RelativePositionTransformer


class WaveNetDecoder(nn.Module):
    """WaveNet based decoder with a prenet and a postnet.

    prenet: conv1d_1x1
    postnet: 3 x [conv1d_1x1 -> relu] -> conv1d_1x1

    TODO: Integrate speaker conditioning vector.

    Note:
        default wavenet parameters;
            params = {
                "num_blocks": 12,
                "hidden_channels":192,
                "kernel_size": 5,
                "dilation_rate": 1,
                "num_layers": 4,
                "dropout_p": 0.05
            }

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of hidden channels for prenet and postnet.
        params (dict): dictionary for residual convolutional blocks.
    """

    def __init__(self, in_channels, out_channels, hidden_channels, c_in_channels, params):
        super().__init__()
        # prenet
        self.prenet = torch.nn.Conv1d(in_channels, params["hidden_channels"], 1)
        # wavenet layers
        self.wn = WNBlocks(params["hidden_channels"], c_in_channels=c_in_channels, **params)
        # postnet
        self.postnet = [
            torch.nn.Conv1d(params["hidden_channels"], hidden_channels, 1),
            torch.nn.ReLU(),
            torch.nn.Conv1d(hidden_channels, hidden_channels, 1),
            torch.nn.ReLU(),
            torch.nn.Conv1d(hidden_channels, hidden_channels, 1),
            torch.nn.ReLU(),
            torch.nn.Conv1d(hidden_channels, out_channels, 1),
        ]
        self.postnet = nn.Sequential(*self.postnet)

    def forward(self, x, x_mask=None, g=None):
        x = self.prenet(x) * x_mask
        x = self.wn(x, x_mask, g)
        o = self.postnet(x) * x_mask
        return o


class RelativePositionTransformerDecoder(nn.Module):
    """Decoder with Relative Positional Transformer.

    Note:
        Default params
            params={
                'hidden_channels_ffn': 128,
                'num_heads': 2,
                "kernel_size": 3,
                "dropout_p": 0.1,
                "num_layers": 8,
                "rel_attn_window_size": 4,
                "input_length": None
            }

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of hidden channels including Transformer layers.
        params (dict): dictionary for residual convolutional blocks.
    """

    def __init__(self, in_channels, out_channels, hidden_channels, params):
        super().__init__()
        self.prenet = Conv1dBN(in_channels, hidden_channels, 1, 1)
        self.rel_pos_transformer = RelativePositionTransformer(in_channels, out_channels, hidden_channels, **params)

    def forward(self, x, x_mask=None, g=None):  # pylint: disable=unused-argument
        o = self.prenet(x) * x_mask
        o = self.rel_pos_transformer(o, x_mask)
        return o


class FFTransformerDecoder(nn.Module):
    """Decoder with FeedForwardTransformer.

    Default params
            params={
                'hidden_channels_ffn': 1024,
                'num_heads': 2,
                "dropout_p": 0.1,
                "num_layers": 6,
            }

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of hidden channels including Transformer layers.
        params (dict): dictionary for residual convolutional blocks.
    """

    def __init__(self, in_channels, out_channels, params):
        super().__init__()
        self.transformer_block = FFTransformerBlock(in_channels, **params)
        self.postnet = nn.Conv1d(in_channels, out_channels, 1)

    def forward(self, x, x_mask=None, g=None):  # pylint: disable=unused-argument
        # TODO: handle multi-speaker
        x_mask = 1 if x_mask is None else x_mask
        o = self.transformer_block(x) * x_mask
        o = self.postnet(o) * x_mask
        return o


class ResidualConv1dBNDecoder(nn.Module):
    """Residual Convolutional Decoder as in the original Speedy Speech paper

    TODO: Integrate speaker conditioning vector.

    Note:
        Default params
                params = {
                    "kernel_size": 4,
                    "dilations": 4 * [1, 2, 4, 8] + [1],
                    "num_conv_blocks": 2,
                    "num_res_blocks": 17
                }

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of hidden channels including ResidualConv1dBNBlock layers.
        params (dict): dictionary for residual convolutional blocks.
    """

    def __init__(self, in_channels, out_channels, hidden_channels, params):
        super().__init__()
        self.res_conv_block = ResidualConv1dBNBlock(in_channels, hidden_channels, hidden_channels, **params)
        self.post_conv = nn.Conv1d(hidden_channels, hidden_channels, 1)
        self.postnet = nn.Sequential(
            Conv1dBNBlock(
                hidden_channels, hidden_channels, hidden_channels, params["kernel_size"], 1, num_conv_blocks=2
            ),
            nn.Conv1d(hidden_channels, out_channels, 1),
        )

    def forward(self, x, x_mask=None, g=None):  # pylint: disable=unused-argument
        o = self.res_conv_block(x, x_mask)
        o = self.post_conv(o) + x
        return self.postnet(o) * x_mask


class Decoder(nn.Module):
    """Decodes the expanded phoneme encoding into spectrograms
    Args:
        out_channels (int): number of output channels.
        in_hidden_channels (int): input and hidden channels. Model keeps the input channels for the intermediate layers.
        decoder_type (str): decoder layer types. 'transformers' or 'residual_conv_bn'. Default 'residual_conv_bn'.
        decoder_params (dict): model parameters for specified decoder type.
        c_in_channels (int): number of channels for conditional input.

    Shapes:
        - input: (B, C, T)
    """

    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        out_channels,
        in_hidden_channels,
        decoder_type="residual_conv_bn",
        decoder_params={
            "kernel_size": 4,
            "dilations": 4 * [1, 2, 4, 8] + [1],
            "num_conv_blocks": 2,
            "num_res_blocks": 17,
        },
        c_in_channels=0,
    ):
        super().__init__()

        if decoder_type.lower() == "relative_position_transformer":
            self.decoder = RelativePositionTransformerDecoder(
                in_channels=in_hidden_channels,
                out_channels=out_channels,
                hidden_channels=in_hidden_channels,
                params=decoder_params,
            )
        elif decoder_type.lower() == "residual_conv_bn":
            self.decoder = ResidualConv1dBNDecoder(
                in_channels=in_hidden_channels,
                out_channels=out_channels,
                hidden_channels=in_hidden_channels,
                params=decoder_params,
            )
        elif decoder_type.lower() == "wavenet":
            self.decoder = WaveNetDecoder(
                in_channels=in_hidden_channels,
                out_channels=out_channels,
                hidden_channels=in_hidden_channels,
                c_in_channels=c_in_channels,
                params=decoder_params,
            )
        elif decoder_type.lower() == "fftransformer":
            self.decoder = FFTransformerDecoder(in_hidden_channels, out_channels, decoder_params)
        else:
            raise ValueError(f"[!] Unknown decoder type - {decoder_type}")

    def forward(self, x, x_mask, g=None):  # pylint: disable=unused-argument
        """
        Args:
            x: [B, C, T]
            x_mask: [B, 1, T]
            g: [B, C_g, 1]
        """
        # TODO: implement multi-speaker
        o = self.decoder(x, x_mask, g)
        return o


================================================
FILE: TTS/tts/layers/feed_forward/duration_predictor.py
================================================
from torch import nn

from TTS.tts.layers.generic.res_conv_bn import Conv1dBN


class DurationPredictor(nn.Module):
    """Speedy Speech duration predictor model.
    Predicts phoneme durations from encoder outputs.

    Note:
        Outputs interpreted as log(durations)
        To get actual durations, do exp transformation

    conv_BN_4x1 -> conv_BN_3x1 -> conv_BN_1x1 -> conv_1x1

    Args:
        hidden_channels (int): number of channels in the inner layers.
    """

    def __init__(self, hidden_channels):
        super().__init__()

        self.layers = nn.ModuleList(
            [
                Conv1dBN(hidden_channels, hidden_channels, 4, 1),
                Conv1dBN(hidden_channels, hidden_channels, 3, 1),
                Conv1dBN(hidden_channels, hidden_channels, 1, 1),
                nn.Conv1d(hidden_channels, 1, 1),
            ]
        )

    def forward(self, x, x_mask):
        """
        Shapes:
            x: [B, C, T]
            x_mask: [B, 1, T]
        """
        o = x
        for layer in self.layers:
            o = layer(o) * x_mask
        return o


================================================
FILE: TTS/tts/layers/feed_forward/encoder.py
================================================
from torch import nn

from TTS.tts.layers.generic.res_conv_bn import ResidualConv1dBNBlock
from TTS.tts.layers.generic.transformer import FFTransformerBlock
from TTS.tts.layers.glow_tts.transformer import RelativePositionTransformer


class RelativePositionTransformerEncoder(nn.Module):
    """Speedy speech encoder built on Transformer with Relative Position encoding.

    TODO: Integrate speaker conditioning vector.

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of hidden channels
        params (dict): dictionary for residual convolutional blocks.
    """

    def __init__(self, in_channels, out_channels, hidden_channels, params):
        super().__init__()
        self.prenet = ResidualConv1dBNBlock(
            in_channels,
            hidden_channels,
            hidden_channels,
            kernel_size=5,
            num_res_blocks=3,
            num_conv_blocks=1,
            dilations=[1, 1, 1],
        )
        self.rel_pos_transformer = RelativePositionTransformer(hidden_channels, out_channels, hidden_channels, **params)

    def forward(self, x, x_mask=None, g=None):  # pylint: disable=unused-argument
        if x_mask is None:
            x_mask = 1
        o = self.prenet(x) * x_mask
        o = self.rel_pos_transformer(o, x_mask)
        return o


class ResidualConv1dBNEncoder(nn.Module):
    """Residual Convolutional Encoder as in the original Speedy Speech paper

    TODO: Integrate speaker conditioning vector.

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of hidden channels
        params (dict): dictionary for residual convolutional blocks.
    """

    def __init__(self, in_channels, out_channels, hidden_channels, params):
        super().__init__()
        self.prenet = nn.Sequential(nn.Conv1d(in_channels, hidden_channels, 1), nn.ReLU())
        self.res_conv_block = ResidualConv1dBNBlock(hidden_channels, hidden_channels, hidden_channels, **params)

        self.postnet = nn.Sequential(
            *[
                nn.Conv1d(hidden_channels, hidden_channels, 1),
                nn.ReLU(),
                nn.BatchNorm1d(hidden_channels),
                nn.Conv1d(hidden_channels, out_channels, 1),
            ]
        )

    def forward(self, x, x_mask=None, g=None):  # pylint: disable=unused-argument
        if x_mask is None:
            x_mask = 1
        o = self.prenet(x) * x_mask
        o = self.res_conv_block(o, x_mask)
        o = self.postnet(o + x) * x_mask
        return o * x_mask


class Encoder(nn.Module):
    # pylint: disable=dangerous-default-value
    """Factory class for Speedy Speech encoder enables different encoder types internally.

    Args:
        num_chars (int): number of characters.
        out_channels (int): number of output channels.
        in_hidden_channels (int): input and hidden channels. Model keeps the input channels for the intermediate layers.
        encoder_type (str): encoder layer types. 'transformers' or 'residual_conv_bn'. Default 'residual_conv_bn'.
        encoder_params (dict): model parameters for specified encoder type.
        c_in_channels (int): number of channels for conditional input.

    Note:
        Default encoder_params to be set in config.json...

        ```python
        # for 'relative_position_transformer'
        encoder_params={
            'hidden_channels_ffn': 128,
            'num_heads': 2,
            "kernel_size": 3,
            "dropout_p": 0.1,
            "num_layers": 6,
            "rel_attn_window_size": 4,
            "input_length": None
        },

        # for 'residual_conv_bn'
        encoder_params = {
            "kernel_size": 4,
            "dilations": 4 * [1, 2, 4] + [1],
            "num_conv_blocks": 2,
            "num_res_blocks": 13
        }

        # for 'fftransformer'
        encoder_params = {
            "hidden_channels_ffn": 1024 ,
            "num_heads": 2,
            "num_layers": 6,
            "dropout_p": 0.1
        }
        ```
    """

    def __init__(
        self,
        in_hidden_channels,
        out_channels,
        encoder_type="residual_conv_bn",
        encoder_params={"kernel_size": 4, "dilations": 4 * [1, 2, 4] + [1], "num_conv_blocks": 2, "num_res_blocks": 13},
        c_in_channels=0,
    ):
        super().__init__()
        self.out_channels = out_channels
        self.in_channels = in_hidden_channels
        self.hidden_channels = in_hidden_channels
        self.encoder_type = encoder_type
        self.c_in_channels = c_in_channels

        # init encoder
        if encoder_type.lower() == "relative_position_transformer":
            # text encoder
            # pylint: disable=unexpected-keyword-arg
            self.encoder = RelativePositionTransformerEncoder(
                in_hidden_channels, out_channels, in_hidden_channels, encoder_params
            )
        elif encoder_type.lower() == "residual_conv_bn":
            self.encoder = ResidualConv1dBNEncoder(in_hidden_channels, out_channels, in_hidden_channels, encoder_params)
        elif encoder_type.lower() == "fftransformer":
            assert (
                in_hidden_channels == out_channels
            ), "[!] must be `in_channels` == `out_channels` when encoder type is 'fftransformer'"
            # pylint: disable=unexpected-keyword-arg
            self.encoder = FFTransformerBlock(in_hidden_channels, **encoder_params)
        else:
            raise NotImplementedError(" [!] unknown encoder type.")

    def forward(self, x, x_mask, g=None):  # pylint: disable=unused-argument
        """
        Shapes:
            x: [B, C, T]
            x_mask: [B, 1, T]
            g: [B, C, 1]
        """
        o = self.encoder(x, x_mask)
        return o * x_mask


================================================
FILE: TTS/tts/layers/generic/__init__.py
================================================


================================================
FILE: TTS/tts/layers/generic/aligner.py
================================================
from typing import Tuple

import torch
from torch import nn


class AlignmentNetwork(torch.nn.Module):
    """Aligner Network for learning alignment between the input text and the model output with Gaussian Attention.

    ::

        query -> conv1d -> relu -> conv1d -> relu -> conv1d -> L2_dist -> softmax -> alignment
        key   -> conv1d -> relu -> conv1d -----------------------^

    Args:
        in_query_channels (int): Number of channels in the query network. Defaults to 80.
        in_key_channels (int): Number of channels in the key network. Defaults to 512.
        attn_channels (int): Number of inner channels in the attention layers. Defaults to 80.
        temperature (float): Temperature for the softmax. Defaults to 0.0005.
    """

    def __init__(
        self,
        in_query_channels=80,
        in_key_channels=512,
        attn_channels=80,
        temperature=0.0005,
    ):
        super().__init__()
        self.temperature = temperature
        self.softmax = torch.nn.Softmax(dim=3)
        self.log_softmax = torch.nn.LogSoftmax(dim=3)

        self.key_layer = nn.Sequential(
            nn.Conv1d(
                in_key_channels,
                in_key_channels * 2,
                kernel_size=3,
                padding=1,
                bias=True,
            ),
            torch.nn.ReLU(),
            nn.Conv1d(in_key_channels * 2, attn_channels, kernel_size=1, padding=0, bias=True),
        )

        self.query_layer = nn.Sequential(
            nn.Conv1d(
                in_query_channels,
                in_query_channels * 2,
                kernel_size=3,
                padding=1,
                bias=True,
            ),
            torch.nn.ReLU(),
            nn.Conv1d(in_query_channels * 2, in_query_channels, kernel_size=1, padding=0, bias=True),
            torch.nn.ReLU(),
            nn.Conv1d(in_query_channels, attn_channels, kernel_size=1, padding=0, bias=True),
        )

    def forward(
        self, queries: torch.tensor, keys: torch.tensor, mask: torch.tensor = None, attn_prior: torch.tensor = None
    ) -> Tuple[torch.tensor, torch.tensor]:
        """Forward pass of the aligner encoder.
        Shapes:
            - queries: :math:`[B, C, T_de]`
            - keys: :math:`[B, C_emb, T_en]`
            - mask: :math:`[B, T_de]`
        Output:
            attn (torch.tensor): :math:`[B, 1, T_en, T_de]` soft attention mask.
            attn_logp (torch.tensor): :math:`[ßB, 1, T_en , T_de]` log probabilities.
        """
        key_out = self.key_layer(keys)
        query_out = self.query_layer(queries)
        attn_factor = (query_out[:, :, :, None] - key_out[:, :, None]) ** 2
        attn_logp = -self.temperature * attn_factor.sum(1, keepdim=True)
        if attn_prior is not None:
            attn_logp = self.log_softmax(attn_logp) + torch.log(attn_prior[:, None] + 1e-8)
        if mask is not None:
            attn_logp.data.masked_fill_(~mask.bool().unsqueeze(2), -float("inf"))
        attn = self.softmax(attn_logp)
        return attn, attn_logp


================================================
FILE: TTS/tts/layers/generic/gated_conv.py
================================================
from torch import nn

from .normalization import LayerNorm


class GatedConvBlock(nn.Module):
    """Gated convolutional block as in https://arxiv.org/pdf/1612.08083.pdf
    Args:
        in_out_channels (int): number of input/output channels.
        kernel_size (int): convolution kernel size.
        dropout_p (float): dropout rate.
    """

    def __init__(self, in_out_channels, kernel_size, dropout_p, num_layers):
        super().__init__()
        # class arguments
        self.dropout_p = dropout_p
        self.num_layers = num_layers
        # define layers
        self.conv_layers = nn.ModuleList()
        self.norm_layers = nn.ModuleList()
        self.layers = nn.ModuleList()
        for _ in range(num_layers):
            self.conv_layers += [nn.Conv1d(in_out_channels, 2 * in_out_channels, kernel_size, padding=kernel_size // 2)]
            self.norm_layers += [LayerNorm(2 * in_out_channels)]

    def forward(self, x, x_mask):
        o = x
        res = x
        for idx in range(self.num_layers):
            o = nn.functional.dropout(o, p=self.dropout_p, training=self.training)
            o = self.conv_layers[idx](o * x_mask)
            o = self.norm_layers[idx](o)
            o = nn.functional.glu(o, dim=1)
            o = res + o
            res = o
        return o


================================================
FILE: TTS/tts/layers/generic/normalization.py
================================================
import torch
from torch import nn


class LayerNorm(nn.Module):
    def __init__(self, channels, eps=1e-4):
        """Layer norm for the 2nd dimension of the input.
        Args:
            channels (int): number of channels (2nd dimension) of the input.
            eps (float): to prevent 0 division

        Shapes:
            - input: (B, C, T)
            - output: (B, C, T)
        """
        super().__init__()
        self.channels = channels
        self.eps = eps

        self.gamma = nn.Parameter(torch.ones(1, channels, 1) * 0.1)
        self.beta = nn.Parameter(torch.zeros(1, channels, 1))

    def forward(self, x):
        mean = torch.mean(x, 1, keepdim=True)
        variance = torch.mean((x - mean) ** 2, 1, keepdim=True)
        x = (x - mean) * torch.rsqrt(variance + self.eps)
        x = x * self.gamma + self.beta
        return x


class LayerNorm2(nn.Module):
    """Layer norm for the 2nd dimension of the input using torch primitive.
    Args:
        channels (int): number of channels (2nd dimension) of the input.
        eps (float): to prevent 0 division

    Shapes:
        - input: (B, C, T)
        - output: (B, C, T)
    """

    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.channels = channels
        self.eps = eps

        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))

    def forward(self, x):
        x = x.transpose(1, -1)
        x = torch.nn.functional.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
        return x.transpose(1, -1)


class TemporalBatchNorm1d(nn.BatchNorm1d):
    """Normalize each channel separately over time and batch."""

    def __init__(self, channels, affine=True, track_running_stats=True, momentum=0.1):
        super().__init__(channels, affine=affine, track_running_stats=track_running_stats, momentum=momentum)

    def forward(self, x):
        return super().forward(x.transpose(2, 1)).transpose(2, 1)


class ActNorm(nn.Module):
    """Activation Normalization bijector as an alternative to Batch Norm. It computes
    mean and std from a sample data in advance and it uses these values
    for normalization at training.

    Args:
        channels (int): input channels.
        ddi (False): data depended initialization flag.

    Shapes:
        - inputs: (B, C, T)
        - outputs: (B, C, T)
    """

    def __init__(self, channels, ddi=False, **kwargs):  # pylint: disable=unused-argument
        super().__init__()
        self.channels = channels
        self.initialized = not ddi

        self.logs = nn.Parameter(torch.zeros(1, channels, 1))
        self.bias = nn.Parameter(torch.zeros(1, channels, 1))

    def forward(self, x, x_mask=None, reverse=False, **kwargs):  # pylint: disable=unused-argument
        if x_mask is None:
            x_mask = torch.ones(x.size(0), 1, x.size(2)).to(device=x.device, dtype=x.dtype)
        x_len = torch.sum(x_mask, [1, 2])
        if not self.initialized:
            self.initialize(x, x_mask)
            self.initialized = True

        if reverse:
            z = (x - self.bias) * torch.exp(-self.logs) * x_mask
            logdet = None
        else:
            z = (self.bias + torch.exp(self.logs) * x) * x_mask
            logdet = torch.sum(self.logs) * x_len  # [b]

        return z, logdet

    def store_inverse(self):
        pass

    def set_ddi(self, ddi):
        self.initialized = not ddi

    def initialize(self, x, x_mask):
        with torch.no_grad():
            denom = torch.sum(x_mask, [0, 2])
            m = torch.sum(x * x_mask, [0, 2]) / denom
            m_sq = torch.sum(x * x * x_mask, [0, 2]) / denom
            v = m_sq - (m**2)
            logs = 0.5 * torch.log(torch.clamp_min(v, 1e-6))

            bias_init = (-m * torch.exp(-logs)).view(*self.bias.shape).to(dtype=self.bias.dtype)
            logs_init = (-logs).view(*self.logs.shape).to(dtype=self.logs.dtype)

            self.bias.data.copy_(bias_init)
            self.logs.data.copy_(logs_init)


================================================
FILE: TTS/tts/layers/generic/pos_encoding.py
================================================
import math

import torch
from torch import nn


class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding for non-recurrent neural networks.
    Implementation based on "Attention Is All You Need"

    Args:
       channels (int): embedding size
       dropout_p (float): dropout rate applied to the output.
       max_len (int): maximum sequence length.
       use_scale (bool): whether to use a learnable scaling coefficient.
    """

    def __init__(self, channels, dropout_p=0.0, max_len=5000, use_scale=False):
        super().__init__()
        if channels % 2 != 0:
            raise ValueError(
                "Cannot use sin/cos positional encoding with " "odd channels (got channels={:d})".format(channels)
            )
        self.use_scale = use_scale
        if use_scale:
            self.scale = torch.nn.Parameter(torch.ones(1))
        pe = torch.zeros(max_len, channels)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.pow(10000, torch.arange(0, channels, 2).float() / channels)
        pe[:, 0::2] = torch.sin(position.float() * div_term)
        pe[:, 1::2] = torch.cos(position.float() * div_term)
        pe = pe.unsqueeze(0).transpose(1, 2)
        self.register_buffer("pe", pe)
        if dropout_p > 0:
            self.dropout = nn.Dropout(p=dropout_p)
        self.channels = channels

    def forward(self, x, mask=None, first_idx=None, last_idx=None):
        """
        Shapes:
            x: [B, C, T]
            mask: [B, 1, T]
            first_idx: int
            last_idx: int
        """

        x = x * math.sqrt(self.channels)
        if first_idx is None:
            if self.pe.size(2) < x.size(2):
                raise RuntimeError(
                    f"Sequence is {x.size(2)} but PositionalEncoding is"
                    f" limited to {self.pe.size(2)}. See max_len argument."
                )
            if mask is not None:
                pos_enc = self.pe[:, :, : x.size(2)] * mask
            else:
                pos_enc = self.pe[:, :, : x.size(2)]
            if self.use_scale:
                x = x + self.scale * pos_enc
            else:
                x = x + pos_enc
        else:
            if self.use_scale:
                x = x + self.scale * self.pe[:, :, first_idx:last_idx]
            else:
                x = x + self.pe[:, :, first_idx:last_idx]
        if hasattr(self, "dropout"):
            x = self.dropout(x)
        return x


================================================
FILE: TTS/tts/layers/generic/res_conv_bn.py
================================================
from torch import nn


class ZeroTemporalPad(nn.Module):
    """Pad sequences to equal lentgh in the temporal dimension"""

    def __init__(self, kernel_size, dilation):
        super().__init__()
        total_pad = dilation * (kernel_size - 1)
        begin = total_pad // 2
        end = total_pad - begin
        self.pad_layer = nn.ZeroPad2d((0, 0, begin, end))

    def forward(self, x):
        return self.pad_layer(x)


class Conv1dBN(nn.Module):
    """1d convolutional with batch norm.
    conv1d -> relu -> BN blocks.

    Note:
        Batch normalization is applied after ReLU regarding the original implementation.

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        kernel_size (int): kernel size for convolutional filters.
        dilation (int): dilation for convolution layers.
    """

    def __init__(self, in_channels, out_channels, kernel_size, dilation):
        super().__init__()
        padding = dilation * (kernel_size - 1)
        pad_s = padding // 2
        pad_e = padding - pad_s
        self.conv1d = nn.Conv1d(in_channels, out_channels, kernel_size, dilation=dilation)
        self.pad = nn.ZeroPad2d((pad_s, pad_e, 0, 0))  # uneven left and right padding
        self.norm = nn.BatchNorm1d(out_channels)

    def forward(self, x):
        o = self.conv1d(x)
        o = self.pad(o)
        o = nn.functional.relu(o)
        o = self.norm(o)
        return o


class Conv1dBNBlock(nn.Module):
    """1d convolutional block with batch norm. It is a set of conv1d -> relu -> BN blocks.

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of inner convolution channels.
        kernel_size (int): kernel size for convolutional filters.
        dilation (int): dilation for convolution layers.
        num_conv_blocks (int, optional): number of convolutional blocks. Defaults to 2.
    """

    def __init__(self, in_channels, out_channels, hidden_channels, kernel_size, dilation, num_conv_blocks=2):
        super().__init__()
        self.conv_bn_blocks = []
        for idx in range(num_conv_blocks):
            layer = Conv1dBN(
                in_channels if idx == 0 else hidden_channels,
                out_channels if idx == (num_conv_blocks - 1) else hidden_channels,
                kernel_size,
                dilation,
            )
            self.conv_bn_blocks.append(layer)
        self.conv_bn_blocks = nn.Sequential(*self.conv_bn_blocks)

    def forward(self, x):
        """
        Shapes:
            x: (B, D, T)
        """
        return self.conv_bn_blocks(x)


class ResidualConv1dBNBlock(nn.Module):
    """Residual Convolutional Blocks with BN
    Each block has 'num_conv_block' conv layers and 'num_res_blocks' such blocks are connected
    with residual connections.

    conv_block = (conv1d -> relu -> bn) x 'num_conv_blocks'
    residuak_conv_block =  (x -> conv_block ->  + ->) x 'num_res_blocks'
                            ' - - - - - - - - - ^
    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        hidden_channels (int): number of inner convolution channels.
        kernel_size (int): kernel size for convolutional filters.
        dilations (list): dilations for each convolution layer.
        num_res_blocks (int, optional): number of residual blocks. Defaults to 13.
        num_conv_blocks (int, optional): number of convolutional blocks in each residual block. Defaults to 2.
    """

    def __init__(
        self, in_channels, out_channels, hidden_channels, kernel_size, dilations, num_res_blocks=13, num_conv_blocks=2
    ):
        super().__init__()
        assert len(dilations) == num_res_blocks
        self.res_blocks = nn.ModuleList()
        for idx, dilation in enumerate(dilations):
            block = Conv1dBNBlock(
                in_channels if idx == 0 else hidden_channels,
                out_channels if (idx + 1) == len(dilations) else hidden_channels,
                hidden_channels,
                kernel_size,
                dilation,
                num_conv_blocks,
            )
            self.res_blocks.append(block)

    def forward(self, x, x_mask=None):
        if x_mask is None:
            x_mask = 1.0
        o = x * x_mask
        for block in self.res_blocks:
            res = o
            o = block(o)
            o = o + res
            if x_mask is not None:
                o = o * x_mask
        return o


================================================
FILE: TTS/tts/layers/generic/time_depth_sep_conv.py
================================================
import torch
from torch import nn


class TimeDepthSeparableConv(nn.Module):
    """Time depth separable convolution as in https://arxiv.org/pdf/1904.02619.pdf
    It shows competative results with less computation and memory footprint."""

    def __init__(self, in_channels, hid_channels, out_channels, kernel_size, bias=True):
        super().__init__()

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.hid_channels = hid_channels
        self.kernel_size = kernel_size

        self.time_conv = nn.Conv1d(
            in_channels,
            2 * hid_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=bias,
        )
        self.norm1 = nn.BatchNorm1d(2 * hid_channels)
        self.depth_conv = nn.Conv1d(
            hid_channels,
            hid_channels,
            kernel_size,
            stride=1,
            padding=(kernel_size - 1) // 2,
            groups=hid_channels,
            bias=bias,
        )
        self.norm2 = nn.BatchNorm1d(hid_channels)
        self.time_conv2 = nn.Conv1d(
            hid_channels,
            out_channels,
            kernel_size=1,
            stride=1,
            padding=0,
            bias=bias,
        )
        self.norm3 = nn.BatchNorm1d(out_channels)

    def forward(self, x):
        x_res = x
        x = self.time_conv(x)
        x = self.norm1(x)
        x = nn.functional.glu(x, dim=1)
        x = self.depth_conv(x)
        x = self.norm2(x)
        x = x * torch.sigmoid(x)
        x = self.time_conv2(x)
        x = self.norm3(x)
        x = x_res + x
        return x


class TimeDepthSeparableConvBlock(nn.Module):
    def __init__(self, in_channels, hid_channels, out_channels, num_layers, kernel_size, bias=True):
        super().__init__()
        assert (kernel_size - 1) % 2 == 0
        assert num_layers > 1

        self.layers = nn.ModuleList()
        layer = TimeDepthSeparableConv(
            in_channels, hid_channels, out_channels if num_layers == 1 else hid_channels, kernel_size, bias
        )
        self.layers.append(layer)
        for idx in range(num_layers - 1):
            layer = TimeDepthSeparableConv(
                hid_channels,
                hid_channels,
                out_channels if (idx + 1) == (num_layers - 1) else hid_channels,
                kernel_size,
                bias,
            )
            self.layers.append(layer)

    def forward(self, x, mask):
        for layer in self.layers:
            x = layer(x * mask)
        return x


================================================
FILE: TTS/tts/layers/generic/transformer.py
================================================
import torch
import torch.nn.functional as F
from torch import nn


class FFTransformer(nn.Module):
    def __init__(self, in_out_channels, num_heads, hidden_channels_ffn=1024, kernel_size_fft=3, dropout_p=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(in_out_channels, num_heads, dropout=dropout_p)

        padding = (kernel_size_fft - 1) // 2
        self.conv1 = nn.Conv1d(in_out_channels, hidden_channels_ffn, kernel_size=kernel_size_fft, padding=padding)
        self.conv2 = nn.Conv1d(hidden_channels_ffn, in_out_channels, kernel_size=kernel_size_fft, padding=padding)

        self.norm1 = nn.LayerNorm(in_out_channels)
        self.norm2 = nn.LayerNorm(in_out_channels)

        self.dropout1 = nn.Dropout(dropout_p)
        self.dropout2 = nn.Dropout(dropout_p)

    def forward(self, src, src_mask=None, src_key_padding_mask=None):
        """😦 ugly looking with all the transposing"""
        src = src.permute(2, 0, 1)
        src2, enc_align = self.self_attn(src, src, src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask)
        src = src + self.dropout1(src2)
        src = self.norm1(src + src2)
        # T x B x D -> B x D x T
        src = src.permute(1, 2, 0)
        src2 = self.conv2(F.relu(self.conv1(src)))
        src2 = self.dropout2(src2)
        src = src + src2
        src = src.transpose(1, 2)
        src = self.norm2(src)
        src = src.transpose(1, 2)
        return src, enc_align


class FFTransformerBlock(nn.Module):
    def __init__(self, in_out_channels, num_heads, hidden_channels_ffn, num_layers, dropout_p):
        super().__init__()
        self.fft_layers = nn.ModuleList(
            [
                FFTransformer(
                    in_out_channels=in_out_channels,
                    num_heads=num_heads,
                    hidden_channels_ffn=hidden_channels_ffn,
                    dropout_p=dropout_p,
                )
                for _ in range(num_layers)
            ]
        )

    def forward(self, x, mask=None, g=None):  # pylint: disable=unused-argument
        """
        TODO: handle multi-speaker
        Shapes:
            - x: :math:`[B, C, T]`
            - mask:  :math:`[B, 1, T] or [B, T]`
        """
        if mask is not None and mask.ndim == 3:
            mask = mask.squeeze(1)
            # mask is negated, torch uses 1s and 0s reversely.
            mask = ~mask.bool()
        alignments = []
        for layer in self.fft_layers:
            x, align = layer(x, src_key_padding_mask=mask)
            alignments.append(align.unsqueeze(1))
        alignments = torch.cat(alignments, 1)
        return x


class FFTDurationPredictor:
    def __init__(
        self, in_channels, hidden_channels, num_heads, num_layers, dropout_p=0.1, cond_channels=None
    ):  # pylint: disable=unused-argument
        self.fft = FFTransformerBlock(in_channels, num_heads, hidden_channels, num_layers, dropout_p)
        self.proj = nn.Linear(in_channels, 1)

    def forward(self, x, mask=None, g=None):  # pylint: disable=unused-argument
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - mask:  :math:`[B, 1, T]`

        TODO: Handle the cond input
        """
        x = self.fft(x, mask=mask)
        x = self.proj(x)
        return x


================================================
FILE: TTS/tts/layers/generic/wavenet.py
================================================
import torch
from torch import nn


@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    n_channels_int = n_channels[0]
    in_act = input_a + input_b
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
    acts = t_act * s_act
    return acts


class WN(torch.nn.Module):
    """Wavenet layers with weight norm and no input conditioning.

         |-----------------------------------------------------------------------------|
         |                                    |-> tanh    -|                           |
    res -|- conv1d(dilation) -> dropout -> + -|            * -> conv1d1x1 -> split -|- + -> res
    g -------------------------------------|  |-> sigmoid -|                        |
    o --------------------------------------------------------------------------- + --------- o

    Args:
        in_channels (int): number of input channels.
        hidden_channes (int): number of hidden channels.
        kernel_size (int): filter kernel size for the first conv layer.
        dilation_rate (int): dilations rate to increase dilation per layer.
            If it is 2, dilations are 1, 2, 4, 8 for the next 4 layers.
        num_layers (int): number of wavenet layers.
        c_in_channels (int): number of channels of conditioning input.
        dropout_p (float): dropout rate.
        weight_norm (bool): enable/disable weight norm for convolution layers.
    """

    def __init__(
        self,
        in_channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        num_layers,
        c_in_channels=0,
        dropout_p=0,
        weight_norm=True,
    ):
        super().__init__()
        assert kernel_size % 2 == 1
        assert hidden_channels % 2 == 0
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.num_layers = num_layers
        self.c_in_channels = c_in_channels
        self.dropout_p = dropout_p

        self.in_layers = torch.nn.ModuleList()
        self.res_skip_layers = torch.nn.ModuleList()
        self.dropout = nn.Dropout(dropout_p)

        # init conditioning layer
        if c_in_channels > 0:
            cond_layer = torch.nn.Conv1d(c_in_channels, 2 * hidden_channels * num_layers, 1)
            self.cond_layer = torch.nn.utils.weight_norm(cond_layer, name="weight")
        # intermediate layers
        for i in range(num_layers):
            dilation = dilation_rate**i
            padding = int((kernel_size * dilation - dilation) / 2)
            if i == 0:
                in_layer = torch.nn.Conv1d(
                    in_channels, 2 * hidden_channels, kernel_size, dilation=dilation, padding=padding
                )
            else:
                in_layer = torch.nn.Conv1d(
                    hidden_channels, 2 * hidden_channels, kernel_size, dilation=dilation, padding=padding
                )
            in_layer = torch.nn.utils.weight_norm(in_layer, name="weight")
            self.in_layers.append(in_layer)

            if i < num_layers - 1:
                res_skip_channels = 2 * hidden_channels
            else:
                res_skip_channels = hidden_channels

            res_skip_layer = torch.nn.Conv1d(hidden_channels, res_skip_channels, 1)
            res_skip_layer = torch.nn.utils.weight_norm(res_skip_layer, name="weight")
            self.res_skip_layers.append(res_skip_layer)
        # setup weight norm
        if not weight_norm:
            self.remove_weight_norm()

    def forward(self, x, x_mask=None, g=None, **kwargs):  # pylint: disable=unused-argument
        output = torch.zeros_like(x)
        n_channels_tensor = torch.IntTensor([self.hidden_channels])
        x_mask = 1.0 if x_mask is None else x_mask
        if g is not None:
            g = self.cond_layer(g)
        for i in range(self.num_layers):
            x_in = self.in_layers[i](x)
            x_in = self.dropout(x_in)
            if g is not None:
                cond_offset = i * 2 * self.hidden_channels
                g_l = g[:, cond_offset : cond_offset + 2 * self.hidden_channels, :]
            else:
                g_l = torch.zeros_like(x_in)
            acts = fused_add_tanh_sigmoid_multiply(x_in, g_l, n_channels_tensor)
            res_skip_acts = self.res_skip_layers[i](acts)
            if i < self.num_layers - 1:
                x = (x + res_skip_acts[:, : self.hidden_channels, :]) * x_mask
                output = output + res_skip_acts[:, self.hidden_channels :, :]
            else:
                output = output + res_skip_acts
        return output * x_mask

    def remove_weight_norm(self):
        if self.c_in_channels != 0:
            torch.nn.utils.remove_weight_norm(self.cond_layer)
        for l in self.in_layers:
            torch.nn.utils.remove_weight_norm(l)
        for l in self.res_skip_layers:
            torch.nn.utils.remove_weight_norm(l)


class WNBlocks(nn.Module):
    """Wavenet blocks.

    Note: After each block dilation resets to 1 and it increases in each block
        along the dilation rate.

    Args:
        in_channels (int): number of input channels.
        hidden_channes (int): number of hidden channels.
        kernel_size (int): filter kernel size for the first conv layer.
        dilation_rate (int): dilations rate to increase dilation per layer.
            If it is 2, dilations are 1, 2, 4, 8 for the next 4 layers.
        num_blocks (int): number of wavenet blocks.
        num_layers (int): number of wavenet layers.
        c_in_channels (int): number of channels of conditioning input.
        dropout_p (float): dropout rate.
        weight_norm (bool): enable/disable weight norm for convolution layers.
    """

    def __init__(
        self,
        in_channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        num_blocks,
        num_layers,
        c_in_channels=0,
        dropout_p=0,
        weight_norm=True,
    ):
        super().__init__()
        self.wn_blocks = nn.ModuleList()
        for idx in range(num_blocks):
            layer = WN(
                in_channels=in_channels if idx == 0 else hidden_channels,
                hidden_channels=hidden_channels,
                kernel_size=kernel_size,
                dilation_rate=dilation_rate,
                num_layers=num_layers,
                c_in_channels=c_in_channels,
                dropout_p=dropout_p,
                weight_norm=weight_norm,
            )
            self.wn_blocks.append(layer)

    def forward(self, x, x_mask=None, g=None):
        o = x
        for layer in self.wn_blocks:
            o = layer(o, x_mask, g)
        return o


================================================
FILE: TTS/tts/layers/glow_tts/__init__.py
================================================


================================================
FILE: TTS/tts/layers/glow_tts/decoder.py
================================================
import torch
from torch import nn

from TTS.tts.layers.generic.normalization import ActNorm
from TTS.tts.layers.glow_tts.glow import CouplingBlock, InvConvNear


def squeeze(x, x_mask=None, num_sqz=2):
    """GlowTTS squeeze operation
    Increase number of channels and reduce number of time steps
    by the same factor.

    Note:
        each 's' is a n-dimensional vector.
        ``[s1,s2,s3,s4,s5,s6] --> [[s1, s3, s5], [s2, s4, s6]]``
    """
    b, c, t = x.size()

    t = (t // num_sqz) * num_sqz
    x = x[:, :, :t]
    x_sqz = x.view(b, c, t // num_sqz, num_sqz)
    x_sqz = x_sqz.permute(0, 3, 1, 2).contiguous().view(b, c * num_sqz, t // num_sqz)

    if x_mask is not None:
        x_mask = x_mask[:, :, num_sqz - 1 :: num_sqz]
    else:
        x_mask = torch.ones(b, 1, t // num_sqz).to(device=x.device, dtype=x.dtype)
    return x_sqz * x_mask, x_mask


def unsqueeze(x, x_mask=None, num_sqz=2):
    """GlowTTS unsqueeze operation (revert the squeeze)

    Note:
        each 's' is a n-dimensional vector.
        ``[[s1, s3, s5], [s2, s4, s6]] --> [[s1, s3, s5, s2, s4, s6]]``
    """
    b, c, t = x.size()

    x_unsqz = x.view(b, num_sqz, c // num_sqz, t)
    x_unsqz = x_unsqz.permute(0, 2, 3, 1).contiguous().view(b, c // num_sqz, t * num_sqz)

    if x_mask is not None:
        x_mask = x_mask.unsqueeze(-1).repeat(1, 1, 1, num_sqz).view(b, 1, t * num_sqz)
    else:
        x_mask = torch.ones(b, 1, t * num_sqz).to(device=x.device, dtype=x.dtype)
    return x_unsqz * x_mask, x_mask


class Decoder(nn.Module):
    """Stack of Glow Decoder Modules.

    ::

        Squeeze -> ActNorm -> InvertibleConv1x1 -> AffineCoupling -> Unsqueeze

    Args:
        in_channels (int): channels of input tensor.
        hidden_channels (int): hidden decoder channels.
        kernel_size (int): Coupling block kernel size. (Wavenet filter kernel size.)
        dilation_rate (int): rate to increase dilation by each layer in a decoder block.
        num_flow_blocks (int): number of decoder blocks.
        num_coupling_layers (int): number coupling layers. (number of wavenet layers.)
        dropout_p (float): wavenet dropout rate.
        sigmoid_scale (bool): enable/disable sigmoid scaling in coupling layer.
    """

    def __init__(
        self,
        in_channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        num_flow_blocks,
        num_coupling_layers,
        dropout_p=0.0,
        num_splits=4,
        num_squeeze=2,
        sigmoid_scale=False,
        c_in_channels=0,
    ):
        super().__init__()

        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.num_flow_blocks = num_flow_blocks
        self.num_coupling_layers = num_coupling_layers
        self.dropout_p = dropout_p
        self.num_splits = num_splits
        self.num_squeeze = num_squeeze
        self.sigmoid_scale = sigmoid_scale
        self.c_in_channels = c_in_channels

        self.flows = nn.ModuleList()
        for _ in range(num_flow_blocks):
            self.flows.append(ActNorm(channels=in_channels * num_squeeze))
            self.flows.append(InvConvNear(channels=in_channels * num_squeeze, num_splits=num_splits))
            self.flows.append(
                CouplingBlock(
                    in_channels * num_squeeze,
                    hidden_channels,
                    kernel_size=kernel_size,
                    dilation_rate=dilation_rate,
                    num_layers=num_coupling_layers,
                    c_in_channels=c_in_channels,
                    dropout_p=dropout_p,
                    sigmoid_scale=sigmoid_scale,
                )
            )

    def forward(self, x, x_mask, g=None, reverse=False):
        """
        Shapes:
            - x:  :math:`[B, C, T]`
            - x_mask: :math:`[B, 1 ,T]`
            - g: :math:`[B, C]`
        """
        if not reverse:
            flows = self.flows
            logdet_tot = 0
        else:
            flows = reversed(self.flows)
            logdet_tot = None

        if self.num_squeeze > 1:
            x, x_mask = squeeze(x, x_mask, self.num_squeeze)
        for f in flows:
            if not reverse:
                x, logdet = f(x, x_mask, g=g, reverse=reverse)
                logdet_tot += logdet
            else:
                x, logdet = f(x, x_mask, g=g, reverse=reverse)
        if self.num_squeeze > 1:
            x, x_mask = unsqueeze(x, x_mask, self.num_squeeze)
        return x, logdet_tot

    def store_inverse(self):
        for f in self.flows:
            f.store_inverse()


================================================
FILE: TTS/tts/layers/glow_tts/duration_predictor.py
================================================
import torch
from torch import nn

from ..generic.normalization import LayerNorm


class DurationPredictor(nn.Module):
    """Glow-TTS duration prediction model.

    ::

        [2 x (conv1d_kxk -> relu -> layer_norm -> dropout)] -> conv1d_1x1 -> durs

    Args:
        in_channels (int): Number of channels of the input tensor.
        hidden_channels (int): Number of hidden channels of the network.
        kernel_size (int): Kernel size for the conv layers.
        dropout_p (float): Dropout rate used after each conv layer.
    """

    def __init__(self, in_channels, hidden_channels, kernel_size, dropout_p, cond_channels=None, language_emb_dim=None):
        super().__init__()

        # add language embedding dim in the input
        if language_emb_dim:
            in_channels += language_emb_dim

        # class arguments
        self.in_channels = in_channels
        self.filter_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dropout_p = dropout_p
        # layers
        self.drop = nn.Dropout(dropout_p)
        self.conv_1 = nn.Conv1d(in_channels, hidden_channels, kernel_size, padding=kernel_size // 2)
        self.norm_1 = LayerNorm(hidden_channels)
        self.conv_2 = nn.Conv1d(hidden_channels, hidden_channels, kernel_size, padding=kernel_size // 2)
        self.norm_2 = LayerNorm(hidden_channels)
        # output layer
        self.proj = nn.Conv1d(hidden_channels, 1, 1)
        if cond_channels is not None and cond_channels != 0:
            self.cond = nn.Conv1d(cond_channels, in_channels, 1)

        if language_emb_dim != 0 and language_emb_dim is not None:
            self.cond_lang = nn.Conv1d(language_emb_dim, in_channels, 1)

    def forward(self, x, x_mask, g=None, lang_emb=None):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
            - g: :math:`[B, C, 1]`
        """
        if g is not None:
            x = x + self.cond(g)

        if lang_emb is not None:
            x = x + self.cond_lang(lang_emb)

        x = self.conv_1(x * x_mask)
        x = torch.relu(x)
        x = self.norm_1(x)
        x = self.drop(x)
        x = self.conv_2(x * x_mask)
        x = torch.relu(x)
        x = self.norm_2(x)
        x = self.drop(x)
        x = self.proj(x * x_mask)
        return x * x_mask


================================================
FILE: TTS/tts/layers/glow_tts/encoder.py
================================================
import math

import torch
from torch import nn

from TTS.tts.layers.generic.gated_conv import GatedConvBlock
from TTS.tts.layers.generic.res_conv_bn import ResidualConv1dBNBlock
from TTS.tts.layers.generic.time_depth_sep_conv import TimeDepthSeparableConvBlock
from TTS.tts.layers.glow_tts.duration_predictor import DurationPredictor
from TTS.tts.layers.glow_tts.glow import ResidualConv1dLayerNormBlock
from TTS.tts.layers.glow_tts.transformer import RelativePositionTransformer
from TTS.tts.utils.helpers import sequence_mask


class Encoder(nn.Module):
    """Glow-TTS encoder module.

    ::

        embedding -> <prenet> -> encoder_module -> <postnet> --> proj_mean
                                                             |
                                                             |-> proj_var
                                                             |
                                                             |-> concat -> duration_predictor
                                                                    ↑
                                                              speaker_embed

    Args:
        num_chars (int): number of characters.
        out_channels (int): number of output channels.
        hidden_channels (int): encoder's embedding size.
        hidden_channels_ffn (int): transformer's feed-forward channels.
        kernel_size (int): kernel size for conv layers and duration predictor.
        dropout_p (float): dropout rate for any dropout layer.
        mean_only (bool): if True, output only mean values and use constant std.
        use_prenet (bool): if True, use pre-convolutional layers before transformer layers.
        c_in_channels (int): number of channels in conditional input.

    Shapes:
        - input: (B, T, C)

    ::

        suggested encoder params...

        for encoder_type == 'rel_pos_transformer'
            encoder_params={
                'kernel_size':3,
                'dropout_p': 0.1,
                'num_layers': 6,
                'num_heads': 2,
                'hidden_channels_ffn': 768,  # 4 times the hidden_channels
                'input_length': None
            }

        for encoder_type == 'gated_conv'
            encoder_params={
                'kernel_size':5,
                'dropout_p': 0.1,
                'num_layers': 9,
            }

        for encoder_type == 'residual_conv_bn'
            encoder_params={
                "kernel_size": 4,
                "dilations": [1, 2, 4, 1, 2, 4, 1, 2, 4, 1, 2, 4, 1],
                "num_conv_blocks": 2,
                "num_res_blocks": 13
            }

         for encoder_type == 'time_depth_separable'
            encoder_params={
                "kernel_size": 5,
                'num_layers': 9,
            }
    """

    def __init__(
        self,
        num_chars,
        out_channels,
        hidden_channels,
        hidden_channels_dp,
        encoder_type,
        encoder_params,
        dropout_p_dp=0.1,
        mean_only=False,
        use_prenet=True,
        c_in_channels=0,
    ):
        super().__init__()
        # class arguments
        self.num_chars = num_chars
        self.out_channels = out_channels
        self.hidden_channels = hidden_channels
        self.hidden_channels_dp = hidden_channels_dp
        self.dropout_p_dp = dropout_p_dp
        self.mean_only = mean_only
        self.use_prenet = use_prenet
        self.c_in_channels = c_in_channels
        self.encoder_type = encoder_type
        # embedding layer
        self.emb = nn.Embedding(num_chars, hidden_channels)
        nn.init.normal_(self.emb.weight, 0.0, hidden_channels**-0.5)
        # init encoder module
        if encoder_type.lower() == "rel_pos_transformer":
            if use_prenet:
                self.prenet = ResidualConv1dLayerNormBlock(
                    hidden_channels, hidden_channels, hidden_channels, kernel_size=5, num_layers=3, dropout_p=0.5
                )
            self.encoder = RelativePositionTransformer(
                hidden_channels, hidden_channels, hidden_channels, **encoder_params
            )
        elif encoder_type.lower() == "gated_conv":
            self.encoder = GatedConvBlock(hidden_channels, **encoder_params)
        elif encoder_type.lower() == "residual_conv_bn":
            if use_prenet:
                self.prenet = nn.Sequential(nn.Conv1d(hidden_channels, hidden_channels, 1), nn.ReLU())
            self.encoder = ResidualConv1dBNBlock(hidden_channels, hidden_channels, hidden_channels, **encoder_params)
            self.postnet = nn.Sequential(
                nn.Conv1d(self.hidden_channels, self.hidden_channels, 1), nn.BatchNorm1d(self.hidden_channels)
            )
        elif encoder_type.lower() == "time_depth_separable":
            if use_prenet:
                self.prenet = ResidualConv1dLayerNormBlock(
                    hidden_channels, hidden_channels, hidden_channels, kernel_size=5, num_layers=3, dropout_p=0.5
                )
            self.encoder = TimeDepthSeparableConvBlock(
                hidden_channels, hidden_channels, hidden_channels, **encoder_params
            )
        else:
            raise ValueError(" [!] Unkown encoder type.")

        # final projection layers
        self.proj_m = nn.Conv1d(hidden_channels, out_channels, 1)
        if not mean_only:
            self.proj_s = nn.Conv1d(hidden_channels, out_channels, 1)
        # duration predictor
        self.duration_predictor = DurationPredictor(
            hidden_channels + c_in_channels, hidden_channels_dp, 3, dropout_p_dp
        )

    def forward(self, x, x_lengths, g=None):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_lengths: :math:`[B]`
            - g (optional): :math:`[B, 1, T]`
        """
        # embedding layer
        # [B ,T, D]
        x = self.emb(x) * math.sqrt(self.hidden_channels)
        # [B, D, T]
        x = torch.transpose(x, 1, -1)
        # compute input sequence mask
        x_mask = torch.unsqueeze(sequence_mask(x_lengths, x.size(2)), 1).to(x.dtype)
        # prenet
        if hasattr(self, "prenet") and self.use_prenet:
            x = self.prenet(x, x_mask)
        # encoder
        x = self.encoder(x, x_mask)
        # postnet
        if hasattr(self, "postnet"):
            x = self.postnet(x) * x_mask
        # set duration predictor input
        if g is not None:
            g_exp = g.expand(-1, -1, x.size(-1))
            x_dp = torch.cat([x.detach(), g_exp], 1)
        else:
            x_dp = x.detach()
        # final projection layer
        x_m = self.proj_m(x) * x_mask
        if not self.mean_only:
            x_logs = self.proj_s(x) * x_mask
        else:
            x_logs = torch.zeros_like(x_m)
        # duration predictor
        logw = self.duration_predictor(x_dp, x_mask)
        return x_m, x_logs, logw, x_mask


================================================
FILE: TTS/tts/layers/glow_tts/glow.py
================================================
import torch
from packaging.version import Version
from torch import nn
from torch.nn import functional as F

from TTS.tts.layers.generic.wavenet import WN

from ..generic.normalization import LayerNorm


class ResidualConv1dLayerNormBlock(nn.Module):
    """Conv1d with Layer Normalization and residual connection as in GlowTTS paper.
    https://arxiv.org/pdf/1811.00002.pdf

    ::

        x |-> conv1d -> layer_norm -> relu -> dropout -> + -> o
          |---------------> conv1d_1x1 ------------------|

    Args:
        in_channels (int): number of input tensor channels.
        hidden_channels (int): number of inner layer channels.
        out_channels (int): number of output tensor channels.
        kernel_size (int): kernel size of conv1d filter.
        num_layers (int): number of blocks.
        dropout_p (float): dropout rate for each block.
    """

    def __init__(self, in_channels, hidden_channels, out_channels, kernel_size, num_layers, dropout_p):
        super().__init__()
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.num_layers = num_layers
        self.dropout_p = dropout_p
        assert num_layers > 1, " [!] number of layers should be > 0."
        assert kernel_size % 2 == 1, " [!] kernel size should be odd number."

        self.conv_layers = nn.ModuleList()
        self.norm_layers = nn.ModuleList()

        for idx in range(num_layers):
            self.conv_layers.append(
                nn.Conv1d(
                    in_channels if idx == 0 else hidden_channels, hidden_channels, kernel_size, padding=kernel_size // 2
                )
            )
            self.norm_layers.append(LayerNorm(hidden_channels))

        self.proj = nn.Conv1d(hidden_channels, out_channels, 1)
        self.proj.weight.data.zero_()
        self.proj.bias.data.zero_()

    def forward(self, x, x_mask):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
        """
        x_res = x
        for i in range(self.num_layers):
            x = self.conv_layers[i](x * x_mask)
            x = self.norm_layers[i](x * x_mask)
            x = F.dropout(F.relu(x), self.dropout_p, training=self.training)
        x = x_res + self.proj(x)
        return x * x_mask


class InvConvNear(nn.Module):
    """Invertible Convolution with input splitting as in GlowTTS paper.
    https://arxiv.org/pdf/1811.00002.pdf

    Args:
        channels (int): input and output channels.
        num_splits (int): number of splits, also H and W of conv layer.
        no_jacobian (bool): enable/disable jacobian computations.

    Note:
        Split the input into groups of size self.num_splits and
        perform 1x1 convolution separately. Cast 1x1 conv operation
        to 2d by reshaping the input for efficiency.
    """

    def __init__(self, channels, num_splits=4, no_jacobian=False, **kwargs):  # pylint: disable=unused-argument
        super().__init__()
        assert num_splits % 2 == 0
        self.channels = channels
        self.num_splits = num_splits
        self.no_jacobian = no_jacobian
        self.weight_inv = None

        if Version(torch.__version__) < Version("1.9"):
            w_init = torch.qr(torch.FloatTensor(self.num_splits, self.num_splits).normal_())[0]
        else:
            w_init = torch.linalg.qr(torch.FloatTensor(self.num_splits, self.num_splits).normal_(), "complete")[0]

        if torch.det(w_init) < 0:
            w_init[:, 0] = -1 * w_init[:, 0]
        self.weight = nn.Parameter(w_init)

    def forward(self, x, x_mask=None, reverse=False, **kwargs):  # pylint: disable=unused-argument
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
        """
        b, c, t = x.size()
        assert c % self.num_splits == 0
        if x_mask is None:
            x_mask = 1
            x_len = torch.ones((b,), dtype=x.dtype, device=x.device) * t
        else:
            x_len = torch.sum(x_mask, [1, 2])

        x = x.view(b, 2, c // self.num_splits, self.num_splits // 2, t)
        x = x.permute(0, 1, 3, 2, 4).contiguous().view(b, self.num_splits, c // self.num_splits, t)

        if reverse:
            if self.weight_inv is not None:
                weight = self.weight_inv
            else:
                weight = torch.inverse(self.weight.float()).to(dtype=self.weight.dtype)
            logdet = None
        else:
            weight = self.weight
            if self.no_jacobian:
                logdet = 0
            else:
                logdet = torch.logdet(self.weight) * (c / self.num_splits) * x_len  # [b]

        weight = weight.view(self.num_splits, self.num_splits, 1, 1)
        z = F.conv2d(x, weight)

        z = z.view(b, 2, self.num_splits // 2, c // self.num_splits, t)
        z = z.permute(0, 1, 3, 2, 4).contiguous().view(b, c, t) * x_mask
        return z, logdet

    def store_inverse(self):
        weight_inv = torch.inverse(self.weight.float()).to(dtype=self.weight.dtype)
        self.weight_inv = nn.Parameter(weight_inv, requires_grad=False)


class CouplingBlock(nn.Module):
    """Glow Affine Coupling block as in GlowTTS paper.
    https://arxiv.org/pdf/1811.00002.pdf

    ::

        x --> x0 -> conv1d -> wavenet -> conv1d --> t, s -> concat(s*x1 + t, x0) -> o
        '-> x1 - - - - - - - - - - - - - - - - - - - - - - - - - ^

    Args:
         in_channels (int): number of input tensor channels.
         hidden_channels (int): number of hidden channels.
         kernel_size (int): WaveNet filter kernel size.
         dilation_rate (int): rate to increase dilation by each layer in a decoder block.
         num_layers (int): number of WaveNet layers.
         c_in_channels (int): number of conditioning input channels.
         dropout_p (int): wavenet dropout rate.
         sigmoid_scale (bool): enable/disable sigmoid scaling for output scale.

    Note:
         It does not use the conditional inputs differently from WaveGlow.
    """

    def __init__(
        self,
        in_channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        num_layers,
        c_in_channels=0,
        dropout_p=0,
        sigmoid_scale=False,
    ):
        super().__init__()
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.num_layers = num_layers
        self.c_in_channels = c_in_channels
        self.dropout_p = dropout_p
        self.sigmoid_scale = sigmoid_scale
        # input layer
        start = torch.nn.Conv1d(in_channels // 2, hidden_channels, 1)
        start = torch.nn.utils.weight_norm(start)
        self.start = start
        # output layer
        # Initializing last layer to 0 makes the affine coupling layers
        # do nothing at first.  This helps with training stability
        end = torch.nn.Conv1d(hidden_channels, in_channels, 1)
        end.weight.data.zero_()
        end.bias.data.zero_()
        self.end = end
        # coupling layers
        self.wn = WN(hidden_channels, hidden_channels, kernel_size, dilation_rate, num_layers, c_in_channels, dropout_p)

    def forward(self, x, x_mask=None, reverse=False, g=None, **kwargs):  # pylint: disable=unused-argument
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
            - g: :math:`[B, C, 1]`
        """
        if x_mask is None:
            x_mask = 1
        x_0, x_1 = x[:, : self.in_channels // 2], x[:, self.in_channels // 2 :]

        x = self.start(x_0) * x_mask
        x = self.wn(x, x_mask, g)
        out = self.end(x)

        z_0 = x_0
        t = out[:, : self.in_channels // 2, :]
        s = out[:, self.in_channels // 2 :, :]
        if self.sigmoid_scale:
            s = torch.log(1e-6 + torch.sigmoid(s + 2))

        if reverse:
            z_1 = (x_1 - t) * torch.exp(-s) * x_mask
            logdet = None
        else:
            z_1 = (t + torch.exp(s) * x_1) * x_mask
            logdet = torch.sum(s * x_mask, [1, 2])

        z = torch.cat([z_0, z_1], 1)
        return z, logdet

    def store_inverse(self):
        self.wn.remove_weight_norm()


================================================
FILE: TTS/tts/layers/glow_tts/transformer.py
================================================
import math

import torch
from torch import nn
from torch.nn import functional as F

from TTS.tts.layers.generic.normalization import LayerNorm, LayerNorm2


class RelativePositionMultiHeadAttention(nn.Module):
    """Multi-head attention with Relative Positional embedding.
    https://arxiv.org/pdf/1809.04281.pdf

    It learns positional embeddings for a window of neighbours. For keys and values,
    it learns different set of embeddings. Key embeddings are agregated with the attention
    scores and value embeddings are aggregated with the output.

    Note:
        Example with relative attention window size 2

        - input = [a, b, c, d, e]
        - rel_attn_embeddings = [e(t-2), e(t-1), e(t+1), e(t+2)]

        So it learns 4 embedding vectors (in total 8) separately for key and value vectors.

        Considering the input c

        - e(t-2) corresponds to c -> a
        - e(t-2) corresponds to c -> b
        - e(t-2) corresponds to c -> d
        - e(t-2) corresponds to c -> e

        These embeddings are shared among different time steps. So input a, b, d and e also uses
        the same embeddings.

        Embeddings are ignored when the relative window is out of limit for the first and the last
        n items.

    Args:
        channels (int): input and inner layer channels.
        out_channels (int): output channels.
        num_heads (int): number of attention heads.
        rel_attn_window_size (int, optional): relation attention window size.
            If 4, for each time step next and previous 4 time steps are attended.
            If default, relative encoding is disabled and it is a regular transformer.
            Defaults to None.
        heads_share (bool, optional): [description]. Defaults to True.
        dropout_p (float, optional): dropout rate. Defaults to 0..
        input_length (int, optional): intput length for positional encoding. Defaults to None.
        proximal_bias (bool, optional): enable/disable proximal bias as in the paper. Defaults to False.
        proximal_init (bool, optional): enable/disable poximal init as in the paper.
            Init key and query layer weights the same. Defaults to False.
    """

    def __init__(
        self,
        channels,
        out_channels,
        num_heads,
        rel_attn_window_size=None,
        heads_share=True,
        dropout_p=0.0,
        input_length=None,
        proximal_bias=False,
        proximal_init=False,
    ):
        super().__init__()
        assert channels % num_heads == 0, " [!] channels should be divisible by num_heads."
        # class attributes
        self.channels = channels
        self.out_channels = out_channels
        self.num_heads = num_heads
        self.rel_attn_window_size = rel_attn_window_size
        self.heads_share = heads_share
        self.input_length = input_length
        self.proximal_bias = proximal_bias
        self.dropout_p = dropout_p
        self.attn = None
        # query, key, value layers
        self.k_channels = channels // num_heads
        self.conv_q = nn.Conv1d(channels, channels, 1)
        self.conv_k = nn.Conv1d(channels, channels, 1)
        self.conv_v = nn.Conv1d(channels, channels, 1)
        # output layers
        self.conv_o = nn.Conv1d(channels, out_channels, 1)
        self.dropout = nn.Dropout(dropout_p)
        # relative positional encoding layers
        if rel_attn_window_size is not None:
            n_heads_rel = 1 if heads_share else num_heads
            rel_stddev = self.k_channels**-0.5
            emb_rel_k = nn.Parameter(
                torch.randn(n_heads_rel, rel_attn_window_size * 2 + 1, self.k_channels) * rel_stddev
            )
            emb_rel_v = nn.Parameter(
                torch.randn(n_heads_rel, rel_attn_window_size * 2 + 1, self.k_channels) * rel_stddev
            )
            self.register_parameter("emb_rel_k", emb_rel_k)
            self.register_parameter("emb_rel_v", emb_rel_v)

        # init layers
        nn.init.xavier_uniform_(self.conv_q.weight)
        nn.init.xavier_uniform_(self.conv_k.weight)
        # proximal bias
        if proximal_init:
            self.conv_k.weight.data.copy_(self.conv_q.weight.data)
            self.conv_k.bias.data.copy_(self.conv_q.bias.data)
        nn.init.xavier_uniform_(self.conv_v.weight)

    def forward(self, x, c, attn_mask=None):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - c: :math:`[B, C, T]`
            - attn_mask: :math:`[B, 1, T, T]`
        """
        q = self.conv_q(x)
        k = self.conv_k(c)
        v = self.conv_v(c)
        x, self.attn = self.attention(q, k, v, mask=attn_mask)
        x = self.conv_o(x)
        return x

    def attention(self, query, key, value, mask=None):
        # reshape [b, d, t] -> [b, n_h, t, d_k]
        b, d, t_s, t_t = (*key.size(), query.size(2))
        query = query.view(b, self.num_heads, self.k_channels, t_t).transpose(2, 3)
        key = key.view(b, self.num_heads, self.k_channels, t_s).transpose(2, 3)
        value = value.view(b, self.num_heads, self.k_channels, t_s).transpose(2, 3)
        # compute raw attention scores
        scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(self.k_channels)
        # relative positional encoding for scores
        if self.rel_attn_window_size is not None:
            assert t_s == t_t, "Relative attention is only available for self-attention."
            # get relative key embeddings
            key_relative_embeddings = self._get_relative_embeddings(self.emb_rel_k, t_s)
            rel_logits = self._matmul_with_relative_keys(query, key_relative_embeddings)
            rel_logits = self._relative_position_to_absolute_position(rel_logits)
            scores_local = rel_logits / math.sqrt(self.k_channels)
            scores = scores + scores_local
        # proximan bias
        if self.proximal_bias:
            assert t_s == t_t, "Proximal bias is only available for self-attention."
            scores = scores + self._attn_proximity_bias(t_s).to(device=scores.device, dtype=scores.dtype)
        # attention score masking
        if mask is not None:
            # add small value to prevent oor error.
            scores = scores.masked_fill(mask == 0, -1e4)
            if self.input_length is not None:
                block_mask = torch.ones_like(scores).triu(-1 * self.input_length).tril(self.input_length)
                scores = scores * block_mask + -1e4 * (1 - block_mask)
        # attention score normalization
        p_attn = F.softmax(scores, dim=-1)  # [b, n_h, t_t, t_s]
        # apply dropout to attention weights
        p_attn = self.dropout(p_attn)
        # compute output
        output = torch.matmul(p_attn, value)
        # relative positional encoding for values
        if self.rel_attn_window_size is not None:
            relative_weights = self._absolute_position_to_relative_position(p_attn)
            value_relative_embeddings = self._get_relative_embeddings(self.emb_rel_v, t_s)
            output = output + self._matmul_with_relative_values(relative_weights, value_relative_embeddings)
        output = output.transpose(2, 3).contiguous().view(b, d, t_t)  # [b, n_h, t_t, d_k] -> [b, d, t_t]
        return output, p_attn

    @staticmethod
    def _matmul_with_relative_values(p_attn, re):
        """
        Args:
            p_attn (Tensor): attention weights.
            re (Tensor): relative value embedding vector. (a_(i,j)^V)

        Shapes:
            -p_attn: :math:`[B, H, T, V]`
            -re: :math:`[H or 1, V, D]`
            -logits: :math:`[B, H, T, D]`
        """
        logits = torch.matmul(p_attn, re.unsqueeze(0))
        return logits

    @staticmethod
    def _matmul_with_relative_keys(query, re):
        """
        Args:
            query (Tensor): batch of query vectors. (x*W^Q)
            re (Tensor): relative key embedding vector. (a_(i,j)^K)

        Shapes:
            - query: :math:`[B, H, T, D]`
            - re: :math:`[H or 1, V, D]`
            - logits: :math:`[B, H, T, V]`
        """
        # logits = torch.einsum('bhld, kmd -> bhlm', [query, re.to(query.dtype)])
        logits = torch.matmul(query, re.unsqueeze(0).transpose(-2, -1))
        return logits

    def _get_relative_embeddings(self, relative_embeddings, length):
        """Convert embedding vestors to a tensor of embeddings"""
        # Pad first before slice to avoid using cond ops.
        pad_length = max(length - (self.rel_attn_window_size + 1), 0)
        slice_start_position = max((self.rel_attn_window_size + 1) - length, 0)
        slice_end_position = slice_start_position + 2 * length - 1
        if pad_length > 0:
            padded_relative_embeddings = F.pad(relative_embeddings, [0, 0, pad_length, pad_length, 0, 0])
        else:
            padded_relative_embeddings = relative_embeddings
        used_relative_embeddings = padded_relative_embeddings[:, slice_start_position:slice_end_position]
        return used_relative_embeddings

    @staticmethod
    def _relative_position_to_absolute_position(x):
        """Converts tensor from relative to absolute indexing for local attention.
        Shapes:
            x: :math:`[B, C, T, 2 * T - 1]`
        Returns:
            A Tensor of shape :math:`[B, C, T, T]`
        """
        batch, heads, length, _ = x.size()
        # Pad to shift from relative to absolute indexing.
        x = F.pad(x, [0, 1, 0, 0, 0, 0, 0, 0])
        # Pad extra elements so to add up to shape (len+1, 2*len-1).
        x_flat = x.view([batch, heads, length * 2 * length])
        x_flat = F.pad(x_flat, [0, length - 1, 0, 0, 0, 0])
        # Reshape and slice out the padded elements.
        x_final = x_flat.view([batch, heads, length + 1, 2 * length - 1])[:, :, :length, length - 1 :]
        return x_final

    @staticmethod
    def _absolute_position_to_relative_position(x):
        """
        Shapes:
            - x: :math:`[B, C, T, T]`
            - ret: :math:`[B, C, T, 2*T-1]`
        """
        batch, heads, length, _ = x.size()
        # padd along column
        x = F.pad(x, [0, length - 1, 0, 0, 0, 0, 0, 0])
        x_flat = x.view([batch, heads, length**2 + length * (length - 1)])
        # add 0's in the beginning that will skew the elements after reshape
        x_flat = F.pad(x_flat, [length, 0, 0, 0, 0, 0])
        x_final = x_flat.view([batch, heads, length, 2 * length])[:, :, :, 1:]
        return x_final

    @staticmethod
    def _attn_proximity_bias(length):
        """Produce an attention mask that discourages distant
        attention values.
        Args:
            length (int): an integer scalar.
        Returns:
            a Tensor with shape :math:`[1, 1, T, T]`
        """
        # L
        r = torch.arange(length, dtype=torch.float32)
        # L x L
        diff = torch.unsqueeze(r, 0) - torch.unsqueeze(r, 1)
        # scale mask values
        diff = -torch.log1p(torch.abs(diff))
        # 1 x 1 x L x L
        return diff.unsqueeze(0).unsqueeze(0)


class FeedForwardNetwork(nn.Module):
    """Feed Forward Inner layers for Transformer.

    Args:
        in_channels (int): input tensor channels.
        out_channels (int): output tensor channels.
        hidden_channels (int): inner layers hidden channels.
        kernel_size (int): conv1d filter kernel size.
        dropout_p (float, optional): dropout rate. Defaults to 0.
    """

    def __init__(self, in_channels, out_channels, hidden_channels, kernel_size, dropout_p=0.0, causal=False):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dropout_p = dropout_p

        if causal:
            self.padding = self._causal_padding
        else:
            self.padding = self._same_padding

        self.conv_1 = nn.Conv1d(in_channels, hidden_channels, kernel_size)
        self.conv_2 = nn.Conv1d(hidden_channels, out_channels, kernel_size)
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, x, x_mask):
        x = self.conv_1(self.padding(x * x_mask))
        x = torch.relu(x)
        x = self.dropout(x)
        x = self.conv_2(self.padding(x * x_mask))
        return x * x_mask

    def _causal_padding(self, x):
        if self.kernel_size == 1:
            return x
        pad_l = self.kernel_size - 1
        pad_r = 0
        padding = [[0, 0], [0, 0], [pad_l, pad_r]]
        x = F.pad(x, self._pad_shape(padding))
        return x

    def _same_padding(self, x):
        if self.kernel_size == 1:
            return x
        pad_l = (self.kernel_size - 1) // 2
        pad_r = self.kernel_size // 2
        padding = [[0, 0], [0, 0], [pad_l, pad_r]]
        x = F.pad(x, self._pad_shape(padding))
        return x

    @staticmethod
    def _pad_shape(padding):
        l = padding[::-1]
        pad_shape = [item for sublist in l for item in sublist]
        return pad_shape


class RelativePositionTransformer(nn.Module):
    """Transformer with Relative Potional Encoding.
    https://arxiv.org/abs/1803.02155

    Args:
        in_channels (int): number of channels of the input tensor.
        out_chanels (int): number of channels of the output tensor.
        hidden_channels (int): model hidden channels.
        hidden_channels_ffn (int): hidden channels of FeedForwardNetwork.
        num_heads (int): number of attention heads.
        num_layers (int): number of transformer layers.
        kernel_size (int, optional): kernel size of feed-forward inner layers. Defaults to 1.
        dropout_p (float, optional): dropout rate for self-attention and feed-forward inner layers_per_stack. Defaults to 0.
        rel_attn_window_size (int, optional): relation attention window size.
            If 4, for each time step next and previous 4 time steps are attended.
            If default, relative encoding is disabled and it is a regular transformer.
            Defaults to None.
        input_length (int, optional): input lenght to limit position encoding. Defaults to None.
        layer_norm_type (str, optional): type "1" uses torch tensor operations and type "2" uses torch layer_norm
            primitive. Use type "2", type "1: is for backward compat. Defaults to "1".
    """

    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        hidden_channels: int,
        hidden_channels_ffn: int,
        num_heads: int,
        num_layers: int,
        kernel_size=1,
        dropout_p=0.0,
        rel_attn_window_size: int = None,
        input_length: int = None,
        layer_norm_type: str = "1",
    ):
        super().__init__()
        self.hidden_channels = hidden_channels
        self.hidden_channels_ffn = hidden_channels_ffn
        self.num_heads = num_heads
        self.num_layers = num_layers
        self.kernel_size = kernel_size
        self.dropout_p = dropout_p
        self.rel_attn_window_size = rel_attn_window_size

        self.dropout = nn.Dropout(dropout_p)
        self.attn_layers = nn.ModuleList()
        self.norm_layers_1 = nn.ModuleList()
        self.ffn_layers = nn.ModuleList()
        self.norm_layers_2 = nn.ModuleList()

        for idx in range(self.num_layers):
            self.attn_layers.append(
                RelativePositionMultiHeadAttention(
                    hidden_channels if idx != 0 else in_channels,
                    hidden_channels,
                    num_heads,
                    rel_attn_window_size=rel_attn_window_size,
                    dropout_p=dropout_p,
                    input_length=input_length,
                )
            )
            if layer_norm_type == "1":
                self.norm_layers_1.append(LayerNorm(hidden_channels))
            elif layer_norm_type == "2":
                self.norm_layers_1.append(LayerNorm2(hidden_channels))
            else:
                raise ValueError(" [!] Unknown layer norm type")

            if hidden_channels != out_channels and (idx + 1) == self.num_layers:
                self.proj = nn.Conv1d(hidden_channels, out_channels, 1)

            self.ffn_layers.append(
                FeedForwardNetwork(
                    hidden_channels,
                    hidden_channels if (idx + 1) != self.num_layers else out_channels,
                    hidden_channels_ffn,
                    kernel_size,
                    dropout_p=dropout_p,
                )
            )

            if layer_norm_type == "1":
                self.norm_layers_2.append(LayerNorm(hidden_channels if (idx + 1) != self.num_layers else out_channels))
            elif layer_norm_type == "2":
                self.norm_layers_2.append(LayerNorm2(hidden_channels if (idx + 1) != self.num_layers else out_channels))
            else:
                raise ValueError(" [!] Unknown layer norm type")

    def forward(self, x, x_mask):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
        """
        attn_mask = x_mask.unsqueeze(2) * x_mask.unsqueeze(-1)
        for i in range(self.num_layers):
            x = x * x_mask
            y = self.attn_layers[i](x, x, attn_mask)
            y = self.dropout(y)
            x = self.norm_layers_1[i](x + y)

            y = self.ffn_layers[i](x, x_mask)
            y = self.dropout(y)

            if (i + 1) == self.num_layers and hasattr(self, "proj"):
                x = self.proj(x)

            x = self.norm_layers_2[i](x + y)
        x = x * x_mask
        return x


================================================
FILE: TTS/tts/layers/losses.py
================================================
import math

import numpy as np
import torch
from coqpit import Coqpit
from torch import nn
from torch.nn import functional

from TTS.tts.utils.helpers import sequence_mask
from TTS.tts.utils.ssim import SSIMLoss as _SSIMLoss
from TTS.utils.audio.torch_transforms import TorchSTFT


# pylint: disable=abstract-method
# relates https://github.com/pytorch/pytorch/issues/42305
class L1LossMasked(nn.Module):
    def __init__(self, seq_len_norm):
        super().__init__()
        self.seq_len_norm = seq_len_norm

    def forward(self, x, target, length):
        """
        Args:
            x: A Variable containing a FloatTensor of size
                (batch, max_len, dim) which contains the
                unnormalized probability for each class.
            target: A Variable containing a LongTensor of size
                (batch, max_len, dim) which contains the index of the true
                class for each corresponding step.
            length: A Variable containing a LongTensor of size (batch,)
                which contains the length of each data in a batch.
        Shapes:
            x: B x T X D
            target: B x T x D
            length: B
        Returns:
            loss: An average loss value in range [0, 1] masked by the length.
        """
        # mask: (batch, max_len, 1)
        target.requires_grad = False
        mask = sequence_mask(sequence_length=length, max_len=target.size(1)).unsqueeze(2).float()
        if self.seq_len_norm:
            norm_w = mask / mask.sum(dim=1, keepdim=True)
            out_weights = norm_w.div(target.shape[0] * target.shape[2])
            mask = mask.expand_as(x)
            loss = functional.l1_loss(x * mask, target * mask, reduction="none")
            loss = loss.mul(out_weights.to(loss.device)).sum()
        else:
            mask = mask.expand_as(x)
            loss = functional.l1_loss(x * mask, target * mask, reduction="sum")
            loss = loss / mask.sum()
        return loss


class MSELossMasked(nn.Module):
    def __init__(self, seq_len_norm):
        super().__init__()
        self.seq_len_norm = seq_len_norm

    def forward(self, x, target, length):
        """
        Args:
            x: A Variable containing a FloatTensor of size
                (batch, max_len, dim) which contains the
                unnormalized probability for each class.
            target: A Variable containing a LongTensor of size
                (batch, max_len, dim) which contains the index of the true
                class for each corresponding step.
            length: A Variable containing a LongTensor of size (batch,)
                which contains the length of each data in a batch.
        Shapes:
            - x: :math:`[B, T, D]`
            - target: :math:`[B, T, D]`
            - length: :math:`B`
        Returns:
            loss: An average loss value in range [0, 1] masked by the length.
        """
        # mask: (batch, max_len, 1)
        target.requires_grad = False
        mask = sequence_mask(sequence_length=length, max_len=target.size(1)).unsqueeze(2).float()
        if self.seq_len_norm:
            norm_w = mask / mask.sum(dim=1, keepdim=True)
            out_weights = norm_w.div(target.shape[0] * target.shape[2])
            mask = mask.expand_as(x)
            loss = functional.mse_loss(x * mask, target * mask, reduction="none")
            loss = loss.mul(out_weights.to(loss.device)).sum()
        else:
            mask = mask.expand_as(x)
            loss = functional.mse_loss(x * mask, target * mask, reduction="sum")
            loss = loss / mask.sum()
        return loss


def sample_wise_min_max(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Min-Max normalize tensor through first dimension
    Shapes:
        - x: :math:`[B, D1, D2]`
        - m: :math:`[B, D1, 1]`
    """
    maximum = torch.amax(x.masked_fill(~mask, 0), dim=(1, 2), keepdim=True)
    minimum = torch.amin(x.masked_fill(~mask, np.inf), dim=(1, 2), keepdim=True)
    return (x - minimum) / (maximum - minimum + 1e-8)


class SSIMLoss(torch.nn.Module):
    """SSIM loss as (1 - SSIM)
    SSIM is explained here https://en.wikipedia.org/wiki/Structural_similarity
    """

    def __init__(self):
        super().__init__()
        self.loss_func = _SSIMLoss()

    def forward(self, y_hat, y, length):
        """
        Args:
            y_hat (tensor): model prediction values.
            y (tensor): target values.
            length (tensor): length of each sample in a batch for masking.

        Shapes:
            y_hat: B x T X D
            y: B x T x D
            length: B

         Returns:
            loss: An average loss value in range [0, 1] masked by the length.
        """
        mask = sequence_mask(sequence_length=length, max_len=y.size(1)).unsqueeze(2)
        y_norm = sample_wise_min_max(y, mask)
        y_hat_norm = sample_wise_min_max(y_hat, mask)
        ssim_loss = self.loss_func((y_norm * mask).unsqueeze(1), (y_hat_norm * mask).unsqueeze(1))

        if ssim_loss.item() > 1.0:
            print(f" > SSIM loss is out-of-range {ssim_loss.item()}, setting it 1.0")
            ssim_loss = torch.tensor(1.0, device=ssim_loss.device)

        if ssim_loss.item() < 0.0:
            print(f" > SSIM loss is out-of-range {ssim_loss.item()}, setting it 0.0")
            ssim_loss = torch.tensor(0.0, device=ssim_loss.device)

        return ssim_loss


class AttentionEntropyLoss(nn.Module):
    # pylint: disable=R0201
    def forward(self, align):
        """
        Forces attention to be more decisive by penalizing
        soft attention weights
        """
        entropy = torch.distributions.Categorical(probs=align).entropy()
        loss = (entropy / np.log(align.shape[1])).mean()
        return loss


class BCELossMasked(nn.Module):
    """BCE loss with masking.

    Used mainly for stopnet in autoregressive models.

    Args:
        pos_weight (float): weight for positive samples. If set < 1, penalize early stopping. Defaults to None.
    """

    def __init__(self, pos_weight: float = None):
        super().__init__()
        self.pos_weight = nn.Parameter(torch.tensor([pos_weight]), requires_grad=False)

    def forward(self, x, target, length):
        """
        Args:
            x: A Variable containing a FloatTensor of size
                (batch, max_len) which contains the
                unnormalized probability for each class.
            target: A Variable containing a LongTensor of size
                (batch, max_len) which contains the index of the true
                class for each corresponding step.
            length: A Variable containing a LongTensor of size (batch,)
                which contains the length of each data in a batch.
        Shapes:
            x: B x T
            target: B x T
            length: B
        Returns:
            loss: An average loss value in range [0, 1] masked by the length.
        """
        target.requires_grad = False
        if length is not None:
            # mask: (batch, max_len, 1)
            mask = sequence_mask(sequence_length=length, max_len=target.size(1))
            num_items = mask.sum()
            loss = functional.binary_cross_entropy_with_logits(
                x.masked_select(mask), target.masked_select(mask), pos_weight=self.pos_weight, reduction="sum"
            )
        else:
            loss = functional.binary_cross_entropy_with_logits(x, target, pos_weight=self.pos_weight, reduction="sum")
            num_items = torch.numel(x)
        loss = loss / num_items
        return loss


class DifferentialSpectralLoss(nn.Module):
    """Differential Spectral Loss
    https://arxiv.org/ftp/arxiv/papers/1909/1909.10302.pdf"""

    def __init__(self, loss_func):
        super().__init__()
        self.loss_func = loss_func

    def forward(self, x, target, length=None):
        """
         Shapes:
            x: B x T
            target: B x T
            length: B
        Returns:
            loss: An average loss value in range [0, 1] masked by the length.
        """
        x_diff = x[:, 1:] - x[:, :-1]
        target_diff = target[:, 1:] - target[:, :-1]
        if length is None:
            return self.loss_func(x_diff, target_diff)
        return self.loss_func(x_diff, target_diff, length - 1)


class GuidedAttentionLoss(torch.nn.Module):
    def __init__(self, sigma=0.4):
        super().__init__()
        self.sigma = sigma

    def _make_ga_masks(self, ilens, olens):
        B = len(ilens)
        max_ilen = max(ilens)
        max_olen = max(olens)
        ga_masks = torch.zeros((B, max_olen, max_ilen))
        for idx, (ilen, olen) in enumerate(zip(ilens, olens)):
            ga_masks[idx, :olen, :ilen] = self._make_ga_mask(ilen, olen, self.sigma)
        return ga_masks

    def forward(self, att_ws, ilens, olens):
        ga_masks = self._make_ga_masks(ilens, olens).to(att_ws.device)
        seq_masks = self._make_masks(ilens, olens).to(att_ws.device)
        losses = ga_masks * att_ws
        loss = torch.mean(losses.masked_select(seq_masks))
        return loss

    @staticmethod
    def _make_ga_mask(ilen, olen, sigma):
        grid_x, grid_y = torch.meshgrid(torch.arange(olen).to(olen), torch.arange(ilen).to(ilen))
        grid_x, grid_y = grid_x.float(), grid_y.float()
        return 1.0 - torch.exp(-((grid_y / ilen - grid_x / olen) ** 2) / (2 * (sigma**2)))

    @staticmethod
    def _make_masks(ilens, olens):
        in_masks = sequence_mask(ilens)
        out_masks = sequence_mask(olens)
        return out_masks.unsqueeze(-1) & in_masks.unsqueeze(-2)


class Huber(nn.Module):
    # pylint: disable=R0201
    def forward(self, x, y, length=None):
        """
        Shapes:
            x: B x T
            y: B x T
            length: B
        """
        mask = sequence_mask(sequence_length=length, max_len=y.size(1)).unsqueeze(2).float()
        return torch.nn.functional.smooth_l1_loss(x * mask, y * mask, reduction="sum") / mask.sum()


class ForwardSumLoss(nn.Module):
    def __init__(self, blank_logprob=-1):
        super().__init__()
        self.log_softmax = torch.nn.LogSoftmax(dim=3)
        self.ctc_loss = torch.nn.CTCLoss(zero_infinity=True)
        self.blank_logprob = blank_logprob

    def forward(self, attn_logprob, in_lens, out_lens):
        key_lens = in_lens
        query_lens = out_lens
        attn_logprob_padded = torch.nn.functional.pad(input=attn_logprob, pad=(1, 0), value=self.blank_logprob)

        total_loss = 0.0
        for bid in range(attn_logprob.shape[0]):
            target_seq = torch.arange(1, key_lens[bid] + 1).unsqueeze(0)
            curr_logprob = attn_logprob_padded[bid].permute(1, 0, 2)[: query_lens[bid], :, : key_lens[bid] + 1]

            curr_logprob = self.log_softmax(curr_logprob[None])[0]
            loss = self.ctc_loss(
                curr_logprob,
                target_seq,
                input_lengths=query_lens[bid : bid + 1],
                target_lengths=key_lens[bid : bid + 1],
            )
            total_loss = total_loss + loss

        total_loss = total_loss / attn_logprob.shape[0]
        return total_loss


########################
# MODEL LOSS LAYERS
########################


class TacotronLoss(torch.nn.Module):
    """Collection of Tacotron set-up based on provided config."""

    def __init__(self, c, ga_sigma=0.4):
        super().__init__()
        self.stopnet_pos_weight = c.stopnet_pos_weight
        self.use_capacitron_vae = c.use_capacitron_vae
        if self.use_capacitron_vae:
            self.capacitron_capacity = c.capacitron_vae.capacitron_capacity
            self.capacitron_vae_loss_alpha = c.capacitron_vae.capacitron_VAE_loss_alpha
        self.ga_alpha = c.ga_alpha
        self.decoder_diff_spec_alpha = c.decoder_diff_spec_alpha
        self.postnet_diff_spec_alpha = c.postnet_diff_spec_alpha
        self.decoder_alpha = c.decoder_loss_alpha
        self.postnet_alpha = c.postnet_loss_alpha
        self.decoder_ssim_alpha = c.decoder_ssim_alpha
        self.postnet_ssim_alpha = c.postnet_ssim_alpha
        self.config = c

        # postnet and decoder loss
        if c.loss_masking:
            self.criterion = L1LossMasked(c.seq_len_norm) if c.model in ["Tacotron"] else MSELossMasked(c.seq_len_norm)
        else:
            self.criterion = nn.L1Loss() if c.model in ["Tacotron"] else nn.MSELoss()
        # guided attention loss
        if c.ga_alpha > 0:
            self.criterion_ga = GuidedAttentionLoss(sigma=ga_sigma)
        # differential spectral loss
        if c.postnet_diff_spec_alpha > 0 or c.decoder_diff_spec_alpha > 0:
            self.criterion_diff_spec = DifferentialSpectralLoss(loss_func=self.criterion)
        # ssim loss
        if c.postnet_ssim_alpha > 0 or c.decoder_ssim_alpha > 0:
            self.criterion_ssim = SSIMLoss()
        # stopnet loss
        # pylint: disable=not-callable
        self.criterion_st = BCELossMasked(pos_weight=torch.tensor(self.stopnet_pos_weight)) if c.stopnet else None

        # For dev pruposes only
        self.criterion_capacitron_reconstruction_loss = nn.L1Loss(reduction="sum")

    def forward(
        self,
        postnet_output,
        decoder_output,
        mel_input,
        linear_input,
        stopnet_output,
        stopnet_target,
        stop_target_length,
        capacitron_vae_outputs,
        output_lens,
        decoder_b_output,
        alignments,
        alignment_lens,
        alignments_backwards,
        input_lens,
    ):
        # decoder outputs linear or mel spectrograms for Tacotron and Tacotron2
        # the target should be set acccordingly
        postnet_target = linear_input if self.config.model.lower() in ["tacotron"] else mel_input

        return_dict = {}
        # remove lengths if no masking is applied
        if not self.config.loss_masking:
            output_lens = None
        # decoder and postnet losses
        if self.config.loss_masking:
            if self.decoder_alpha > 0:
                decoder_loss = self.criterion(decoder_output, mel_input, output_lens)
            if self.postnet_alpha > 0:
                postnet_loss = self.criterion(postnet_output, postnet_target, output_lens)
        else:
            if self.decoder_alpha > 0:
                decoder_loss = self.criterion(decoder_output, mel_input)
            if self.postnet_alpha > 0:
                postnet_loss = self.criterion(postnet_output, postnet_target)
        loss = self.decoder_alpha * decoder_loss + self.postnet_alpha * postnet_loss
        return_dict["decoder_loss"] = decoder_loss
        return_dict["postnet_loss"] = postnet_loss

        if self.use_capacitron_vae:
            # extract capacitron vae infos
            posterior_distribution, prior_distribution, beta = capacitron_vae_outputs

            # KL divergence term between the posterior and the prior
            kl_term = torch.mean(torch.distributions.kl_divergence(posterior_distribution, prior_distribution))

            # Limit the mutual information between the data and latent space by the variational capacity limit
            kl_capacity = kl_term - self.capacitron_capacity

            # pass beta through softplus to keep it positive
            beta = torch.nn.functional.softplus(beta)[0]

            # This is the term going to the main ADAM optimiser, we detach beta because
            # beta is optimised by a separate, SGD optimiser below
            capacitron_vae_loss = beta.detach() * kl_capacity

            # normalize the capacitron_vae_loss as in L1Loss or MSELoss.
            # After this, both the standard loss and capacitron_vae_loss will be in the same scale.
            # For this reason we don't need use L1Loss and MSELoss in "sum" reduction mode.
            # Note: the batch is not considered because the L1Loss was calculated in "sum" mode
            # divided by the batch size, So not dividing the capacitron_vae_loss by B is legitimate.

            # get B T D dimension from input
            B, T, D = mel_input.size()
            # normalize
            if self.config.loss_masking:
                # if mask loss get T using the mask
                T = output_lens.sum() / B

            # Only for dev purposes to be able to compare the reconstruction loss with the values in the
            # original Capacitron paper
            return_dict["capaciton_reconstruction_loss"] = (
                self.criterion_capacitron_reconstruction_loss(decoder_output, mel_input) / decoder_output.size(0)
            ) + kl_capacity

            capacitron_vae_loss = capacitron_vae_loss / (T * D)
            capacitron_vae_loss = capacitron_vae_loss * self.capacitron_vae_loss_alpha

            # This is the term to purely optimise beta and to pass into the SGD optimizer
            beta_loss = torch.negative(beta) * kl_capacity.detach()

            loss += capacitron_vae_loss

            return_dict["capacitron_vae_loss"] = capacitron_vae_loss
            return_dict["capacitron_vae_beta_loss"] = beta_loss
            return_dict["capacitron_vae_kl_term"] = kl_term
            return_dict["capacitron_beta"] = beta

        stop_loss = (
            self.criterion_st(stopnet_output, stopnet_target, stop_target_length)
            if self.config.stopnet
            else torch.zeros(1)
        )
        loss += stop_loss
        return_dict["stopnet_loss"] = stop_loss

        # backward decoder loss (if enabled)
        if self.config.bidirectional_decoder:
            if self.config.loss_masking:
                decoder_b_loss = self.criterion(torch.flip(decoder_b_output, dims=(1,)), mel_input, output_lens)
            else:
                decoder_b_loss = self.criterion(torch.flip(decoder_b_output, dims=(1,)), mel_input)
            decoder_c_loss = torch.nn.functional.l1_loss(torch.flip(decoder_b_output, dims=(1,)), decoder_output)
            loss += self.decoder_alpha * (decoder_b_loss + decoder_c_loss)
            return_dict["decoder_b_loss"] = decoder_b_loss
            return_dict["decoder_c_loss"] = decoder_c_loss

        # double decoder consistency loss (if enabled)
        if self.config.double_decoder_consistency:
            if self.config.loss_masking:
                decoder_b_loss = self.criterion(decoder_b_output, mel_input, output_lens)
            else:
                decoder_b_loss = self.criterion(decoder_b_output, mel_input)
            # decoder_c_loss = torch.nn.functional.l1_loss(decoder_b_output, decoder_output)
            attention_c_loss = torch.nn.functional.l1_loss(alignments, alignments_backwards)
            loss += self.decoder_alpha * (decoder_b_loss + attention_c_loss)
            return_dict["decoder_coarse_loss"] = decoder_b_loss
            return_dict["decoder_ddc_loss"] = attention_c_loss

        # guided attention loss (if enabled)
        if self.config.ga_alpha > 0:
            ga_loss = self.criterion_ga(alignments, input_lens, alignment_lens)
            loss += ga_loss * self.ga_alpha
            return_dict["ga_loss"] = ga_loss

        # decoder differential spectral loss
        if self.config.decoder_diff_spec_alpha > 0:
            decoder_diff_spec_loss = self.criterion_diff_spec(decoder_output, mel_input, output_lens)
            loss += decoder_diff_spec_loss * self.decoder_diff_spec_alpha
            return_dict["decoder_diff_spec_loss"] = decoder_diff_spec_loss

        # postnet differential spectral loss
        if self.config.postnet_diff_spec_alpha > 0:
            postnet_diff_spec_loss = self.criterion_diff_spec(postnet_output, postnet_target, output_lens)
            loss += postnet_diff_spec_loss * self.postnet_diff_spec_alpha
            return_dict["postnet_diff_spec_loss"] = postnet_diff_spec_loss

        # decoder ssim loss
        if self.config.decoder_ssim_alpha > 0:
            decoder_ssim_loss = self.criterion_ssim(decoder_output, mel_input, output_lens)
            loss += decoder_ssim_loss * self.postnet_ssim_alpha
            return_dict["decoder_ssim_loss"] = decoder_ssim_loss

        # postnet ssim loss
        if self.config.postnet_ssim_alpha > 0:
            postnet_ssim_loss = self.criterion_ssim(postnet_output, postnet_target, output_lens)
            loss += postnet_ssim_loss * self.postnet_ssim_alpha
            return_dict["postnet_ssim_loss"] = postnet_ssim_loss

        return_dict["loss"] = loss
        return return_dict


class GlowTTSLoss(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.constant_factor = 0.5 * math.log(2 * math.pi)

    def forward(self, z, means, scales, log_det, y_lengths, o_dur_log, o_attn_dur, x_lengths):
        return_dict = {}
        # flow loss - neg log likelihood
        pz = torch.sum(scales) + 0.5 * torch.sum(torch.exp(-2 * scales) * (z - means) ** 2)
        log_mle = self.constant_factor + (pz - torch.sum(log_det)) / (torch.sum(y_lengths) * z.shape[2])
        # duration loss - MSE
        loss_dur = torch.sum((o_dur_log - o_attn_dur) ** 2) / torch.sum(x_lengths)
        # duration loss - huber loss
        # loss_dur = torch.nn.functional.smooth_l1_loss(o_dur_log, o_attn_dur, reduction="sum") / torch.sum(x_lengths)
        return_dict["loss"] = log_mle + loss_dur
        return_dict["log_mle"] = log_mle
        return_dict["loss_dur"] = loss_dur

        # check if any loss is NaN
        for key, loss in return_dict.items():
            if torch.isnan(loss):
                raise RuntimeError(f" [!] NaN loss with {key}.")
        return return_dict


def mse_loss_custom(x, y):
    """MSE loss using the torch back-end without reduction.
    It uses less VRAM than the raw code"""
    expanded_x, expanded_y = torch.broadcast_tensors(x, y)
    return torch._C._nn.mse_loss(expanded_x, expanded_y, 0)  # pylint: disable=protected-access, c-extension-no-member


class MDNLoss(nn.Module):
    """Mixture of Density Network Loss as described in https://arxiv.org/pdf/2003.01950.pdf."""

    def forward(self, logp, text_lengths, mel_lengths):  # pylint: disable=no-self-use
        """
        Shapes:
            mu: [B, D, T]
            log_sigma: [B, D, T]
            mel_spec: [B, D, T]
        """
        B, T_seq, T_mel = logp.shape
        log_alpha = logp.new_ones(B, T_seq, T_mel) * (-1e4)
        log_alpha[:, 0, 0] = logp[:, 0, 0]
        for t in range(1, T_mel):
            prev_step = torch.cat(
                [log_alpha[:, :, t - 1 : t], functional.pad(log_alpha[:, :, t - 1 : t], (0, 0, 1, -1), value=-1e4)],
                dim=-1,
            )
            log_alpha[:, :, t] = torch.logsumexp(prev_step + 1e-4, dim=-1) + logp[:, :, t]
        alpha_last = log_alpha[torch.arange(B), text_lengths - 1, mel_lengths - 1]
        mdn_loss = -alpha_last.mean() / T_seq
        return mdn_loss  # , log_prob_matrix


class AlignTTSLoss(nn.Module):
    """Modified AlignTTS Loss.
    Computes
        - L1 and SSIM losses from output spectrograms.
        - Huber loss for duration predictor.
        - MDNLoss for Mixture of Density Network.

    All loss values are aggregated by a weighted sum of the alpha values.

    Args:
        c (dict): TTS model configuration.
    """

    def __init__(self, c):
        super().__init__()
        self.mdn_loss = MDNLoss()
        self.spec_loss = MSELossMasked(False)
        self.ssim = SSIMLoss()
        self.dur_loss = MSELossMasked(False)

        self.ssim_alpha = c.ssim_alpha
        self.dur_loss_alpha = c.dur_loss_alpha
        self.spec_loss_alpha = c.spec_loss_alpha
        self.mdn_alpha = c.mdn_alpha

    def forward(
        self, logp, decoder_output, decoder_target, decoder_output_lens, dur_output, dur_target, input_lens, phase
    ):
        # ssim_alpha, dur_loss_alpha, spec_loss_alpha, mdn_alpha = self.set_alphas(step)
        spec_loss, ssim_loss, dur_loss, mdn_loss = 0, 0, 0, 0
        if phase == 0:
            mdn_loss = self.mdn_loss(logp, input_lens, decoder_output_lens)
        elif phase == 1:
            spec_loss = self.spec_loss(decoder_output, decoder_target, decoder_output_lens)
            ssim_loss = self.ssim(decoder_output, decoder_target, decoder_output_lens)
        elif phase == 2:
            mdn_loss = self.mdn_loss(logp, input_lens, decoder_output_lens)
            spec_loss = self.spec_lossX(decoder_output, decoder_target, decoder_output_lens)
            ssim_loss = self.ssim(decoder_output, decoder_target, decoder_output_lens)
        elif phase == 3:
            dur_loss = self.dur_loss(dur_output.unsqueeze(2), dur_target.unsqueeze(2), input_lens)
        else:
            mdn_loss = self.mdn_loss(logp, input_lens, decoder_output_lens)
            spec_loss = self.spec_loss(decoder_output, decoder_target, decoder_output_lens)
            ssim_loss = self.ssim(decoder_output, decoder_target, decoder_output_lens)
            dur_loss = self.dur_loss(dur_output.unsqueeze(2), dur_target.unsqueeze(2), input_lens)
        loss = (
            self.spec_loss_alpha * spec_loss
            + self.ssim_alpha * ssim_loss
            + self.dur_loss_alpha * dur_loss
            + self.mdn_alpha * mdn_loss
        )
        return {"loss": loss, "loss_l1": spec_loss, "loss_ssim": ssim_loss, "loss_dur": dur_loss, "mdn_loss": mdn_loss}


class VitsGeneratorLoss(nn.Module):
    def __init__(self, c: Coqpit):
        super().__init__()
        self.kl_loss_alpha = c.kl_loss_alpha
        self.gen_loss_alpha = c.gen_loss_alpha
        self.feat_loss_alpha = c.feat_loss_alpha
        self.dur_loss_alpha = c.dur_loss_alpha
        self.mel_loss_alpha = c.mel_loss_alpha
        self.spk_encoder_loss_alpha = c.speaker_encoder_loss_alpha
        self.stft = TorchSTFT(
            c.audio.fft_size,
            c.audio.hop_length,
            c.audio.win_length,
            sample_rate=c.audio.sample_rate,
            mel_fmin=c.audio.mel_fmin,
            mel_fmax=c.audio.mel_fmax,
            n_mels=c.audio.num_mels,
            use_mel=True,
            do_amp_to_db=True,
        )

    @staticmethod
    def feature_loss(feats_real, feats_generated):
        loss = 0
        for dr, dg in zip(feats_real, feats_generated):
            for rl, gl in zip(dr, dg):
                rl = rl.float().detach()
                gl = gl.float()
                loss += torch.mean(torch.abs(rl - gl))
        return loss * 2

    @staticmethod
    def generator_loss(scores_fake):
        loss = 0
        gen_losses = []
        for dg in scores_fake:
            dg = dg.float()
            l = torch.mean((1 - dg) ** 2)
            gen_losses.append(l)
            loss += l

        return loss, gen_losses

    @staticmethod
    def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):
        """
        z_p, logs_q: [b, h, t_t]
        m_p, logs_p: [b, h, t_t]
        """
        z_p = z_p.float()
        logs_q = logs_q.float()
        m_p = m_p.float()
        logs_p = logs_p.float()
        z_mask = z_mask.float()

        kl = logs_p - logs_q - 0.5
        kl += 0.5 * ((z_p - m_p) ** 2) * torch.exp(-2.0 * logs_p)
        kl = torch.sum(kl * z_mask)
        l = kl / torch.sum(z_mask)
        return l

    @staticmethod
    def cosine_similarity_loss(gt_spk_emb, syn_spk_emb):
        return -torch.nn.functional.cosine_similarity(gt_spk_emb, syn_spk_emb).mean()

    def forward(
        self,
        mel_slice,
        mel_slice_hat,
        z_p,
        logs_q,
        m_p,
        logs_p,
        z_len,
        scores_disc_fake,
        feats_disc_fake,
        feats_disc_real,
        loss_duration,
        use_speaker_encoder_as_loss=False,
        gt_spk_emb=None,
        syn_spk_emb=None,
    ):
        """
        Shapes:
            - mel_slice : :math:`[B, 1, T]`
            - mel_slice_hat: :math:`[B, 1, T]`
            - z_p: :math:`[B, C, T]`
            - logs_q: :math:`[B, C, T]`
            - m_p: :math:`[B, C, T]`
            - logs_p: :math:`[B, C, T]`
            - z_len: :math:`[B]`
            - scores_disc_fake[i]: :math:`[B, C]`
            - feats_disc_fake[i][j]: :math:`[B, C, T', P]`
            - feats_disc_real[i][j]: :math:`[B, C, T', P]`
        """
        loss = 0.0
        return_dict = {}
        z_mask = sequence_mask(z_len).float()
        # compute losses
        loss_kl = (
            self.kl_loss(z_p=z_p, logs_q=logs_q, m_p=m_p, logs_p=logs_p, z_mask=z_mask.unsqueeze(1))
            * self.kl_loss_alpha
        )
        loss_feat = (
            self.feature_loss(feats_real=feats_disc_real, feats_generated=feats_disc_fake) * self.feat_loss_alpha
        )
        loss_gen = self.generator_loss(scores_fake=scores_disc_fake)[0] * self.gen_loss_alpha
        loss_mel = torch.nn.functional.l1_loss(mel_slice, mel_slice_hat) * self.mel_loss_alpha
        loss_duration = torch.sum(loss_duration.float()) * self.dur_loss_alpha
        loss = loss_kl + loss_feat + loss_mel + loss_gen + loss_duration

        if use_speaker_encoder_as_loss:
            loss_se = self.cosine_similarity_loss(gt_spk_emb, syn_spk_emb) * self.spk_encoder_loss_alpha
            loss = loss + loss_se
            return_dict["loss_spk_encoder"] = loss_se
        # pass losses to the dict
        return_dict["loss_gen"] = loss_gen
        return_dict["loss_kl"] = loss_kl
        return_dict["loss_feat"] = loss_feat
        return_dict["loss_mel"] = loss_mel
        return_dict["loss_duration"] = loss_duration
        return_dict["loss"] = loss
        return return_dict


class VitsDiscriminatorLoss(nn.Module):
    def __init__(self, c: Coqpit):
        super().__init__()
        self.disc_loss_alpha = c.disc_loss_alpha

    @staticmethod
    def discriminator_loss(scores_real, scores_fake):
        loss = 0
        real_losses = []
        fake_losses = []
        for dr, dg in zip(scores_real, scores_fake):
            dr = dr.float()
            dg = dg.float()
            real_loss = torch.mean((1 - dr) ** 2)
            fake_loss = torch.mean(dg**2)
            loss += real_loss + fake_loss
            real_losses.append(real_loss.item())
            fake_losses.append(fake_loss.item())
        return loss, real_losses, fake_losses

    def forward(self, scores_disc_real, scores_disc_fake):
        loss = 0.0
        return_dict = {}
        loss_disc, loss_disc_real, _ = self.discriminator_loss(
            scores_real=scores_disc_real, scores_fake=scores_disc_fake
        )
        return_dict["loss_disc"] = loss_disc * self.disc_loss_alpha
        loss = loss + return_dict["loss_disc"]
        return_dict["loss"] = loss

        for i, ldr in enumerate(loss_disc_real):
            return_dict[f"loss_disc_real_{i}"] = ldr
        return return_dict


class ForwardTTSLoss(nn.Module):
    """Generic configurable ForwardTTS loss."""

    def __init__(self, c):
        super().__init__()
        if c.spec_loss_type == "mse":
            self.spec_loss = MSELossMasked(False)
        elif c.spec_loss_type == "l1":
            self.spec_loss = L1LossMasked(False)
        else:
            raise ValueError(" [!] Unknown spec_loss_type {}".format(c.spec_loss_type))

        if c.duration_loss_type == "mse":
            self.dur_loss = MSELossMasked(False)
        elif c.duration_loss_type == "l1":
            self.dur_loss = L1LossMasked(False)
        elif c.duration_loss_type == "huber":
            self.dur_loss = Huber()
        else:
            raise ValueError(" [!] Unknown duration_loss_type {}".format(c.duration_loss_type))

        if c.model_args.use_aligner:
            self.aligner_loss = ForwardSumLoss()
            self.aligner_loss_alpha = c.aligner_loss_alpha

        if c.model_args.use_pitch:
            self.pitch_loss = MSELossMasked(False)
            self.pitch_loss_alpha = c.pitch_loss_alpha

        if c.model_args.use_energy:
            self.energy_loss = MSELossMasked(False)
            self.energy_loss_alpha = c.energy_loss_alpha

        if c.use_ssim_loss:
            self.ssim = SSIMLoss() if c.use_ssim_loss else None
            self.ssim_loss_alpha = c.ssim_loss_alpha

        self.spec_loss_alpha = c.spec_loss_alpha
        self.dur_loss_alpha = c.dur_loss_alpha
        self.binary_alignment_loss_alpha = c.binary_align_loss_alpha

    @staticmethod
    def _binary_alignment_loss(alignment_hard, alignment_soft):
        """Binary loss that forces soft alignments to match the hard alignments as
        explained in `https://arxiv.org/pdf/2108.10447.pdf`.
        """
        log_sum = torch.log(torch.clamp(alignment_soft[alignment_hard == 1], min=1e-12)).sum()
        return -log_sum / alignment_hard.sum()

    def forward(
        self,
        decoder_output,
        decoder_target,
        decoder_output_lens,
        dur_output,
        dur_target,
        pitch_output,
        pitch_target,
        energy_output,
        energy_target,
        input_lens,
        alignment_logprob=None,
        alignment_hard=None,
        alignment_soft=None,
        binary_loss_weight=None,
    ):
        loss = 0
        return_dict = {}
        if hasattr(self, "ssim_loss") and self.ssim_loss_alpha > 0:
            ssim_loss = self.ssim(decoder_output, decoder_target, decoder_output_lens)
            loss = loss + self.ssim_loss_alpha * ssim_loss
            return_dict["loss_ssim"] = self.ssim_loss_alpha * ssim_loss

        if self.spec_loss_alpha > 0:
            spec_loss = self.spec_loss(decoder_output, decoder_target, decoder_output_lens)
            loss = loss + self.spec_loss_alpha * spec_loss
            return_dict["loss_spec"] = self.spec_loss_alpha * spec_loss

        if self.dur_loss_alpha > 0:
            log_dur_tgt = torch.log(dur_target.float() + 1)
            dur_loss = self.dur_loss(dur_output[:, :, None], log_dur_tgt[:, :, None], input_lens)
            loss = loss + self.dur_loss_alpha * dur_loss
            return_dict["loss_dur"] = self.dur_loss_alpha * dur_loss

        if hasattr(self, "pitch_loss") and self.pitch_loss_alpha > 0:
            pitch_loss = self.pitch_loss(pitch_output.transpose(1, 2), pitch_target.transpose(1, 2), input_lens)
            loss = loss + self.pitch_loss_alpha * pitch_loss
            return_dict["loss_pitch"] = self.pitch_loss_alpha * pitch_loss

        if hasattr(self, "energy_loss") and self.energy_loss_alpha > 0:
            energy_loss = self.energy_loss(energy_output.transpose(1, 2), energy_target.transpose(1, 2), input_lens)
            loss = loss + self.energy_loss_alpha * energy_loss
            return_dict["loss_energy"] = self.energy_loss_alpha * energy_loss

        if hasattr(self, "aligner_loss") and self.aligner_loss_alpha > 0:
            aligner_loss = self.aligner_loss(alignment_logprob, input_lens, decoder_output_lens)
            loss = loss + self.aligner_loss_alpha * aligner_loss
            return_dict["loss_aligner"] = self.aligner_loss_alpha * aligner_loss

        if self.binary_alignment_loss_alpha > 0 and alignment_hard is not None:
            binary_alignment_loss = self._binary_alignment_loss(alignment_hard, alignment_soft)
            loss = loss + self.binary_alignment_loss_alpha * binary_alignment_loss
            if binary_loss_weight:
                return_dict["loss_binary_alignment"] = (
                    self.binary_alignment_loss_alpha * binary_alignment_loss * binary_loss_weight
                )
            else:
                return_dict["loss_binary_alignment"] = self.binary_alignment_loss_alpha * binary_alignment_loss

        return_dict["loss"] = loss
        return return_dict


================================================
FILE: TTS/tts/layers/overflow/__init__.py
================================================


================================================
FILE: TTS/tts/layers/overflow/common_layers.py
================================================
from typing import List, Tuple

import torch
import torch.nn.functional as F
from torch import nn
from tqdm.auto import tqdm

from TTS.tts.layers.tacotron.common_layers import Linear
from TTS.tts.layers.tacotron.tacotron2 import ConvBNBlock


class Encoder(nn.Module):
    r"""Neural HMM Encoder

    Same as Tacotron 2 encoder but increases the input length by states per phone

    Args:
        num_chars (int): Number of characters in the input.
        state_per_phone (int): Number of states per phone.
        in_out_channels (int): number of input and output channels.
        n_convolutions (int): number of convolutional layers.
    """

    def __init__(self, num_chars, state_per_phone, in_out_channels=512, n_convolutions=3):
        super().__init__()

        self.state_per_phone = state_per_phone
        self.in_out_channels = in_out_channels

        self.emb = nn.Embedding(num_chars, in_out_channels)
        self.convolutions = nn.ModuleList()
        for _ in range(n_convolutions):
            self.convolutions.append(ConvBNBlock(in_out_channels, in_out_channels, 5, "relu"))
        self.lstm = nn.LSTM(
            in_out_channels,
            int(in_out_channels / 2) * state_per_phone,
            num_layers=1,
            batch_first=True,
            bias=True,
            bidirectional=True,
        )
        self.rnn_state = None

    def forward(self, x: torch.FloatTensor, x_len: torch.LongTensor) -> Tuple[torch.FloatTensor, torch.LongTensor]:
        """Forward pass to the encoder.

        Args:
            x (torch.FloatTensor): input text indices.
                - shape: :math:`(b, T_{in})`
            x_len (torch.LongTensor): input text lengths.
                - shape: :math:`(b,)`

        Returns:
            Tuple[torch.FloatTensor, torch.LongTensor]: encoder outputs and output lengths.
                -shape: :math:`((b, T_{in} * states_per_phone, in_out_channels), (b,))`
        """
        b, T = x.shape
        o = self.emb(x).transpose(1, 2)
        for layer in self.convolutions:
            o = layer(o)
        o = o.transpose(1, 2)
        o = nn.utils.rnn.pack_padded_sequence(o, x_len.cpu(), batch_first=True)
        self.lstm.flatten_parameters()
        o, _ = self.lstm(o)
        o, _ = nn.utils.rnn.pad_packed_sequence(o, batch_first=True)
        o = o.reshape(b, T * self.state_per_phone, self.in_out_channels)
        x_len = x_len * self.state_per_phone
        return o, x_len

    def inference(self, x, x_len):
        """Inference to the encoder.

        Args:
            x (torch.FloatTensor): input text indices.
                - shape: :math:`(b, T_{in})`
            x_len (torch.LongTensor): input text lengths.
                - shape: :math:`(b,)`

        Returns:
            Tuple[torch.FloatTensor, torch.LongTensor]: encoder outputs and output lengths.
                -shape: :math:`((b, T_{in} * states_per_phone, in_out_channels), (b,))`
        """
        b, T = x.shape
        o = self.emb(x).transpose(1, 2)
        for layer in self.convolutions:
            o = layer(o)
        o = o.transpose(1, 2)
        # self.lstm.flatten_parameters()
        o, _ = self.lstm(o)
        o = o.reshape(b, T * self.state_per_phone, self.in_out_channels)
        x_len = x_len * self.state_per_phone
        return o, x_len


class ParameterModel(nn.Module):
    r"""Main neural network of the outputnet

    Note: Do not put dropout layers here, the model will not converge.

    Args:
            outputnet_size (List[int]): the architecture of the parameter model
            input_size (int): size of input for the first layer
            output_size (int): size of output i.e size of the feature dim
            frame_channels (int): feature dim to set the flat start bias
            flat_start_params (dict): flat start parameters to set the bias
    """

    def __init__(
        self,
        outputnet_size: List[int],
        input_size: int,
        output_size: int,
        frame_channels: int,
        flat_start_params: dict,
    ):
        super().__init__()
        self.frame_channels = frame_channels

        self.layers = nn.ModuleList(
            [Linear(inp, out) for inp, out in zip([input_size] + outputnet_size[:-1], outputnet_size)]
        )
        self.last_layer = nn.Linear(outputnet_size[-1], output_size)
        self.flat_start_output_layer(
            flat_start_params["mean"], flat_start_params["std"], flat_start_params["transition_p"]
        )

    def flat_start_output_layer(self, mean, std, transition_p):
        self.last_layer.weight.data.zero_()
        self.last_layer.bias.data[0 : self.frame_channels] = mean
        self.last_layer.bias.data[self.frame_channels : 2 * self.frame_channels] = OverflowUtils.inverse_softplus(std)
        self.last_layer.bias.data[2 * self.frame_channels :] = OverflowUtils.inverse_sigmod(transition_p)

    def forward(self, x):
        for layer in self.layers:
            x = F.relu(layer(x))
        x = self.last_layer(x)
        return x


class Outputnet(nn.Module):
    r"""
    This network takes current state and previous observed values as input
    and returns its parameters, mean, standard deviation and probability
    of transition to the next state
    """

    def __init__(
        self,
        encoder_dim: int,
        memory_rnn_dim: int,
        frame_channels: int,
        outputnet_size: List[int],
        flat_start_params: dict,
        std_floor: float = 1e-2,
    ):
        super().__init__()

        self.frame_channels = frame_channels
        self.flat_start_params = flat_start_params
        self.std_floor = std_floor

        input_size = memory_rnn_dim + encoder_dim
        output_size = 2 * frame_channels + 1

        self.parametermodel = ParameterModel(
            outputnet_size=outputnet_size,
            input_size=input_size,
            output_size=output_size,
            flat_start_params=flat_start_params,
            frame_channels=frame_channels,
        )

    def forward(self, ar_mels, inputs):
        r"""Inputs observation and returns the means, stds and transition probability for the current state

        Args:
            ar_mel_inputs (torch.FloatTensor): shape (batch, prenet_dim)
            states (torch.FloatTensor):  (batch, hidden_states, hidden_state_dim)

        Returns:
            means: means for the emission observation for each feature
                - shape: (B, hidden_states, feature_size)
            stds: standard deviations for the emission observation for each feature
                - shape: (batch, hidden_states, feature_size)
            transition_vectors: transition vector for the current hidden state
                - shape: (batch, hidden_states)
        """
        batch_size, prenet_dim = ar_mels.shape[0], ar_mels.shape[1]
        N = inputs.shape[1]

        ar_mels = ar_mels.unsqueeze(1).expand(batch_size, N, prenet_dim)
        ar_mels = torch.cat((ar_mels, inputs), dim=2)
        ar_mels = self.parametermodel(ar_mels)

        mean, std, transition_vector = (
            ar_mels[:, :, 0 : self.frame_channels],
            ar_mels[:, :, self.frame_channels : 2 * self.frame_channels],
            ar_mels[:, :, 2 * self.frame_channels :].squeeze(2),
        )
        std = F.softplus(std)
        std = self._floor_std(std)
        return mean, std, transition_vector

    def _floor_std(self, std):
        r"""
        It clamps the standard deviation to not to go below some level
        This removes the problem when the model tries to cheat for higher likelihoods by converting
        one of the gaussians to a point mass.

        Args:
            std (float Tensor): tensor containing the standard deviation to be
        """
        original_tensor = std.clone().detach()
        std = torch.clamp(std, min=self.std_floor)
        if torch.any(original_tensor != std):
            print(
                "[*] Standard deviation was floored! The model is preventing overfitting, nothing serious to worry about"
            )
        return std


class OverflowUtils:
    @staticmethod
    def get_data_parameters_for_flat_start(
        data_loader: torch.utils.data.DataLoader, out_channels: int, states_per_phone: int
    ):
        """Generates data parameters for flat starting the HMM.

        Args:
            data_loader (torch.utils.data.Dataloader): _description_
            out_channels (int): mel spectrogram channels
            states_per_phone (_type_): HMM states per phone
        """

        # State related information for transition_p
        total_state_len = 0
        total_mel_len = 0

        # Useful for data mean an std
        total_mel_sum = 0
        total_mel_sq_sum = 0

        for batch in tqdm(data_loader, leave=False):
            text_lengths = batch["token_id_lengths"]
            mels = batch["mel"]
            mel_lengths = batch["mel_lengths"]

            total_state_len += torch.sum(text_lengths)
            total_mel_len += torch.sum(mel_lengths)
            total_mel_sum += torch.sum(mels)
            total_mel_sq_sum += torch.sum(torch.pow(mels, 2))

        data_mean = total_mel_sum / (total_mel_len * out_channels)
        data_std = torch.sqrt((total_mel_sq_sum / (total_mel_len * out_channels)) - torch.pow(data_mean, 2))
        average_num_states = total_state_len / len(data_loader.dataset)
        average_mel_len = total_mel_len / len(data_loader.dataset)
        average_duration_each_state = average_mel_len / average_num_states
        init_transition_prob = 1 / average_duration_each_state

        return data_mean, data_std, (init_transition_prob * states_per_phone)

    @staticmethod
    @torch.no_grad()
    def update_flat_start_transition(model, transition_p):
        model.neural_hmm.output_net.parametermodel.flat_start_output_layer(0.0, 1.0, transition_p)

    @staticmethod
    def log_clamped(x, eps=1e-04):
        """
        Avoids the log(0) problem

        Args:
            x (torch.tensor): input tensor
            eps (float, optional): lower bound. Defaults to 1e-04.

        Returns:
            torch.tensor: :math:`log(x)`
        """
        clamped_x = torch.clamp(x, min=eps)
        return torch.log(clamped_x)

    @staticmethod
    def inverse_sigmod(x):
        r"""
        Inverse of the sigmoid function
        """
        if not torch.is_tensor(x):
            x = torch.tensor(x)
        return OverflowUtils.log_clamped(x / (1.0 - x))

    @staticmethod
    def inverse_softplus(x):
        r"""
        Inverse of the softplus function
        """
        if not torch.is_tensor(x):
            x = torch.tensor(x)
        return OverflowUtils.log_clamped(torch.exp(x) - 1.0)

    @staticmethod
    def logsumexp(x, dim):
        r"""
        Differentiable LogSumExp: Does not creates nan gradients
            when all the inputs are -inf yeilds 0 gradients.
        Args:
            x : torch.Tensor -  The input tensor
            dim: int - The dimension on which the log sum exp has to be applied
        """

        m, _ = x.max(dim=dim)
        mask = m == -float("inf")
        s = (x - m.masked_fill_(mask, 0).unsqueeze(dim=dim)).exp().sum(dim=dim)
        return s.masked_fill_(mask, 1).log() + m.masked_fill_(mask, -float("inf"))

    @staticmethod
    def double_pad(list_of_different_shape_tensors):
        r"""
        Pads the list of tensors in 2 dimensions
        """
        second_dim_lens = [len(a) for a in [i[0] for i in list_of_different_shape_tensors]]
        second_dim_max = max(second_dim_lens)
        padded_x = [F.pad(x, (0, second_dim_max - len(x[0]))) for x in list_of_different_shape_tensors]
        return nn.utils.rnn.pad_sequence(padded_x, batch_first=True)


================================================
FILE: TTS/tts/layers/overflow/decoder.py
================================================
import torch
from torch import nn

from TTS.tts.layers.glow_tts.decoder import Decoder as GlowDecoder
from TTS.tts.utils.helpers import sequence_mask


class Decoder(nn.Module):
    """Uses glow decoder with some modifications.
    ::

        Squeeze -> ActNorm -> InvertibleConv1x1 -> AffineCoupling -> Unsqueeze

    Args:
        in_channels (int): channels of input tensor.
        hidden_channels (int): hidden decoder channels.
        kernel_size (int): Coupling block kernel size. (Wavenet filter kernel size.)
        dilation_rate (int): rate to increase dilation by each layer in a decoder block.
        num_flow_blocks (int): number of decoder blocks.
        num_coupling_layers (int): number coupling layers. (number of wavenet layers.)
        dropout_p (float): wavenet dropout rate.
        sigmoid_scale (bool): enable/disable sigmoid scaling in coupling layer.
    """

    def __init__(
        self,
        in_channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        num_flow_blocks,
        num_coupling_layers,
        dropout_p=0.0,
        num_splits=4,
        num_squeeze=2,
        sigmoid_scale=False,
        c_in_channels=0,
    ):
        super().__init__()

        self.glow_decoder = GlowDecoder(
            in_channels,
            hidden_channels,
            kernel_size,
            dilation_rate,
            num_flow_blocks,
            num_coupling_layers,
            dropout_p,
            num_splits,
            num_squeeze,
            sigmoid_scale,
            c_in_channels,
        )
        self.n_sqz = num_squeeze

    def forward(self, x, x_len, g=None, reverse=False):
        """
        Input shapes:
            - x:  :math:`[B, C, T]`
            - x_len :math:`[B]`
            - g: :math:`[B, C]`

        Output shapes:
            - x:  :math:`[B, C, T]`
            - x_len :math:`[B]`
            - logget_tot :math:`[B]`
        """
        x, x_len, x_max_len = self.preprocess(x, x_len, x_len.max())
        x_mask = torch.unsqueeze(sequence_mask(x_len, x_max_len), 1).to(x.dtype)
        x, logdet_tot = self.glow_decoder(x, x_mask, g, reverse)
        return x, x_len, logdet_tot

    def preprocess(self, y, y_lengths, y_max_length):
        if y_max_length is not None:
            y_max_length = torch.div(y_max_length, self.n_sqz, rounding_mode="floor") * self.n_sqz
            y = y[:, :, :y_max_length]
        y_lengths = torch.div(y_lengths, self.n_sqz, rounding_mode="floor") * self.n_sqz
        return y, y_lengths, y_max_length

    def store_inverse(self):
        self.glow_decoder.store_inverse()


================================================
FILE: TTS/tts/layers/overflow/neural_hmm.py
================================================
from typing import List

import torch
import torch.distributions as tdist
import torch.nn.functional as F
from torch import nn
from torch.utils.checkpoint import checkpoint

from TTS.tts.layers.overflow.common_layers import Outputnet, OverflowUtils
from TTS.tts.layers.tacotron.common_layers import Prenet
from TTS.tts.utils.helpers import sequence_mask


class NeuralHMM(nn.Module):
    """Autoregressive left to right HMM model primarily used in "Neural HMMs are all you need (for high-quality attention-free TTS)"

    Paper::
        https://arxiv.org/abs/2108.13320

    Paper abstract::
        Neural sequence-to-sequence TTS has achieved significantly better output quality than statistical speech synthesis using
        HMMs. However, neural TTS is generally not probabilistic and uses non-monotonic attention. Attention failures increase
        training time and can make synthesis babble incoherently. This paper describes how the old and new paradigms can be
        combined to obtain the advantages of both worlds, by replacing attention in neural TTS with an autoregressive left-right
        no-skip hidden Markov model defined by a neural network. Based on this proposal, we modify Tacotron 2 to obtain an
        HMM-based neural TTS model with monotonic alignment, trained to maximise the full sequence likelihood without
        approximation. We also describe how to combine ideas from classical and contemporary TTS for best results. The resulting
        example system is smaller and simpler than Tacotron 2, and learns to speak with fewer iterations and less data, whilst
        achieving comparable naturalness prior to the post-net. Our approach also allows easy control over speaking rate.

    Args:
        frame_channels (int): Output dimension to generate.
        ar_order (int): Autoregressive order of the model. In ablations of Neural HMM it was found that more autoregression while giving more variation hurts naturalness of the synthesised audio.
        deterministic_transition (bool): deterministic duration generation based on duration quantiles as defiend in "S. Ronanki, O. Watts, S. King, and G. E. Henter, “Medianbased generation of synthetic speech durations using a nonparametric approach,” in Proc. SLT, 2016.". Defaults to True.
        encoder_dim (int): Channels of encoder input and character embedding tensors. Defaults to 512.
        prenet_type (str): `original` or `bn`. `original` sets the default Prenet and `bn` uses Batch Normalization version of the Prenet.
        prenet_dim (int): Dimension of the Prenet.
        prenet_n_layers (int): Number of layers in the Prenet.
        prenet_dropout (float): Dropout probability of the Prenet.
        prenet_dropout_at_inference (bool): If True, dropout is applied at inference time.
        memory_rnn_dim (int): Size of the memory RNN to process output of prenet.
        outputnet_size (List[int]): Size of the output network inside the neural HMM.
        flat_start_params (dict): Parameters for the flat start initialization of the neural HMM.
        std_floor (float): Floor value for the standard deviation of the neural HMM. Prevents model cheating by putting point mass and getting infinite likelihood at any datapoint.
        use_grad_checkpointing (bool, optional): Use gradient checkpointing to save memory. Defaults to True.
    """

    def __init__(
        self,
        frame_channels: int,
        ar_order: int,
        deterministic_transition: bool,
        encoder_dim: int,
        prenet_type: str,
        prenet_dim: int,
        prenet_n_layers: int,
        prenet_dropout: float,
        prenet_dropout_at_inference: bool,
        memory_rnn_dim: int,
        outputnet_size: List[int],
        flat_start_params: dict,
        std_floor: float,
        use_grad_checkpointing: bool = True,
    ):
        super().__init__()

        self.frame_channels = frame_channels
        self.ar_order = ar_order
        self.deterministic_transition = deterministic_transition
        self.prenet_dim = prenet_dim
        self.memory_rnn_dim = memory_rnn_dim
        self.use_grad_checkpointing = use_grad_checkpointing

        self.transition_model = TransitionModel()
        self.emission_model = EmissionModel()

        assert ar_order > 0, f"AR order must be greater than 0 provided {ar_order}"

        self.ar_order = ar_order
        self.prenet = Prenet(
            in_features=frame_channels * ar_order,
            prenet_type=prenet_type,
            prenet_dropout=prenet_dropout,
            dropout_at_inference=prenet_dropout_at_inference,
            out_features=[self.prenet_dim for _ in range(prenet_n_layers)],
            bias=False,
        )
        self.memory_rnn = nn.LSTMCell(input_size=prenet_dim, hidden_size=memory_rnn_dim)
        self.output_net = Outputnet(
            encoder_dim, memory_rnn_dim, frame_channels, outputnet_size, flat_start_params, std_floor
        )
        self.register_buffer("go_tokens", torch.zeros(ar_order, 1))

    def forward(self, inputs, inputs_len, mels, mel_lens):
        r"""HMM forward algorithm for training uses logarithmic version of Rabiner (1989) forward algorithm.

        Args:
            inputs (torch.FloatTensor): Encoder outputs
            inputs_len (torch.LongTensor): Encoder output lengths
            mels (torch.FloatTensor): Mel inputs
            mel_lens (torch.LongTensor): Length of mel inputs

        Shapes:
            - inputs: (B, T, D_out_enc)
            - inputs_len: (B)
            - mels: (B, D_mel, T_mel)
            - mel_lens: (B)

        Returns:
            log_prob (torch.FloatTensor): Log probability of the sequence
        """
        # Get dimensions of inputs
        batch_size, N, _ = inputs.shape
        T_max = torch.max(mel_lens)
        mels = mels.permute(0, 2, 1)

        # Intialize forward algorithm
        log_state_priors = self._initialize_log_state_priors(inputs)
        log_c, log_alpha_scaled, transition_matrix, means = self._initialize_forward_algorithm_variables(mels, N)

        # Initialize autoregression elements
        ar_inputs = self._add_go_token(mels)
        h_memory, c_memory = self._init_lstm_states(batch_size, self.memory_rnn_dim, mels)

        for t in range(T_max):
            # Process Autoregression
            h_memory, c_memory = self._process_ar_timestep(t, ar_inputs, h_memory, c_memory)
            # Get mean, std and transition vector from decoder for this timestep
            # Note: Gradient checkpointing currently doesn't works with multiple gpus inside a loop
            if self.use_grad_checkpointing and self.training:
                mean, std, transition_vector = checkpoint(self.output_net, h_memory, inputs)
            else:
                mean, std, transition_vector = self.output_net(h_memory, inputs)

            if t == 0:
                log_alpha_temp = log_state_priors + self.emission_model(mels[:, 0], mean, std, inputs_len)
            else:
                log_alpha_temp = self.emission_model(mels[:, t], mean, std, inputs_len) + self.transition_model(
                    log_alpha_scaled[:, t - 1, :], transition_vector, inputs_len
                )
            log_c[:, t] = torch.logsumexp(log_alpha_temp, dim=1)
            log_alpha_scaled[:, t, :] = log_alpha_temp - log_c[:, t].unsqueeze(1)
            transition_matrix[:, t] = transition_vector  # needed for absorption state calculation

            # Save for plotting
            means.append(mean.detach())

        log_c, log_alpha_scaled = self._mask_lengths(mel_lens, log_c, log_alpha_scaled)

        sum_final_log_c = self.get_absorption_state_scaling_factor(
            mel_lens, log_alpha_scaled, inputs_len, transition_matrix
        )

        log_probs = torch.sum(log_c, dim=1) + sum_final_log_c

        return log_probs, log_alpha_scaled, transition_matrix, means

    @staticmethod
    def _mask_lengths(mel_lens, log_c, log_alpha_scaled):
        """
        Mask the lengths of the forward variables so that the variable lenghts
        do not contribute in the loss calculation
        Args:
            mel_inputs (torch.FloatTensor): (batch, T, frame_channels)
            mel_inputs_lengths (torch.IntTensor): (batch)
            log_c (torch.FloatTensor): (batch, T)
        Returns:
            log_c (torch.FloatTensor) : scaled probabilities (batch, T)
            log_alpha_scaled (torch.FloatTensor): forward probabilities (batch, T, N)
        """
        mask_log_c = sequence_mask(mel_lens)
        log_c = log_c * mask_log_c
        mask_log_alpha_scaled = mask_log_c.unsqueeze(2)
        log_alpha_scaled = log_alpha_scaled * mask_log_alpha_scaled
        return log_c, log_alpha_scaled

    def _process_ar_timestep(
        self,
        t,
        ar_inputs,
        h_memory,
        c_memory,
    ):
        """
        Process autoregression in timestep
        1. At a specific t timestep
        2. Perform data dropout if applied (we did not use it)
        3. Run the autoregressive frame through the prenet (has dropout)
        4. Run the prenet output through the post prenet rnn

        Args:
            t (int): mel-spec timestep
            ar_inputs (torch.FloatTensor): go-token appended mel-spectrograms
                - shape: (b, D_out, T_out)
            h_post_prenet (torch.FloatTensor): previous timestep rnn hidden state
                - shape: (b, memory_rnn_dim)
            c_post_prenet (torch.FloatTensor): previous timestep rnn cell state
                - shape: (b, memory_rnn_dim)

        Returns:
            h_post_prenet (torch.FloatTensor): rnn hidden state of the current timestep
            c_post_prenet (torch.FloatTensor): rnn cell state of the current timestep
        """
        prenet_input = ar_inputs[:, t : t + self.ar_order].flatten(1)
        memory_inputs = self.prenet(prenet_input)
        h_memory, c_memory = self.memory_rnn(memory_inputs, (h_memory, c_memory))
        return h_memory, c_memory

    def _add_go_token(self, mel_inputs):
        """Append the go token to create the autoregressive input
        Args:
            mel_inputs (torch.FloatTensor): (batch_size, T, n_mel_channel)
        Returns:
            ar_inputs (torch.FloatTensor): (batch_size, T, n_mel_channel)
        """
        batch_size, T, _ = mel_inputs.shape
        go_tokens = self.go_tokens.unsqueeze(0).expand(batch_size, self.ar_order, self.frame_channels)
        ar_inputs = torch.cat((go_tokens, mel_inputs), dim=1)[:, :T]
        return ar_inputs

    @staticmethod
    def _initialize_forward_algorithm_variables(mel_inputs, N):
        r"""Initialize placeholders for forward algorithm variables, to use a stable
                version we will use log_alpha_scaled and the scaling constant

        Args:
            mel_inputs (torch.FloatTensor): (b, T_max, frame_channels)
            N (int): number of states
        Returns:
            log_c (torch.FloatTensor): Scaling constant (b, T_max)
        """
        b, T_max, _ = mel_inputs.shape
        log_alpha_scaled = mel_inputs.new_zeros((b, T_max, N))
        log_c = mel_inputs.new_zeros(b, T_max)
        transition_matrix = mel_inputs.new_zeros((b, T_max, N))

        # Saving for plotting later, will not have gradient tapes
        means = []
        return log_c, log_alpha_scaled, transition_matrix, means

    @staticmethod
    def _init_lstm_states(batch_size, hidden_state_dim, device_tensor):
        r"""
        Initialize Hidden and Cell states for LSTM Cell

        Args:
            batch_size (Int): batch size
            hidden_state_dim (Int): dimensions of the h and c
            device_tensor (torch.FloatTensor): useful for the device and type

        Returns:
            (torch.FloatTensor): shape (batch_size, hidden_state_dim)
                can be hidden state for LSTM
            (torch.FloatTensor): shape (batch_size, hidden_state_dim)
                can be the cell state for LSTM
        """
        return (
            device_tensor.new_zeros(batch_size, hidden_state_dim),
            device_tensor.new_zeros(batch_size, hidden_state_dim),
        )

    def get_absorption_state_scaling_factor(self, mels_len, log_alpha_scaled, inputs_len, transition_vector):
        """Returns the final scaling factor of absorption state

        Args:
            mels_len (torch.IntTensor): Input size of mels to
                    get the last timestep of log_alpha_scaled
            log_alpha_scaled (torch.FloatTEnsor): State probabilities
            text_lengths (torch.IntTensor): length of the states to
                    mask the values of states lengths
                (
                    Useful when the batch has very different lengths,
                    when the length of an observation is less than
                    the number of max states, then the log alpha after
                    the state value is filled with -infs. So we mask
                    those values so that it only consider the states
                    which are needed for that length
                )
            transition_vector (torch.FloatTensor): transtiion vector for each state per timestep

        Shapes:
            - mels_len: (batch_size)
            - log_alpha_scaled: (batch_size, N, T)
            - text_lengths: (batch_size)
            - transition_vector: (batch_size, N, T)

        Returns:
            sum_final_log_c (torch.FloatTensor): (batch_size)

        """
        N = torch.max(inputs_len)
        max_inputs_len = log_alpha_scaled.shape[2]
        state_lengths_mask = sequence_mask(inputs_len, max_len=max_inputs_len)

        last_log_alpha_scaled_index = (
            (mels_len - 1).unsqueeze(-1).expand(-1, N).unsqueeze(1)
        )  # Batch X Hidden State Size
        last_log_alpha_scaled = torch.gather(log_alpha_scaled, 1, last_log_alpha_scaled_index).squeeze(1)
        last_log_alpha_scaled = last_log_alpha_scaled.masked_fill(~state_lengths_mask, -float("inf"))

        last_transition_vector = torch.gather(transition_vector, 1, last_log_alpha_scaled_index).squeeze(1)
        last_transition_probability = torch.sigmoid(last_transition_vector)
        log_probability_of_transitioning = OverflowUtils.log_clamped(last_transition_probability)

        last_transition_probability_index = self.get_mask_for_last_item(inputs_len, inputs_len.device)
        log_probability_of_transitioning = log_probability_of_transitioning.masked_fill(
            ~last_transition_probability_index, -float("inf")
        )
        final_log_c = last_log_alpha_scaled + log_probability_of_transitioning

        # If the length of the mel is less than the number of states it will select the -inf values leading to nan gradients
        # Ideally, we should clean the dataset otherwise this is a little hack uncomment the line below
        final_log_c = final_log_c.clamp(min=torch.finfo(final_log_c.dtype).min)

        sum_final_log_c = torch.logsumexp(final_log_c, dim=1)
        return sum_final_log_c

    @staticmethod
    def get_mask_for_last_item(lengths, device, out_tensor=None):
        """Returns n-1 mask for the last item in the sequence.

        Args:
            lengths (torch.IntTensor): lengths in a batch
            device (str, optional): Defaults to "cpu".
            out_tensor (torch.Tensor, optional): uses the memory of a specific tensor.
                Defaults to None.

        Returns:
            - Shape: :math:`(b, max_len)`
        """
        max_len = torch.max(lengths).item()
        ids = (
            torch.arange(0, max_len, device=device) if out_tensor is None else torch.arange(0, max_len, out=out_tensor)
        )
        mask = ids == lengths.unsqueeze(1) - 1
        return mask

    @torch.inference_mode()
    def inference(
        self,
        inputs: torch.FloatTensor,
        input_lens: torch.LongTensor,
        sampling_temp: float,
        max_sampling_time: int,
        duration_threshold: float,
    ):
        """Inference from autoregressive neural HMM

        Args:
            inputs (torch.FloatTensor): input states
                - shape: :math:`(b, T, d)`
            input_lens (torch.LongTensor): input state lengths
                - shape: :math:`(b)`
            sampling_temp (float): sampling temperature
            max_sampling_temp (int): max sampling temperature
            duration_threshold (float): duration threshold to switch to next state
                - Use this to change the spearking rate of the synthesised audio
        """

        b = inputs.shape[0]
        outputs = {
            "hmm_outputs": [],
            "hmm_outputs_len": [],
            "alignments": [],
            "input_parameters": [],
            "output_parameters": [],
        }
        for i in range(b):
            neural_hmm_outputs, states_travelled, input_parameters, output_parameters = self.sample(
                inputs[i : i + 1], input_lens[i], sampling_temp, max_sampling_time, duration_threshold
            )

            outputs["hmm_outputs"].append(neural_hmm_outputs)
            outputs["hmm_outputs_len"].append(neural_hmm_outputs.shape[0])
            outputs["alignments"].append(states_travelled)
            outputs["input_parameters"].append(input_parameters)
            outputs["output_parameters"].append(output_parameters)

        outputs["hmm_outputs"] = nn.utils.rnn.pad_sequence(outputs["hmm_outputs"], batch_first=True)
        outputs["hmm_outputs_len"] = torch.tensor(
            outputs["hmm_outputs_len"], dtype=input_lens.dtype, device=input_lens.device
        )
        return outputs

    @torch.inference_mode()
    def sample(self, inputs, input_lens, sampling_temp, max_sampling_time, duration_threshold):
        """Samples an output from the parameter models

        Args:
            inputs (torch.FloatTensor): input states
                - shape: :math:`(1, T, d)`
            input_lens (torch.LongTensor): input state lengths
                - shape: :math:`(1)`
            sampling_temp (float): sampling temperature
            max_sampling_time (int): max sampling time
            duration_threshold (float): duration threshold to switch to next state

        Returns:
            outputs (torch.FloatTensor): Output Observations
                - Shape: :math:`(T, output_dim)`
            states_travelled (list[int]): Hidden states travelled
                - Shape: :math:`(T)`
            input_parameters (list[torch.FloatTensor]): Input parameters
            output_parameters (list[torch.FloatTensor]): Output parameters
        """
        states_travelled, outputs, t = [], [], 0

        # Sample initial state
        current_state = 0
        states_travelled.append(current_state)

        # Prepare autoregression
        prenet_input = self.go_tokens.unsqueeze(0).expand(1, self.ar_order, self.frame_channels)
        h_memory, c_memory = self._init_lstm_states(1, self.memory_rnn_dim, prenet_input)

        input_parameter_values = []
        output_parameter_values = []
        quantile = 1
        while True:
            memory_input = self.prenet(prenet_input.flatten(1).unsqueeze(0))
            # will be 1 while sampling
            h_memory, c_memory = self.memory_rnn(memory_input.squeeze(0), (h_memory, c_memory))

            z_t = inputs[:, current_state].unsqueeze(0)  # Add fake time dimension
            mean, std, transition_vector = self.output_net(h_memory, z_t)

            transition_probability = torch.sigmoid(transition_vector.flatten())
            staying_probability = torch.sigmoid(-transition_vector.flatten())

            # Save for plotting
            input_parameter_values.append([prenet_input, current_state])
            output_parameter_values.append([mean, std, transition_probability])

            x_t = self.emission_model.sample(mean, std, sampling_temp=sampling_temp)

            # Prepare autoregressive input for next iteration
            prenet_input = torch.cat((prenet_input, x_t), dim=1)[:, 1:]

            outputs.append(x_t.flatten())

            transition_matrix = torch.cat((staying_probability, transition_probability))
            quantile *= staying_probability
            if not self.deterministic_transition:
                switch = transition_matrix.multinomial(1)[0].item()
            else:
                switch = quantile < duration_threshold

            if switch:
                current_state += 1
                quantile = 1

            states_travelled.append(current_state)

            if (current_state == input_lens) or (max_sampling_time and t == max_sampling_time - 1):
                break

            t += 1

        return (
            torch.stack(outputs, dim=0),
            F.one_hot(input_lens.new_tensor(states_travelled)),
            input_parameter_values,
            output_parameter_values,
        )

    @staticmethod
    def _initialize_log_state_priors(text_embeddings):
        """Creates the log pi in forward algorithm.

        Args:
            text_embeddings (torch.FloatTensor): used to create the log pi
                    on current device

        Shapes:
            - text_embeddings: (B, T, D_out_enc)
        """
        N = text_embeddings.shape[1]
        log_state_priors = text_embeddings.new_full([N], -float("inf"))
        log_state_priors[0] = 0.0
        return log_state_priors


class TransitionModel(nn.Module):
    """Transition Model of the HMM, it represents the probability of transitioning
    form current state to all other states"""

    def forward(self, log_alpha_scaled, transition_vector, inputs_len):  # pylint: disable=no-self-use
        r"""
        product of the past state with transitional probabilities in log space

        Args:
            log_alpha_scaled (torch.Tensor): Multiply previous timestep's alphas by
                        transition matrix (in log domain)
                - shape: (batch size, N)
            transition_vector (torch.tensor): transition vector for each state
                - shape: (N)
            inputs_len (int tensor): Lengths of states in a batch
                - shape: (batch)

        Returns:
            out (torch.FloatTensor): log probability of transitioning to each state
        """
        transition_p = torch.sigmoid(transition_vector)
        staying_p = torch.sigmoid(-transition_vector)

        log_staying_probability = OverflowUtils.log_clamped(staying_p)
        log_transition_probability = OverflowUtils.log_clamped(transition_p)

        staying = log_alpha_scaled + log_staying_probability
        leaving = log_alpha_scaled + log_transition_probability
        leaving = leaving.roll(1, dims=1)
        leaving[:, 0] = -float("inf")
        inputs_len_mask = sequence_mask(inputs_len)
        out = OverflowUtils.logsumexp(torch.stack((staying, leaving), dim=2), dim=2)
        out = out.masked_fill(~inputs_len_mask, -float("inf"))  # There are no states to contribute to the loss
        return out


class EmissionModel(nn.Module):
    """Emission Model of the HMM, it represents the probability of
    emitting an observation based on the current state"""

    def __init__(self) -> None:
        super().__init__()
        self.distribution_function: tdist.Distribution = tdist.normal.Normal

    def sample(self, means, stds, sampling_temp):
        return self.distribution_function(means, stds * sampling_temp).sample() if sampling_temp > 0 else means

    def forward(self, x_t, means, stds, state_lengths):
        r"""Calculates the log probability of the the given data (x_t)
            being observed from states with given means and stds
        Args:
            x_t (float tensor) : observation at current time step
                - shape: (batch, feature_dim)
            means (float tensor): means of the distributions of hidden states
                - shape: (batch, hidden_state, feature_dim)
            stds (float tensor): standard deviations of the distributions of the hidden states
                - shape: (batch, hidden_state, feature_dim)
            state_lengths (int tensor): Lengths of states in a batch
                - shape: (batch)

        Returns:
            out (float tensor): observation log likelihoods,
                                    expressing the probability of an observation
                being generated from a state i
                shape: (batch, hidden_state)
        """
        emission_dists = self.distribution_function(means, stds)
        out = emission_dists.log_prob(x_t.unsqueeze(1))
        state_lengths_mask = sequence_mask(state_lengths).unsqueeze(2)
        out = torch.sum(out * state_lengths_mask, dim=2)
        return out


================================================
FILE: TTS/tts/layers/overflow/plotting_utils.py
================================================
from typing import Any

import matplotlib.pyplot as plt
import numpy as np
import torch


def validate_numpy_array(value: Any):
    r"""
    Validates the input and makes sure it returns a numpy array (i.e on CPU)

    Args:
        value (Any): the input value

    Raises:
        TypeError: if the value is not a numpy array or torch tensor

    Returns:
        np.ndarray: numpy array of the value
    """
    if isinstance(value, np.ndarray):
        pass
    elif isinstance(value, list):
        value = np.array(value)
    elif torch.is_tensor(value):
        value = value.cpu().numpy()
    else:
        raise TypeError("Value must be a numpy array, a torch tensor or a list")

    return value


def get_spec_from_most_probable_state(log_alpha_scaled, means, decoder=None):
    """Get the most probable state means from the log_alpha_scaled.

    Args:
        log_alpha_scaled (torch.Tensor): Log alpha scaled values.
            - Shape: :math:`(T, N)`
        means (torch.Tensor): Means of the states.
            - Shape: :math:`(N, T, D_out)`
        decoder (torch.nn.Module): Decoder module to decode the latent to melspectrogram. Defaults to None.
    """
    max_state_numbers = torch.max(log_alpha_scaled, dim=1)[1]
    max_len = means.shape[0]
    n_mel_channels = means.shape[2]
    max_state_numbers = max_state_numbers.unsqueeze(1).unsqueeze(1).expand(max_len, 1, n_mel_channels)
    means = torch.gather(means, 1, max_state_numbers).squeeze(1).to(log_alpha_scaled.dtype)
    if decoder is not None:
        mel = (
            decoder(means.T.unsqueeze(0), torch.tensor([means.shape[0]], device=means.device), reverse=True)[0]
            .squeeze(0)
            .T
        )
    else:
        mel = means
    return mel


def plot_transition_probabilities_to_numpy(states, transition_probabilities, output_fig=False):
    """Generates trainsition probabilities plot for the states and the probability of transition.

    Args:
        states (torch.IntTensor): the states
        transition_probabilities (torch.FloatTensor): the transition probabilities
    """
    states = validate_numpy_array(states)
    transition_probabilities = validate_numpy_array(transition_probabilities)

    fig, ax = plt.subplots(figsize=(30, 3))
    ax.plot(transition_probabilities, "o")
    ax.set_title("Transition probability of state")
    ax.set_xlabel("hidden state")
    ax.set_ylabel("probability")
    ax.set_xticks([i for i in range(len(transition_probabilities))])  # pylint: disable=unnecessary-comprehension
    ax.set_xticklabels([int(x) for x in states], rotation=90)
    plt.tight_layout()
    if not output_fig:
        plt.close()
    return fig


================================================
FILE: TTS/tts/layers/tacotron/__init__.py
================================================


================================================
FILE: TTS/tts/layers/tacotron/attentions.py
================================================
import torch
from scipy.stats import betabinom
from torch import nn
from torch.nn import functional as F

from TTS.tts.layers.tacotron.common_layers import Linear


class LocationLayer(nn.Module):
    """Layers for Location Sensitive Attention

    Args:
        attention_dim (int): number of channels in the input tensor.
        attention_n_filters (int, optional): number of filters in convolution. Defaults to 32.
        attention_kernel_size (int, optional): kernel size of convolution filter. Defaults to 31.
    """

    def __init__(self, attention_dim, attention_n_filters=32, attention_kernel_size=31):
        super().__init__()
        self.location_conv1d = nn.Conv1d(
            in_channels=2,
            out_channels=attention_n_filters,
            kernel_size=attention_kernel_size,
            stride=1,
            padding=(attention_kernel_size - 1) // 2,
            bias=False,
        )
        self.location_dense = Linear(attention_n_filters, attention_dim, bias=False, init_gain="tanh")

    def forward(self, attention_cat):
        """
        Shapes:
            attention_cat: [B, 2, C]
        """
        processed_attention = self.location_conv1d(attention_cat)
        processed_attention = self.location_dense(processed_attention.transpose(1, 2))
        return processed_attention


class GravesAttention(nn.Module):
    """Graves Attention as is ref1 with updates from ref2.
    ref1: https://arxiv.org/abs/1910.10288
    ref2: https://arxiv.org/pdf/1906.01083.pdf

    Args:
        query_dim (int): number of channels in query tensor.
        K (int): number of Gaussian heads to be used for computing attention.
    """

    COEF = 0.3989422917366028  # numpy.sqrt(1/(2*numpy.pi))

    def __init__(self, query_dim, K):
        super().__init__()
        self._mask_value = 1e-8
        self.K = K
        # self.attention_alignment = 0.05
        self.eps = 1e-5
        self.J = None
        self.N_a = nn.Sequential(
            nn.Linear(query_dim, query_dim, bias=True), nn.ReLU(), nn.Linear(query_dim, 3 * K, bias=True)
        )
        self.attention_weights = None
        self.mu_prev = None
        self.init_layers()

    def init_layers(self):
        torch.nn.init.constant_(self.N_a[2].bias[(2 * self.K) : (3 * self.K)], 1.0)  # bias mean
        torch.nn.init.constant_(self.N_a[2].bias[self.K : (2 * self.K)], 10)  # bias std

    def init_states(self, inputs):
        if self.J is None or inputs.shape[1] + 1 > self.J.shape[-1]:
            self.J = torch.arange(0, inputs.shape[1] + 2.0).to(inputs.device) + 0.5
        self.attention_weights = torch.zeros(inputs.shape[0], inputs.shape[1]).to(inputs.device)
        self.mu_prev = torch.zeros(inputs.shape[0], self.K).to(inputs.device)

    # pylint: disable=R0201
    # pylint: disable=unused-argument
    def preprocess_inputs(self, inputs):
        return None

    def forward(self, query, inputs, processed_inputs, mask):
        """
        Shapes:
            query: [B, C_attention_rnn]
            inputs: [B, T_in, C_encoder]
            processed_inputs: place_holder
            mask: [B, T_in]
        """
        gbk_t = self.N_a(query)
        gbk_t = gbk_t.view(gbk_t.size(0), -1, self.K)

        # attention model parameters
        # each B x K
        g_t = gbk_t[:, 0, :]
        b_t = gbk_t[:, 1, :]
        k_t = gbk_t[:, 2, :]

        # dropout to decorrelate attention heads
        g_t = torch.nn.functional.dropout(g_t, p=0.5, training=self.training)

        # attention GMM parameters
        sig_t = torch.nn.functional.softplus(b_t) + self.eps

        mu_t = self.mu_prev + torch.nn.functional.softplus(k_t)
        g_t = torch.softmax(g_t, dim=-1) + self.eps

        j = self.J[: inputs.size(1) + 1]

        # attention weights
        phi_t = g_t.unsqueeze(-1) * (1 / (1 + torch.sigmoid((mu_t.unsqueeze(-1) - j) / sig_t.unsqueeze(-1))))

        # discritize attention weights
        alpha_t = torch.sum(phi_t, 1)
        alpha_t = alpha_t[:, 1:] - alpha_t[:, :-1]
        alpha_t[alpha_t == 0] = 1e-8

        # apply masking
        if mask is not None:
            alpha_t.data.masked_fill_(~mask, self._mask_value)

        context = torch.bmm(alpha_t.unsqueeze(1), inputs).squeeze(1)
        self.attention_weights = alpha_t
        self.mu_prev = mu_t
        return context


class OriginalAttention(nn.Module):
    """Bahdanau Attention with various optional modifications.
    - Location sensitive attnetion: https://arxiv.org/abs/1712.05884
    - Forward Attention: https://arxiv.org/abs/1807.06736 + state masking at inference
    - Using sigmoid instead of softmax normalization
    - Attention windowing at inference time

    Note:
        Location Sensitive Attention extends the additive attention mechanism
    to use cumulative attention weights from previous decoder time steps with the current time step features.

        Forward attention computes most probable monotonic alignment. The modified attention probabilities at each
    timestep are computed recursively by the forward algorithm.

        Transition agent in the forward attention explicitly gates the attention mechanism whether to move forward or
    stay at each decoder timestep.

        Attention windowing is a inductive prior that prevents the model from attending to previous and future timesteps
    beyond a certain window.

    Args:
        query_dim (int): number of channels in the query tensor.
        embedding_dim (int): number of channels in the vakue tensor. In general, the value tensor is the output of the encoder layer.
        attention_dim (int): number of channels of the inner attention layers.
        location_attention (bool): enable/disable location sensitive attention.
        attention_location_n_filters (int): number of location attention filters.
        attention_location_kernel_size (int): filter size of location attention convolution layer.
        windowing (int): window size for attention windowing. if it is 5, for computing the attention, it only considers the time steps [(t-5), ..., (t+5)] of the input.
        norm (str): normalization method applied to the attention weights. 'softmax' or 'sigmoid'
        forward_attn (bool): enable/disable forward attention.
        trans_agent (bool): enable/disable transition agent in the forward attention.
        forward_attn_mask (int): enable/disable an explicit masking in forward attention. It is useful to set at especially inference time.
    """

    # Pylint gets confused by PyTorch conventions here
    # pylint: disable=attribute-defined-outside-init
    def __init__(
        self,
        query_dim,
        embedding_dim,
        attention_dim,
        location_attention,
        attention_location_n_filters,
        attention_location_kernel_size,
        windowing,
        norm,
        forward_attn,
        trans_agent,
        forward_attn_mask,
    ):
        super().__init__()
        self.query_layer = Linear(query_dim, attention_dim, bias=False, init_gain="tanh")
        self.inputs_layer = Linear(embedding_dim, attention_dim, bias=False, init_gain="tanh")
        self.v = Linear(attention_dim, 1, bias=True)
        if trans_agent:
            self.ta = nn.Linear(query_dim + embedding_dim, 1, bias=True)
        if location_attention:
            self.location_layer = LocationLayer(
                attention_dim,
                attention_location_n_filters,
                attention_location_kernel_size,
            )
        self._mask_value = -float("inf")
        self.windowing = windowing
        self.win_idx = None
        self.norm = norm
        self.forward_attn = forward_attn
        self.trans_agent = trans_agent
        self.forward_attn_mask = forward_attn_mask
        self.location_attention = location_attention

    def init_win_idx(self):
        self.win_idx = -1
        self.win_back = 2
        self.win_front = 6

    def init_forward_attn(self, inputs):
        B = inputs.shape[0]
        T = inputs.shape[1]
        self.alpha = torch.cat([torch.ones([B, 1]), torch.zeros([B, T])[:, :-1] + 1e-7], dim=1).to(inputs.device)
        self.u = (0.5 * torch.ones([B, 1])).to(inputs.device)

    def init_location_attention(self, inputs):
        B = inputs.size(0)
        T = inputs.size(1)
        self.attention_weights_cum = torch.zeros([B, T], device=inputs.device)

    def init_states(self, inputs):
        B = inputs.size(0)
        T = inputs.size(1)
        self.attention_weights = torch.zeros([B, T], device=inputs.device)
        if self.location_attention:
            self.init_location_attention(inputs)
        if self.forward_attn:
            self.init_forward_attn(inputs)
        if self.windowing:
            self.init_win_idx()

    def preprocess_inputs(self, inputs):
        return self.inputs_layer(inputs)

    def update_location_attention(self, alignments):
        self.attention_weights_cum += alignments

    def get_location_attention(self, query, processed_inputs):
        attention_cat = torch.cat((self.attention_weights.unsqueeze(1), self.attention_weights_cum.unsqueeze(1)), dim=1)
        processed_query = self.query_layer(query.unsqueeze(1))
        processed_attention_weights = self.location_layer(attention_cat)
        energies = self.v(torch.tanh(processed_query + processed_attention_weights + processed_inputs))
        energies = energies.squeeze(-1)
        return energies, processed_query

    def get_attention(self, query, processed_inputs):
        processed_query = self.query_layer(query.unsqueeze(1))
        energies = self.v(torch.tanh(processed_query + processed_inputs))
        energies = energies.squeeze(-1)
        return energies, processed_query

    def apply_windowing(self, attention, inputs):
        back_win = self.win_idx - self.win_back
        front_win = self.win_idx + self.win_front
        if back_win > 0:
            attention[:, :back_win] = -float("inf")
        if front_win < inputs.shape[1]:
            attention[:, front_win:] = -float("inf")
        # this is a trick to solve a special problem.
        # but it does not hurt.
        if self.win_idx == -1:
            attention[:, 0] = attention.max()
        # Update the window
        self.win_idx = torch.argmax(attention, 1).long()[0].item()
        return attention

    def apply_forward_attention(self, alignment):
        # forward attention
        fwd_shifted_alpha = F.pad(self.alpha[:, :-1].clone().to(alignment.device), (1, 0, 0, 0))
        # compute transition potentials
        alpha = ((1 - self.u) * self.alpha + self.u * fwd_shifted_alpha + 1e-8) * alignment
        # force incremental alignment
        if not self.training and self.forward_attn_mask:
            _, n = fwd_shifted_alpha.max(1)
            val, _ = alpha.max(1)
            for b in range(alignment.shape[0]):
                alpha[b, n[b] + 3 :] = 0
                alpha[b, : (n[b] - 1)] = 0  # ignore all previous states to prevent repetition.
                alpha[b, (n[b] - 2)] = 0.01 * val[b]  # smoothing factor for the prev step
        # renormalize attention weights
        alpha = alpha / alpha.sum(dim=1, keepdim=True)
        return alpha

    def forward(self, query, inputs, processed_inputs, mask):
        """
        shapes:
            query: [B, C_attn_rnn]
            inputs: [B, T_en, D_en]
            processed_inputs: [B, T_en, D_attn]
            mask: [B, T_en]
        """
        if self.location_attention:
            attention, _ = self.get_location_attention(query, processed_inputs)
        else:
            attention, _ = self.get_attention(query, processed_inputs)
        # apply masking
        if mask is not None:
            attention.data.masked_fill_(~mask, self._mask_value)
        # apply windowing - only in eval mode
        if not self.training and self.windowing:
            attention = self.apply_windowing(attention, inputs)

        # normalize attention values
        if self.norm == "softmax":
            alignment = torch.softmax(attention, dim=-1)
        elif self.norm == "sigmoid":
            alignment = torch.sigmoid(attention) / torch.sigmoid(attention).sum(dim=1, keepdim=True)
        else:
            raise ValueError("Unknown value for attention norm type")

        if self.location_attention:
            self.update_location_attention(alignment)

        # apply forward attention if enabled
        if self.forward_attn:
            alignment = self.apply_forward_attention(alignment)
            self.alpha = alignment

        context = torch.bmm(alignment.unsqueeze(1), inputs)
        context = context.squeeze(1)
        self.attention_weights = alignment

        # compute transition agent
        if self.forward_attn and self.trans_agent:
            ta_input = torch.cat([context, query.squeeze(1)], dim=-1)
            self.u = torch.sigmoid(self.ta(ta_input))
        return context


class MonotonicDynamicConvolutionAttention(nn.Module):
    """Dynamic convolution attention from
    https://arxiv.org/pdf/1910.10288.pdf


    query -> linear -> tanh -> linear ->|
                                        |                                            mask values
                                        v                                              |    |
               atten_w(t-1) -|-> conv1d_dynamic -> linear -|-> tanh -> + -> softmax -> * -> * -> context
                             |-> conv1d_static  -> linear -|           |
                             |-> conv1d_prior   -> log ----------------|

    query: attention rnn output.

    Note:
        Dynamic convolution attention is an alternation of the location senstive attention with
    dynamically computed convolution filters from the previous attention scores and a set of
    constraints to keep the attention alignment diagonal.
        DCA is sensitive to mixed precision training and might cause instable training.

    Args:
        query_dim (int): number of channels in the query tensor.
        embedding_dim (int): number of channels in the value tensor.
        static_filter_dim (int): number of channels in the convolution layer computing the static filters.
        static_kernel_size (int): kernel size for the convolution layer computing the static filters.
        dynamic_filter_dim (int): number of channels in the convolution layer computing the dynamic filters.
        dynamic_kernel_size (int): kernel size for the convolution layer computing the dynamic filters.
        prior_filter_len (int, optional): [description]. Defaults to 11 from the paper.
        alpha (float, optional): [description]. Defaults to 0.1 from the paper.
        beta (float, optional): [description]. Defaults to 0.9 from the paper.
    """

    def __init__(
        self,
        query_dim,
        embedding_dim,  # pylint: disable=unused-argument
        attention_dim,
        static_filter_dim,
        static_kernel_size,
        dynamic_filter_dim,
        dynamic_kernel_size,
        prior_filter_len=11,
        alpha=0.1,
        beta=0.9,
    ):
        super().__init__()
        self._mask_value = 1e-8
        self.dynamic_filter_dim = dynamic_filter_dim
        self.dynamic_kernel_size = dynamic_kernel_size
        self.prior_filter_len = prior_filter_len
        self.attention_weights = None
        # setup key and query layers
        self.query_layer = nn.Linear(query_dim, attention_dim)
        self.key_layer = nn.Linear(attention_dim, dynamic_filter_dim * dynamic_kernel_size, bias=False)
        self.static_filter_conv = nn.Conv1d(
            1,
            static_filter_dim,
            static_kernel_size,
            padding=(static_kernel_size - 1) // 2,
            bias=False,
        )
        self.static_filter_layer = nn.Linear(static_filter_dim, attention_dim, bias=False)
        self.dynamic_filter_layer = nn.Linear(dynamic_filter_dim, attention_dim)
        self.v = nn.Linear(attention_dim, 1, bias=False)

        prior = betabinom.pmf(range(prior_filter_len), prior_filter_len - 1, alpha, beta)
        self.register_buffer("prior", torch.FloatTensor(prior).flip(0))

    # pylint: disable=unused-argument
    def forward(self, query, inputs, processed_inputs, mask):
        """
        query: [B, C_attn_rnn]
        inputs: [B, T_en, D_en]
        processed_inputs: place holder.
        mask: [B, T_en]
        """
        # compute prior filters
        prior_filter = F.conv1d(
            F.pad(self.attention_weights.unsqueeze(1), (self.prior_filter_len - 1, 0)), self.prior.view(1, 1, -1)
        )
        prior_filter = torch.log(prior_filter.clamp_min_(1e-6)).squeeze(1)
        G = self.key_layer(torch.tanh(self.query_layer(query)))
        # compute dynamic filters
        dynamic_filter = F.conv1d(
            self.attention_weights.unsqueeze(0),
            G.view(-1, 1, self.dynamic_kernel_size),
            padding=(self.dynamic_kernel_size - 1) // 2,
            groups=query.size(0),
        )
        dynamic_filter = dynamic_filter.view(query.size(0), self.dynamic_filter_dim, -1).transpose(1, 2)
        # compute static filters
        static_filter = self.static_filter_conv(self.attention_weights.unsqueeze(1)).transpose(1, 2)
        alignment = (
            self.v(
                torch.tanh(self.static_filter_layer(static_filter) + self.dynamic_filter_layer(dynamic_filter))
            ).squeeze(-1)
            + prior_filter
        )
        # compute attention weights
        attention_weights = F.softmax(alignment, dim=-1)
        # apply masking
        if mask is not None:
            attention_weights.data.masked_fill_(~mask, self._mask_value)
        self.attention_weights = attention_weights
        # compute context
        context = torch.bmm(attention_weights.unsqueeze(1), inputs).squeeze(1)
        return context

    def preprocess_inputs(self, inputs):  # pylint: disable=no-self-use
        return None

    def init_states(self, inputs):
        B = inputs.size(0)
        T = inputs.size(1)
        self.attention_weights = torch.zeros([B, T], device=inputs.device)
        self.attention_weights[:, 0] = 1.0


def init_attn(
    attn_type,
    query_dim,
    embedding_dim,
    attention_dim,
    location_attention,
    attention_location_n_filters,
    attention_location_kernel_size,
    windowing,
    norm,
    forward_attn,
    trans_agent,
    forward_attn_mask,
    attn_K,
):
    if attn_type == "original":
        return OriginalAttention(
            query_dim,
            embedding_dim,
            attention_dim,
            location_attention,
            attention_location_n_filters,
            attention_location_kernel_size,
            windowing,
            norm,
            forward_attn,
            trans_agent,
            forward_attn_mask,
        )
    if attn_type == "graves":
        return GravesAttention(query_dim, attn_K)
    if attn_type == "dynamic_convolution":
        return MonotonicDynamicConvolutionAttention(
            query_dim,
            embedding_dim,
            attention_dim,
            static_filter_dim=8,
            static_kernel_size=21,
            dynamic_filter_dim=8,
            dynamic_kernel_size=21,
            prior_filter_len=11,
            alpha=0.1,
            beta=0.9,
        )

    raise RuntimeError(f" [!] Given Attention Type '{attn_type}' is not exist.")


================================================
FILE: TTS/tts/layers/tacotron/capacitron_layers.py
================================================
import torch
from torch import nn
from torch.distributions.multivariate_normal import MultivariateNormal as MVN
from torch.nn import functional as F


class CapacitronVAE(nn.Module):
    """Effective Use of Variational Embedding Capacity for prosody transfer.

    See https://arxiv.org/abs/1906.03402"""

    def __init__(
        self,
        num_mel,
        capacitron_VAE_embedding_dim,
        encoder_output_dim=256,
        reference_encoder_out_dim=128,
        speaker_embedding_dim=None,
        text_summary_embedding_dim=None,
    ):
        super().__init__()
        # Init distributions
        self.prior_distribution = MVN(
            torch.zeros(capacitron_VAE_embedding_dim), torch.eye(capacitron_VAE_embedding_dim)
        )
        self.approximate_posterior_distribution = None
        # define output ReferenceEncoder dim to the capacitron_VAE_embedding_dim
        self.encoder = ReferenceEncoder(num_mel, out_dim=reference_encoder_out_dim)

        # Init beta, the lagrange-like term for the KL distribution
        self.beta = torch.nn.Parameter(torch.log(torch.exp(torch.Tensor([1.0])) - 1), requires_grad=True)
        mlp_input_dimension = reference_encoder_out_dim

        if text_summary_embedding_dim is not None:
            self.text_summary_net = TextSummary(text_summary_embedding_dim, encoder_output_dim=encoder_output_dim)
            mlp_input_dimension += text_summary_embedding_dim
        if speaker_embedding_dim is not None:
            # TODO: Test a multispeaker model!
            mlp_input_dimension += speaker_embedding_dim
        self.post_encoder_mlp = PostEncoderMLP(mlp_input_dimension, capacitron_VAE_embedding_dim)

    def forward(self, reference_mel_info=None, text_info=None, speaker_embedding=None):
        # Use reference
        if reference_mel_info is not None:
            reference_mels = reference_mel_info[0]  # [batch_size, num_frames, num_mels]
            mel_lengths = reference_mel_info[1]  # [batch_size]
            enc_out = self.encoder(reference_mels, mel_lengths)

            # concat speaker_embedding and/or text summary embedding
            if text_info is not None:
                text_inputs = text_info[0]  # [batch_size, num_characters, num_embedding]
                input_lengths = text_info[1]
                text_summary_out = self.text_summary_net(text_inputs, input_lengths).to(reference_mels.device)
                enc_out = torch.cat([enc_out, text_summary_out], dim=-1)
            if speaker_embedding is not None:
                speaker_embedding = torch.squeeze(speaker_embedding)
                enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)

            # Feed the output of the ref encoder and information about text/speaker into
            # an MLP to produce the parameteres for the approximate poterior distributions
            mu, sigma = self.post_encoder_mlp(enc_out)
            # convert to cpu because prior_distribution was created on cpu
            mu = mu.cpu()
            sigma = sigma.cpu()

            # Sample from the posterior: z ~ q(z|x)
            self.approximate_posterior_distribution = MVN(mu, torch.diag_embed(sigma))
            VAE_embedding = self.approximate_posterior_distribution.rsample()
        # Infer from the model, bypasses encoding
        else:
            # Sample from the prior: z ~ p(z)
            VAE_embedding = self.prior_distribution.sample().unsqueeze(0)

        # reshape to [batch_size, 1, capacitron_VAE_embedding_dim]
        return VAE_embedding.unsqueeze(1), self.approximate_posterior_distribution, self.prior_distribution, self.beta


class ReferenceEncoder(nn.Module):
    """NN module creating a fixed size prosody embedding from a spectrogram.

    inputs: mel spectrograms [batch_size, num_spec_frames, num_mel]
    outputs: [batch_size, embedding_dim]
    """

    def __init__(self, num_mel, out_dim):
        super().__init__()
        self.num_mel = num_mel
        filters = [1] + [32, 32, 64, 64, 128, 128]
        num_layers = len(filters) - 1
        convs = [
            nn.Conv2d(
                in_channels=filters[i], out_channels=filters[i + 1], kernel_size=(3, 3), stride=(2, 2), padding=(2, 2)
            )
            for i in range(num_layers)
        ]
        self.convs = nn.ModuleList(convs)
        self.training = False
        self.bns = nn.ModuleList([nn.BatchNorm2d(num_features=filter_size) for filter_size in filters[1:]])

        post_conv_height = self.calculate_post_conv_height(num_mel, 3, 2, 2, num_layers)
        self.recurrence = nn.LSTM(
            input_size=filters[-1] * post_conv_height, hidden_size=out_dim, batch_first=True, bidirectional=False
        )

    def forward(self, inputs, input_lengths):
        batch_size = inputs.size(0)
        x = inputs.view(batch_size, 1, -1, self.num_mel)  # [batch_size, num_channels==1, num_frames, num_mel]
        valid_lengths = input_lengths.float()  # [batch_size]
        for conv, bn in zip(self.convs, self.bns):
            x = conv(x)
            x = bn(x)
            x = F.relu(x)

            # Create the post conv width mask based on the valid lengths of the output of the convolution.
            # The valid lengths for the output of a convolution on varying length inputs is
            # ceil(input_length/stride) + 1 for stride=3 and padding=2
            # For example (kernel_size=3, stride=2, padding=2):
            # 0 0 x x x x x 0 0 -> Input = 5, 0 is zero padding, x is valid values coming from padding=2 in conv2d
            # _____
            #   x _____
            #       x _____
            #           x  ____
            #               x
            # x x x x -> Output valid length = 4
            # Since every example in te batch is zero padded and therefore have separate valid_lengths,
            # we need to mask off all the values AFTER the valid length for each example in the batch.
            # Otherwise, the convolutions create noise and a lot of not real information
            valid_lengths = (valid_lengths / 2).float()
            valid_lengths = torch.ceil(valid_lengths).to(dtype=torch.int64) + 1  # 2 is stride -- size: [batch_size]
            post_conv_max_width = x.size(2)

            mask = torch.arange(post_conv_max_width).to(inputs.device).expand(
                len(valid_lengths), post_conv_max_width
            ) < valid_lengths.unsqueeze(1)
            mask = mask.expand(1, 1, -1, -1).transpose(2, 0).transpose(-1, 2)  # [batch_size, 1, post_conv_max_width, 1]
            x = x * mask

        x = x.transpose(1, 2)
        # x: 4D tensor [batch_size, post_conv_width,
        #               num_channels==128, post_conv_height]

        post_conv_width = x.size(1)
        x = x.contiguous().view(batch_size, post_conv_width, -1)
        # x: 3D tensor [batch_size, post_conv_width,
        #               num_channels*post_conv_height]

        # Routine for fetching the last valid output of a dynamic LSTM with varying input lengths and padding
        post_conv_input_lengths = valid_lengths
        packed_seqs = nn.utils.rnn.pack_padded_sequence(
            x, post_conv_input_lengths.tolist(), batch_first=True, enforce_sorted=False
        )  # dynamic rnn sequence padding
        self.recurrence.flatten_parameters()
        _, (ht, _) = self.recurrence(packed_seqs)
        last_output = ht[-1]

        return last_output.to(inputs.device)  # [B, 128]

    @staticmethod
    def calculate_post_conv_height(height, kernel_size, stride, pad, n_convs):
        """Height of spec after n convolutions with fixed kernel/stride/pad."""
        for _ in range(n_convs):
            height = (height - kernel_size + 2 * pad) // stride + 1
        return height


class TextSummary(nn.Module):
    def __init__(self, embedding_dim, encoder_output_dim):
        super().__init__()
        self.lstm = nn.LSTM(
            encoder_output_dim,  # text embedding dimension from the text encoder
            embedding_dim,  # fixed length output summary the lstm creates from the input
            batch_first=True,
            bidirectional=False,
        )

    def forward(self, inputs, input_lengths):
        # Routine for fetching the last valid output of a dynamic LSTM with varying input lengths and padding
        packed_seqs = nn.utils.rnn.pack_padded_sequence(
            inputs, input_lengths.tolist(), batch_first=True, enforce_sorted=False
        )  # dynamic rnn sequence padding
        self.lstm.flatten_parameters()
        _, (ht, _) = self.lstm(packed_seqs)
        last_output = ht[-1]
        return last_output


class PostEncoderMLP(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        modules = [
            nn.Linear(input_size, hidden_size),  # Hidden Layer
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size * 2),
        ]  # Output layer twice the size for mean and variance
        self.net = nn.Sequential(*modules)
        self.softplus = nn.Softplus()

    def forward(self, _input):
        mlp_output = self.net(_input)
        # The mean parameter is unconstrained
        mu = mlp_output[:, : self.hidden_size]
        # The standard deviation must be positive. Parameterise with a softplus
        sigma = self.softplus(mlp_output[:, self.hidden_size :])
        return mu, sigma


================================================
FILE: TTS/tts/layers/tacotron/common_layers.py
================================================
import torch
from torch import nn
from torch.nn import functional as F


class Linear(nn.Module):
    """Linear layer with a specific initialization.

    Args:
        in_features (int): number of channels in the input tensor.
        out_features (int): number of channels in the output tensor.
        bias (bool, optional): enable/disable bias in the layer. Defaults to True.
        init_gain (str, optional): method to compute the gain in the weight initializtion based on the nonlinear activation used afterwards. Defaults to 'linear'.
    """

    def __init__(self, in_features, out_features, bias=True, init_gain="linear"):
        super().__init__()
        self.linear_layer = torch.nn.Linear(in_features, out_features, bias=bias)
        self._init_w(init_gain)

    def _init_w(self, init_gain):
        torch.nn.init.xavier_uniform_(self.linear_layer.weight, gain=torch.nn.init.calculate_gain(init_gain))

    def forward(self, x):
        return self.linear_layer(x)


class LinearBN(nn.Module):
    """Linear layer with Batch Normalization.

    x -> linear -> BN -> o

    Args:
        in_features (int): number of channels in the input tensor.
        out_features (int ): number of channels in the output tensor.
        bias (bool, optional): enable/disable bias in the linear layer. Defaults to True.
        init_gain (str, optional): method to set the gain for weight initialization. Defaults to 'linear'.
    """

    def __init__(self, in_features, out_features, bias=True, init_gain="linear"):
        super().__init__()
        self.linear_layer = torch.nn.Linear(in_features, out_features, bias=bias)
        self.batch_normalization = nn.BatchNorm1d(out_features, momentum=0.1, eps=1e-5)
        self._init_w(init_gain)

    def _init_w(self, init_gain):
        torch.nn.init.xavier_uniform_(self.linear_layer.weight, gain=torch.nn.init.calculate_gain(init_gain))

    def forward(self, x):
        """
        Shapes:
            x: [T, B, C] or [B, C]
        """
        out = self.linear_layer(x)
        if len(out.shape) == 3:
            out = out.permute(1, 2, 0)
        out = self.batch_normalization(out)
        if len(out.shape) == 3:
            out = out.permute(2, 0, 1)
        return out


class Prenet(nn.Module):
    """Tacotron specific Prenet with an optional Batch Normalization.

    Note:
        Prenet with BN improves the model performance significantly especially
    if it is enabled after learning a diagonal attention alignment with the original
    prenet. However, if the target dataset is high quality then it also works from
    the start. It is also suggested to disable dropout if BN is in use.

        prenet_type == "original"
            x -> [linear -> ReLU -> Dropout]xN -> o

        prenet_type == "bn"
            x -> [linear -> BN -> ReLU -> Dropout]xN -> o

    Args:
        in_features (int): number of channels in the input tensor and the inner layers.
        prenet_type (str, optional): prenet type "original" or "bn". Defaults to "original".
        prenet_dropout (bool, optional): dropout rate. Defaults to True.
        dropout_at_inference (bool, optional): use dropout at inference. It leads to a better quality for some models.
        out_features (list, optional): List of output channels for each prenet block.
            It also defines number of the prenet blocks based on the length of argument list.
            Defaults to [256, 256].
        bias (bool, optional): enable/disable bias in prenet linear layers. Defaults to True.
    """

    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        in_features,
        prenet_type="original",
        prenet_dropout=True,
        dropout_at_inference=False,
        out_features=[256, 256],
        bias=True,
    ):
        super().__init__()
        self.prenet_type = prenet_type
        self.prenet_dropout = prenet_dropout
        self.dropout_at_inference = dropout_at_inference
        in_features = [in_features] + out_features[:-1]
        if prenet_type == "bn":
            self.linear_layers = nn.ModuleList(
                [LinearBN(in_size, out_size, bias=bias) for (in_size, out_size) in zip(in_features, out_features)]
            )
        elif prenet_type == "original":
            self.linear_layers = nn.ModuleList(
                [Linear(in_size, out_size, bias=bias) for (in_size, out_size) in zip(in_features, out_features)]
            )

    def forward(self, x):
        for linear in self.linear_layers:
            if self.prenet_dropout:
                x = F.dropout(F.relu(linear(x)), p=0.5, training=self.training or self.dropout_at_inference)
            else:
                x = F.relu(linear(x))
        return x


================================================
FILE: TTS/tts/layers/tacotron/gst_layers.py
================================================
import torch
import torch.nn.functional as F
from torch import nn


class GST(nn.Module):
    """Global Style Token Module for factorizing prosody in speech.

    See https://arxiv.org/pdf/1803.09017"""

    def __init__(self, num_mel, num_heads, num_style_tokens, gst_embedding_dim, embedded_speaker_dim=None):
        super().__init__()
        self.encoder = ReferenceEncoder(num_mel, gst_embedding_dim)
        self.style_token_layer = StyleTokenLayer(num_heads, num_style_tokens, gst_embedding_dim, embedded_speaker_dim)

    def forward(self, inputs, speaker_embedding=None):
        enc_out = self.encoder(inputs)
        # concat speaker_embedding
        if speaker_embedding is not None:
            enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
        style_embed = self.style_token_layer(enc_out)

        return style_embed


class ReferenceEncoder(nn.Module):
    """NN module creating a fixed size prosody embedding from a spectrogram.

    inputs: mel spectrograms [batch_size, num_spec_frames, num_mel]
    outputs: [batch_size, embedding_dim]
    """

    def __init__(self, num_mel, embedding_dim):
        super().__init__()
        self.num_mel = num_mel
        filters = [1] + [32, 32, 64, 64, 128, 128]
        num_layers = len(filters) - 1
        convs = [
            nn.Conv2d(
                in_channels=filters[i], out_channels=filters[i + 1], kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)
            )
            for i in range(num_layers)
        ]
        self.convs = nn.ModuleList(convs)
        self.bns = nn.ModuleList([nn.BatchNorm2d(num_features=filter_size) for filter_size in filters[1:]])

        post_conv_height = self.calculate_post_conv_height(num_mel, 3, 2, 1, num_layers)
        self.recurrence = nn.GRU(
            input_size=filters[-1] * post_conv_height, hidden_size=embedding_dim // 2, batch_first=True
        )

    def forward(self, inputs):
        batch_size = inputs.size(0)
        x = inputs.view(batch_size, 1, -1, self.num_mel)
        # x: 4D tensor [batch_size, num_channels==1, num_frames, num_mel]
        for conv, bn in zip(self.convs, self.bns):
            x = conv(x)
            x = bn(x)
            x = F.relu(x)

        x = x.transpose(1, 2)
        # x: 4D tensor [batch_size, post_conv_width,
        #               num_channels==128, post_conv_height]
        post_conv_width = x.size(1)
        x = x.contiguous().view(batch_size, post_conv_width, -1)
        # x: 3D tensor [batch_size, post_conv_width,
        #               num_channels*post_conv_height]
        self.recurrence.flatten_parameters()
        _, out = self.recurrence(x)
        # out: 3D tensor [seq_len==1, batch_size, encoding_size=128]

        return out.squeeze(0)

    @staticmethod
    def calculate_post_conv_height(height, kernel_size, stride, pad, n_convs):
        """Height of spec after n convolutions with fixed kernel/stride/pad."""
        for _ in range(n_convs):
            height = (height - kernel_size + 2 * pad) // stride + 1
        return height


class StyleTokenLayer(nn.Module):
    """NN Module attending to style tokens based on prosody encodings."""

    def __init__(self, num_heads, num_style_tokens, gst_embedding_dim, d_vector_dim=None):
        super().__init__()

        self.query_dim = gst_embedding_dim // 2

        if d_vector_dim:
            self.query_dim += d_vector_dim

        self.key_dim = gst_embedding_dim // num_heads
        self.style_tokens = nn.Parameter(torch.FloatTensor(num_style_tokens, self.key_dim))
        nn.init.normal_(self.style_tokens, mean=0, std=0.5)
        self.attention = MultiHeadAttention(
            query_dim=self.query_dim, key_dim=self.key_dim, num_units=gst_embedding_dim, num_heads=num_heads
        )

    def forward(self, inputs):
        batch_size = inputs.size(0)
        prosody_encoding = inputs.unsqueeze(1)
        # prosody_encoding: 3D tensor [batch_size, 1, encoding_size==128]
        tokens = torch.tanh(self.style_tokens).unsqueeze(0).expand(batch_size, -1, -1)
        # tokens: 3D tensor [batch_size, num tokens, token embedding size]
        style_embed = self.attention(prosody_encoding, tokens)

        return style_embed


class MultiHeadAttention(nn.Module):
    """
    input:
        query --- [N, T_q, query_dim]
        key --- [N, T_k, key_dim]
    output:
        out --- [N, T_q, num_units]
    """

    def __init__(self, query_dim, key_dim, num_units, num_heads):
        super().__init__()
        self.num_units = num_units
        self.num_heads = num_heads
        self.key_dim = key_dim

        self.W_query = nn.Linear(in_features=query_dim, out_features=num_units, bias=False)
        self.W_key = nn.Linear(in_features=key_dim, out_features=num_units, bias=False)
        self.W_value = nn.Linear(in_features=key_dim, out_features=num_units, bias=False)

    def forward(self, query, key):
        queries = self.W_query(query)  # [N, T_q, num_units]
        keys = self.W_key(key)  # [N, T_k, num_units]
        values = self.W_value(key)

        split_size = self.num_units // self.num_heads
        queries = torch.stack(torch.split(queries, split_size, dim=2), dim=0)  # [h, N, T_q, num_units/h]
        keys = torch.stack(torch.split(keys, split_size, dim=2), dim=0)  # [h, N, T_k, num_units/h]
        values = torch.stack(torch.split(values, split_size, dim=2), dim=0)  # [h, N, T_k, num_units/h]

        # score = softmax(QK^T / (d_k**0.5))
        scores = torch.matmul(queries, keys.transpose(2, 3))  # [h, N, T_q, T_k]
        scores = scores / (self.key_dim**0.5)
        scores = F.softmax(scores, dim=3)

        # out = score * V
        out = torch.matmul(scores, values)  # [h, N, T_q, num_units/h]
        out = torch.cat(torch.split(out, 1, dim=0), dim=3).squeeze(0)  # [N, T_q, num_units]

        return out


================================================
FILE: TTS/tts/layers/tacotron/tacotron.py
================================================
# coding: utf-8
# adapted from https://github.com/r9y9/tacotron_pytorch

import torch
from torch import nn

from .attentions import init_attn
from .common_layers import Prenet


class BatchNormConv1d(nn.Module):
    r"""A wrapper for Conv1d with BatchNorm. It sets the activation
    function between Conv and BatchNorm layers. BatchNorm layer
    is initialized with the TF default values for momentum and eps.

    Args:
        in_channels: size of each input sample
        out_channels: size of each output samples
        kernel_size: kernel size of conv filters
        stride: stride of conv filters
        padding: padding of conv filters
        activation: activation function set b/w Conv1d and BatchNorm

    Shapes:
        - input: (B, D)
        - output: (B, D)
    """

    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, activation=None):
        super().__init__()
        self.padding = padding
        self.padder = nn.ConstantPad1d(padding, 0)
        self.conv1d = nn.Conv1d(
            in_channels, out_channels, kernel_size=kernel_size, stride=stride, padding=0, bias=False
        )
        # Following tensorflow's default parameters
        self.bn = nn.BatchNorm1d(out_channels, momentum=0.99, eps=1e-3)
        self.activation = activation
        # self.init_layers()

    def init_layers(self):
        if isinstance(self.activation, torch.nn.ReLU):
            w_gain = "relu"
        elif isinstance(self.activation, torch.nn.Tanh):
            w_gain = "tanh"
        elif self.activation is None:
            w_gain = "linear"
        else:
            raise RuntimeError("Unknown activation function")
        torch.nn.init.xavier_uniform_(self.conv1d.weight, gain=torch.nn.init.calculate_gain(w_gain))

    def forward(self, x):
        x = self.padder(x)
        x = self.conv1d(x)
        x = self.bn(x)
        if self.activation is not None:
            x = self.activation(x)
        return x


class Highway(nn.Module):
    r"""Highway layers as explained in https://arxiv.org/abs/1505.00387

    Args:
        in_features (int): size of each input sample
        out_feature (int): size of each output sample

    Shapes:
        - input: (B, *, H_in)
        - output: (B, *, H_out)
    """

    # TODO: Try GLU layer
    def __init__(self, in_features, out_feature):
        super().__init__()
        self.H = nn.Linear(in_features, out_feature)
        self.H.bias.data.zero_()
        self.T = nn.Linear(in_features, out_feature)
        self.T.bias.data.fill_(-1)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        # self.init_layers()

    def init_layers(self):
        torch.nn.init.xavier_uniform_(self.H.weight, gain=torch.nn.init.calculate_gain("relu"))
        torch.nn.init.xavier_uniform_(self.T.weight, gain=torch.nn.init.calculate_gain("sigmoid"))

    def forward(self, inputs):
        H = self.relu(self.H(inputs))
        T = self.sigmoid(self.T(inputs))
        return H * T + inputs * (1.0 - T)


class CBHG(nn.Module):
    """CBHG module: a recurrent neural network composed of:
    - 1-d convolution banks
    - Highway networks + residual connections
    - Bidirectional gated recurrent units

    Args:
        in_features (int): sample size
        K (int): max filter size in conv bank
        projections (list): conv channel sizes for conv projections
        num_highways (int): number of highways layers

    Shapes:
        - input: (B, C, T_in)
        - output: (B, T_in, C*2)
    """

    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        in_features,
        K=16,
        conv_bank_features=128,
        conv_projections=[128, 128],
        highway_features=128,
        gru_features=128,
        num_highways=4,
    ):
        super().__init__()
        self.in_features = in_features
        self.conv_bank_features = conv_bank_features
        self.highway_features = highway_features
        self.gru_features = gru_features
        self.conv_projections = conv_projections
        self.relu = nn.ReLU()
        # list of conv1d bank with filter size k=1...K
        # TODO: try dilational layers instead
        self.conv1d_banks = nn.ModuleList(
            [
                BatchNormConv1d(
                    in_features,
                    conv_bank_features,
                    kernel_size=k,
                    stride=1,
                    padding=[(k - 1) // 2, k // 2],
                    activation=self.relu,
                )
                for k in range(1, K + 1)
            ]
        )
        # max pooling of conv bank, with padding
        # TODO: try average pooling OR larger kernel size
        out_features = [K * conv_bank_features] + conv_projections[:-1]
        activations = [self.relu] * (len(conv_projections) - 1)
        activations += [None]
        # setup conv1d projection layers
        layer_set = []
        for in_size, out_size, ac in zip(out_features, conv_projections, activations):
            layer = BatchNormConv1d(in_size, out_size, kernel_size=3, stride=1, padding=[1, 1], activation=ac)
            layer_set.append(layer)
        self.conv1d_projections = nn.ModuleList(layer_set)
        # setup Highway layers
        if self.highway_features != conv_projections[-1]:
            self.pre_highway = nn.Linear(conv_projections[-1], highway_features, bias=False)
        self.highways = nn.ModuleList([Highway(highway_features, highway_features) for _ in range(num_highways)])
        # bi-directional GPU layer
        self.gru = nn.GRU(gru_features, gru_features, 1, batch_first=True, bidirectional=True)

    def forward(self, inputs):
        # (B, in_features, T_in)
        x = inputs
        # (B, hid_features*K, T_in)
        # Concat conv1d bank outputs
        outs = []
        for conv1d in self.conv1d_banks:
            out = conv1d(x)
            outs.append(out)
        x = torch.cat(outs, dim=1)
        assert x.size(1) == self.conv_bank_features * len(self.conv1d_banks)
        for conv1d in self.conv1d_projections:
            x = conv1d(x)
        x += inputs
        x = x.transpose(1, 2)
        if self.highway_features != self.conv_projections[-1]:
            x = self.pre_highway(x)
        # Residual connection
        # TODO: try residual scaling as in Deep Voice 3
        # TODO: try plain residual layers
        for highway in self.highways:
            x = highway(x)
        # (B, T_in, hid_features*2)
        # TODO: replace GRU with convolution as in Deep Voice 3
        self.gru.flatten_parameters()
        outputs, _ = self.gru(x)
        return outputs


class EncoderCBHG(nn.Module):
    r"""CBHG module with Encoder specific arguments"""

    def __init__(self):
        super().__init__()
        self.cbhg = CBHG(
            128,
            K=16,
            conv_bank_features=128,
            conv_projections=[128, 128],
            highway_features=128,
            gru_features=128,
            num_highways=4,
        )

    def forward(self, x):
        return self.cbhg(x)


class Encoder(nn.Module):
    r"""Stack Prenet and CBHG module for encoder
    Args:
        inputs (FloatTensor): embedding features

    Shapes:
        - inputs: (B, T, D_in)
        - outputs: (B, T, 128 * 2)
    """

    def __init__(self, in_features):
        super().__init__()
        self.prenet = Prenet(in_features, out_features=[256, 128])
        self.cbhg = EncoderCBHG()

    def forward(self, inputs):
        # B x T x prenet_dim
        outputs = self.prenet(inputs)
        outputs = self.cbhg(outputs.transpose(1, 2))
        return outputs


class PostCBHG(nn.Module):
    def __init__(self, mel_dim):
        super().__init__()
        self.cbhg = CBHG(
            mel_dim,
            K=8,
            conv_bank_features=128,
            conv_projections=[256, mel_dim],
            highway_features=128,
            gru_features=128,
            num_highways=4,
        )

    def forward(self, x):
        return self.cbhg(x)


class Decoder(nn.Module):
    """Tacotron decoder.

    Args:
        in_channels (int): number of input channels.
        frame_channels (int): number of feature frame channels.
        r (int): number of outputs per time step (reduction rate).
        memory_size (int): size of the past window. if <= 0 memory_size = r
        attn_type (string): type of attention used in decoder.
        attn_windowing (bool): if true, define an attention window centered to maximum
            attention response. It provides more robust attention alignment especially
            at interence time.
        attn_norm (string): attention normalization function. 'sigmoid' or 'softmax'.
        prenet_type (string): 'original' or 'bn'.
        prenet_dropout (float): prenet dropout rate.
        forward_attn (bool): if true, use forward attention method. https://arxiv.org/abs/1807.06736
        trans_agent (bool): if true, use transition agent. https://arxiv.org/abs/1807.06736
        forward_attn_mask (bool): if true, mask attention values smaller than a threshold.
        location_attn (bool): if true, use location sensitive attention.
        attn_K (int): number of attention heads for GravesAttention.
        separate_stopnet (bool): if true, detach stopnet input to prevent gradient flow.
        d_vector_dim (int): size of speaker embedding vector, for multi-speaker training.
        max_decoder_steps (int): Maximum number of steps allowed for the decoder. Defaults to 500.
    """

    # Pylint gets confused by PyTorch conventions here
    # pylint: disable=attribute-defined-outside-init

    def __init__(
        self,
        in_channels,
        frame_channels,
        r,
        memory_size,
        attn_type,
        attn_windowing,
        attn_norm,
        prenet_type,
        prenet_dropout,
        forward_attn,
        trans_agent,
        forward_attn_mask,
        location_attn,
        attn_K,
        separate_stopnet,
        max_decoder_steps,
    ):
        super().__init__()
        self.r_init = r
        self.r = r
        self.in_channels = in_channels
        self.max_decoder_steps = max_decoder_steps
        self.use_memory_queue = memory_size > 0
        self.memory_size = memory_size if memory_size > 0 else r
        self.frame_channels = frame_channels
        self.separate_stopnet = separate_stopnet
        self.query_dim = 256
        # memory -> |Prenet| -> processed_memory
        prenet_dim = frame_channels * self.memory_size if self.use_memory_queue else frame_channels
        self.prenet = Prenet(prenet_dim, prenet_type, prenet_dropout, out_features=[256, 128])
        # processed_inputs, processed_memory -> |Attention| -> Attention, attention, RNN_State
        # attention_rnn generates queries for the attention mechanism
        self.attention_rnn = nn.GRUCell(in_channels + 128, self.query_dim)
        self.attention = init_attn(
            attn_type=attn_type,
            query_dim=self.query_dim,
            embedding_dim=in_channels,
            attention_dim=128,
            location_attention=location_attn,
            attention_location_n_filters=32,
            attention_location_kernel_size=31,
            windowing=attn_windowing,
            norm=attn_norm,
            forward_attn=forward_attn,
            trans_agent=trans_agent,
            forward_attn_mask=forward_attn_mask,
            attn_K=attn_K,
        )
        # (processed_memory | attention context) -> |Linear| -> decoder_RNN_input
        self.project_to_decoder_in = nn.Linear(256 + in_channels, 256)
        # decoder_RNN_input -> |RNN| -> RNN_state
        self.decoder_rnns = nn.ModuleList([nn.GRUCell(256, 256) for _ in range(2)])
        # RNN_state -> |Linear| -> mel_spec
        self.proj_to_mel = nn.Linear(256, frame_channels * self.r_init)
        # learn init values instead of zero init.
        self.stopnet = StopNet(256 + frame_channels * self.r_init)

    def set_r(self, new_r):
        self.r = new_r

    def _reshape_memory(self, memory):
        """
        Reshape the spectrograms for given 'r'
        """
        # Grouping multiple frames if necessary
        if memory.size(-1) == self.frame_channels:
            memory = memory.view(memory.shape[0], memory.size(1) // self.r, -1)
        # Time first (T_decoder, B, frame_channels)
        memory = memory.transpose(0, 1)
        return memory

    def _init_states(self, inputs):
        """
        Initialization of decoder states
        """
        B = inputs.size(0)
        # go frame as zeros matrix
        if self.use_memory_queue:
            self.memory_input = torch.zeros(1, device=inputs.device).repeat(B, self.frame_channels * self.memory_size)
        else:
            self.memory_input = torch.zeros(1, device=inputs.device).repeat(B, self.frame_channels)
        # decoder states
        self.attention_rnn_hidden = torch.zeros(1, device=inputs.device).repeat(B, 256)
        self.decoder_rnn_hiddens = [
            torch.zeros(1, device=inputs.device).repeat(B, 256) for idx in range(len(self.decoder_rnns))
        ]
        self.context_vec = inputs.data.new(B, self.in_channels).zero_()
        # cache attention inputs
        self.processed_inputs = self.attention.preprocess_inputs(inputs)

    def _parse_outputs(self, outputs, attentions, stop_tokens):
        # Back to batch first
        attentions = torch.stack(attentions).transpose(0, 1)
        stop_tokens = torch.stack(stop_tokens).transpose(0, 1)
        outputs = torch.stack(outputs).transpose(0, 1).contiguous()
        outputs = outputs.view(outputs.size(0), -1, self.frame_channels)
        outputs = outputs.transpose(1, 2)
        return outputs, attentions, stop_tokens

    def decode(self, inputs, mask=None):
        # Prenet
        processed_memory = self.prenet(self.memory_input)
        # Attention RNN
        self.attention_rnn_hidden = self.attention_rnn(
            torch.cat((processed_memory, self.context_vec), -1), self.attention_rnn_hidden
        )
        self.context_vec = self.attention(self.attention_rnn_hidden, inputs, self.processed_inputs, mask)
        # Concat RNN output and attention context vector
        decoder_input = self.project_to_decoder_in(torch.cat((self.attention_rnn_hidden, self.context_vec), -1))

        # Pass through the decoder RNNs
        for idx, decoder_rnn in enumerate(self.decoder_rnns):
            self.decoder_rnn_hiddens[idx] = decoder_rnn(decoder_input, self.decoder_rnn_hiddens[idx])
            # Residual connection
            decoder_input = self.decoder_rnn_hiddens[idx] + decoder_input
        decoder_output = decoder_input

        # predict mel vectors from decoder vectors
        output = self.proj_to_mel(decoder_output)
        # output = torch.sigmoid(output)
        # predict stop token
        stopnet_input = torch.cat([decoder_output, output], -1)
        if self.separate_stopnet:
            stop_token = self.stopnet(stopnet_input.detach())
        else:
            stop_token = self.stopnet(stopnet_input)
        output = output[:, : self.r * self.frame_channels]
        return output, stop_token, self.attention.attention_weights

    def _update_memory_input(self, new_memory):
        if self.use_memory_queue:
            if self.memory_size > self.r:
                # memory queue size is larger than number of frames per decoder iter
                self.memory_input = torch.cat(
                    [new_memory, self.memory_input[:, : (self.memory_size - self.r) * self.frame_channels].clone()],
                    dim=-1,
                )
            else:
                # memory queue size smaller than number of frames per decoder iter
                self.memory_input = new_memory[:, : self.memory_size * self.frame_channels]
        else:
            # use only the last frame prediction
            # assert new_memory.shape[-1] == self.r * self.frame_channels
            self.memory_input = new_memory[:, self.frame_channels * (self.r - 1) :]

    def forward(self, inputs, memory, mask):
        """
        Args:
            inputs: Encoder outputs.
            memory: Decoder memory (autoregression. If None (at eval-time),
              decoder outputs are used as decoder inputs. If None, it uses the last
              output as the input.
            mask: Attention mask for sequence padding.

        Shapes:
            - inputs: (B, T, D_out_enc)
            - memory: (B, T_mel, D_mel)
        """
        # Run greedy decoding if memory is None
        memory = self._reshape_memory(memory)
        outputs = []
        attentions = []
        stop_tokens = []
        t = 0
        self._init_states(inputs)
        self.attention.init_states(inputs)
        while len(outputs) < memory.size(0):
            if t > 0:
                new_memory = memory[t - 1]
                self._update_memory_input(new_memory)

            output, stop_token, attention = self.decode(inputs, mask)
            outputs += [output]
            attentions += [attention]
            stop_tokens += [stop_token.squeeze(1)]
            t += 1
        return self._parse_outputs(outputs, attentions, stop_tokens)

    def inference(self, inputs):
        """
        Args:
            inputs: encoder outputs.
        Shapes:
            - inputs: batch x time x encoder_out_dim
        """
        outputs = []
        attentions = []
        stop_tokens = []
        t = 0
        self._init_states(inputs)
        self.attention.init_states(inputs)
        while True:
            if t > 0:
                new_memory = outputs[-1]
                self._update_memory_input(new_memory)
            output, stop_token, attention = self.decode(inputs, None)
            stop_token = torch.sigmoid(stop_token.data)
            outputs += [output]
            attentions += [attention]
            stop_tokens += [stop_token]
            t += 1
            if t > inputs.shape[1] / 4 and (stop_token > 0.6 or attention[:, -1].item() > 0.6):
                break
            if t > self.max_decoder_steps:
                print("   | > Decoder stopped with 'max_decoder_steps")
                break
        return self._parse_outputs(outputs, attentions, stop_tokens)


class StopNet(nn.Module):
    r"""Stopnet signalling decoder to stop inference.
    Args:
        in_features (int): feature dimension of input.
    """

    def __init__(self, in_features):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.linear = nn.Linear(in_features, 1)
        torch.nn.init.xavier_uniform_(self.linear.weight, gain=torch.nn.init.calculate_gain("linear"))

    def forward(self, inputs):
        outputs = self.dropout(inputs)
        outputs = self.linear(outputs)
        return outputs


================================================
FILE: TTS/tts/layers/tacotron/tacotron2.py
================================================
import torch
from torch import nn
from torch.nn import functional as F

from .attentions import init_attn
from .common_layers import Linear, Prenet


# pylint: disable=no-value-for-parameter
# pylint: disable=unexpected-keyword-arg
class ConvBNBlock(nn.Module):
    r"""Convolutions with Batch Normalization and non-linear activation.

    Args:
        in_channels (int): number of input channels.
        out_channels (int): number of output channels.
        kernel_size (int): convolution kernel size.
        activation (str): 'relu', 'tanh', None (linear).

    Shapes:
        - input: (B, C_in, T)
        - output: (B, C_out, T)
    """

    def __init__(self, in_channels, out_channels, kernel_size, activation=None):
        super().__init__()
        assert (kernel_size - 1) % 2 == 0
        padding = (kernel_size - 1) // 2
        self.convolution1d = nn.Conv1d(in_channels, out_channels, kernel_size, padding=padding)
        self.batch_normalization = nn.BatchNorm1d(out_channels, momentum=0.1, eps=1e-5)
        self.dropout = nn.Dropout(p=0.5)
        if activation == "relu":
            self.activation = nn.ReLU()
        elif activation == "tanh":
            self.activation = nn.Tanh()
        else:
            self.activation = nn.Identity()

    def forward(self, x):
        o = self.convolution1d(x)
        o = self.batch_normalization(o)
        o = self.activation(o)
        o = self.dropout(o)
        return o


class Postnet(nn.Module):
    r"""Tacotron2 Postnet

    Args:
        in_out_channels (int): number of output channels.

    Shapes:
        - input: (B, C_in, T)
        - output: (B, C_in, T)
    """

    def __init__(self, in_out_channels, num_convs=5):
        super().__init__()
        self.convolutions = nn.ModuleList()
        self.convolutions.append(ConvBNBlock(in_out_channels, 512, kernel_size=5, activation="tanh"))
        for _ in range(1, num_convs - 1):
            self.convolutions.append(ConvBNBlock(512, 512, kernel_size=5, activation="tanh"))
        self.convolutions.append(ConvBNBlock(512, in_out_channels, kernel_size=5, activation=None))

    def forward(self, x):
        o = x
        for layer in self.convolutions:
            o = layer(o)
        return o


class Encoder(nn.Module):
    r"""Tacotron2 Encoder

    Args:
        in_out_channels (int): number of input and output channels.

    Shapes:
        - input: (B, C_in, T)
        - output: (B, C_in, T)
    """

    def __init__(self, in_out_channels=512):
        super().__init__()
        self.convolutions = nn.ModuleList()
        for _ in range(3):
            self.convolutions.append(ConvBNBlock(in_out_channels, in_out_channels, 5, "relu"))
        self.lstm = nn.LSTM(
            in_out_channels, int(in_out_channels / 2), num_layers=1, batch_first=True, bias=True, bidirectional=True
        )
        self.rnn_state = None

    def forward(self, x, input_lengths):
        o = x
        for layer in self.convolutions:
            o = layer(o)
        o = o.transpose(1, 2)
        o = nn.utils.rnn.pack_padded_sequence(o, input_lengths.cpu(), batch_first=True)
        self.lstm.flatten_parameters()
        o, _ = self.lstm(o)
        o, _ = nn.utils.rnn.pad_packed_sequence(o, batch_first=True)
        return o

    def inference(self, x):
        o = x
        for layer in self.convolutions:
            o = layer(o)
        o = o.transpose(1, 2)
        # self.lstm.flatten_parameters()
        o, _ = self.lstm(o)
        return o


# adapted from https://github.com/NVIDIA/tacotron2/
class Decoder(nn.Module):
    """Tacotron2 decoder. We don't use Zoneout but Dropout between RNN layers.

    Args:
        in_channels (int): number of input channels.
        frame_channels (int): number of feature frame channels.
        r (int): number of outputs per time step (reduction rate).
        memory_size (int): size of the past window. if <= 0 memory_size = r
        attn_type (string): type of attention used in decoder.
        attn_win (bool): if true, define an attention window centered to maximum
            attention response. It provides more robust attention alignment especially
            at interence time.
        attn_norm (string): attention normalization function. 'sigmoid' or 'softmax'.
        prenet_type (string): 'original' or 'bn'.
        prenet_dropout (float): prenet dropout rate.
        forward_attn (bool): if true, use forward attention method. https://arxiv.org/abs/1807.06736
        trans_agent (bool): if true, use transition agent. https://arxiv.org/abs/1807.06736
        forward_attn_mask (bool): if true, mask attention values smaller than a threshold.
        location_attn (bool): if true, use location sensitive attention.
        attn_K (int): number of attention heads for GravesAttention.
        separate_stopnet (bool): if true, detach stopnet input to prevent gradient flow.
        max_decoder_steps (int): Maximum number of steps allowed for the decoder. Defaults to 10000.
    """

    # Pylint gets confused by PyTorch conventions here
    # pylint: disable=attribute-defined-outside-init
    def __init__(
        self,
        in_channels,
        frame_channels,
        r,
        attn_type,
        attn_win,
        attn_norm,
        prenet_type,
        prenet_dropout,
        forward_attn,
        trans_agent,
        forward_attn_mask,
        location_attn,
        attn_K,
        separate_stopnet,
        max_decoder_steps,
    ):
        super().__init__()
        self.frame_channels = frame_channels
        self.r_init = r
        self.r = r
        self.encoder_embedding_dim = in_channels
        self.separate_stopnet = separate_stopnet
        self.max_decoder_steps = max_decoder_steps
        self.stop_threshold = 0.5

        # model dimensions
        self.query_dim = 1024
        self.decoder_rnn_dim = 1024
        self.prenet_dim = 256
        self.attn_dim = 128
        self.p_attention_dropout = 0.1
        self.p_decoder_dropout = 0.1

        # memory -> |Prenet| -> processed_memory
        prenet_dim = self.frame_channels
        self.prenet = Prenet(
            prenet_dim, prenet_type, prenet_dropout, out_features=[self.prenet_dim, self.prenet_dim], bias=False
        )

        self.attention_rnn = nn.LSTMCell(self.prenet_dim + in_channels, self.query_dim, bias=True)

        self.attention = init_attn(
            attn_type=attn_type,
            query_dim=self.query_dim,
            embedding_dim=in_channels,
            attention_dim=128,
            location_attention=location_attn,
            attention_location_n_filters=32,
            attention_location_kernel_size=31,
            windowing=attn_win,
            norm=attn_norm,
            forward_attn=forward_attn,
            trans_agent=trans_agent,
            forward_attn_mask=forward_attn_mask,
            attn_K=attn_K,
        )

        self.decoder_rnn = nn.LSTMCell(self.query_dim + in_channels, self.decoder_rnn_dim, bias=True)

        self.linear_projection = Linear(self.decoder_rnn_dim + in_channels, self.frame_channels * self.r_init)

        self.stopnet = nn.Sequential(
            nn.Dropout(0.1),
            Linear(self.decoder_rnn_dim + self.frame_channels * self.r_init, 1, bias=True, init_gain="sigmoid"),
        )
        self.memory_truncated = None

    def set_r(self, new_r):
        self.r = new_r

    def get_go_frame(self, inputs):
        B = inputs.size(0)
        memory = torch.zeros(1, device=inputs.device).repeat(B, self.frame_channels * self.r)
        return memory

    def _init_states(self, inputs, mask, keep_states=False):
        B = inputs.size(0)
        # T = inputs.size(1)
        if not keep_states:
            self.query = torch.zeros(1, device=inputs.device).repeat(B, self.query_dim)
            self.attention_rnn_cell_state = torch.zeros(1, device=inputs.device).repeat(B, self.query_dim)
            self.decoder_hidden = torch.zeros(1, device=inputs.device).repeat(B, self.decoder_rnn_dim)
            self.decoder_cell = torch.zeros(1, device=inputs.device).repeat(B, self.decoder_rnn_dim)
            self.context = torch.zeros(1, device=inputs.device).repeat(B, self.encoder_embedding_dim)
        self.inputs = inputs
        self.processed_inputs = self.attention.preprocess_inputs(inputs)
        self.mask = mask

    def _reshape_memory(self, memory):
        """
        Reshape the spectrograms for given 'r'
        """
        # Grouping multiple frames if necessary
        if memory.size(-1) == self.frame_channels:
            memory = memory.view(memory.shape[0], memory.size(1) // self.r, -1)
        # Time first (T_decoder, B, frame_channels)
        memory = memory.transpose(0, 1)
        return memory

    def _parse_outputs(self, outputs, stop_tokens, alignments):
        alignments = torch.stack(alignments).transpose(0, 1)
        stop_tokens = torch.stack(stop_tokens).transpose(0, 1)
        outputs = torch.stack(outputs).transpose(0, 1).contiguous()
        outputs = outputs.view(outputs.size(0), -1, self.frame_channels)
        outputs = outputs.transpose(1, 2)
        return outputs, stop_tokens, alignments

    def _update_memory(self, memory):
        if len(memory.shape) == 2:
            return memory[:, self.frame_channels * (self.r - 1) :]
        return memory[:, :, self.frame_channels * (self.r - 1) :]

    def decode(self, memory):
        """
        shapes:
           - memory: B x r * self.frame_channels
        """
        # self.context: B x D_en
        # query_input: B x D_en + (r * self.frame_channels)
        query_input = torch.cat((memory, self.context), -1)
        # self.query and self.attention_rnn_cell_state : B x D_attn_rnn
        self.query, self.attention_rnn_cell_state = self.attention_rnn(
            query_input, (self.query, self.attention_rnn_cell_state)
        )
        self.query = F.dropout(self.query, self.p_attention_dropout, self.training)
        self.attention_rnn_cell_state = F.dropout(
            self.attention_rnn_cell_state, self.p_attention_dropout, self.training
        )
        # B x D_en
        self.context = self.attention(self.query, self.inputs, self.processed_inputs, self.mask)
        # B x (D_en + D_attn_rnn)
        decoder_rnn_input = torch.cat((self.query, self.context), -1)
        # self.decoder_hidden and self.decoder_cell: B x D_decoder_rnn
        self.decoder_hidden, self.decoder_cell = self.decoder_rnn(
            decoder_rnn_input, (self.decoder_hidden, self.decoder_cell)
        )
        self.decoder_hidden = F.dropout(self.decoder_hidden, self.p_decoder_dropout, self.training)
        # B x (D_decoder_rnn + D_en)
        decoder_hidden_context = torch.cat((self.decoder_hidden, self.context), dim=1)
        # B x (self.r * self.frame_channels)
        decoder_output = self.linear_projection(decoder_hidden_context)
        # B x (D_decoder_rnn + (self.r * self.frame_channels))
        stopnet_input = torch.cat((self.decoder_hidden, decoder_output), dim=1)
        if self.separate_stopnet:
            stop_token = self.stopnet(stopnet_input.detach())
        else:
            stop_token = self.stopnet(stopnet_input)
        # select outputs for the reduction rate self.r
        decoder_output = decoder_output[:, : self.r * self.frame_channels]
        return decoder_output, self.attention.attention_weights, stop_token

    def forward(self, inputs, memories, mask):
        r"""Train Decoder with teacher forcing.
        Args:
            inputs: Encoder outputs.
            memories: Feature frames for teacher-forcing.
            mask: Attention mask for sequence padding.

        Shapes:
            - inputs: (B, T, D_out_enc)
            - memory: (B, T_mel, D_mel)
            - outputs: (B, T_mel, D_mel)
            - alignments: (B, T_in, T_out)
            - stop_tokens: (B, T_out)
        """
        memory = self.get_go_frame(inputs).unsqueeze(0)
        memories = self._reshape_memory(memories)
        memories = torch.cat((memory, memories), dim=0)
        memories = self._update_memory(memories)
        memories = self.prenet(memories)

        self._init_states(inputs, mask=mask)
        self.attention.init_states(inputs)

        outputs, stop_tokens, alignments = [], [], []
        while len(outputs) < memories.size(0) - 1:
            memory = memories[len(outputs)]
            decoder_output, attention_weights, stop_token = self.decode(memory)
            outputs += [decoder_output.squeeze(1)]
            stop_tokens += [stop_token.squeeze(1)]
            alignments += [attention_weights]

        outputs, stop_tokens, alignments = self._parse_outputs(outputs, stop_tokens, alignments)
        return outputs, alignments, stop_tokens

    def inference(self, inputs):
        r"""Decoder inference without teacher forcing and use
        Stopnet to stop decoder.
        Args:
            inputs: Encoder outputs.

        Shapes:
            - inputs: (B, T, D_out_enc)
            - outputs: (B, T_mel, D_mel)
            - alignments: (B, T_in, T_out)
            - stop_tokens: (B, T_out)
        """
        memory = self.get_go_frame(inputs)
        memory = self._update_memory(memory)

        self._init_states(inputs, mask=None)
        self.attention.init_states(inputs)

        outputs, stop_tokens, alignments, t = [], [], [], 0
        while True:
            memory = self.prenet(memory)
            decoder_output, alignment, stop_token = self.decode(memory)
            stop_token = torch.sigmoid(stop_token.data)
            outputs += [decoder_output.squeeze(1)]
            stop_tokens += [stop_token]
            alignments += [alignment]

            if stop_token > self.stop_threshold and t > inputs.shape[0] // 2:
                break
            if len(outputs) == self.max_decoder_steps:
                print(f"   > Decoder stopped with `max_decoder_steps` {self.max_decoder_steps}")
                break

            memory = self._update_memory(decoder_output)
            t += 1

        outputs, stop_tokens, alignments = self._parse_outputs(outputs, stop_tokens, alignments)

        return outputs, alignments, stop_tokens

    def inference_truncated(self, inputs):
        """
        Preserve decoder states for continuous inference
        """
        if self.memory_truncated is None:
            self.memory_truncated = self.get_go_frame(inputs)
            self._init_states(inputs, mask=None, keep_states=False)
        else:
            self._init_states(inputs, mask=None, keep_states=True)

        self.attention.init_states(inputs)
        outputs, stop_tokens, alignments, t = [], [], [], 0
        while True:
            memory = self.prenet(self.memory_truncated)
            decoder_output, alignment, stop_token = self.decode(memory)
            stop_token = torch.sigmoid(stop_token.data)
            outputs += [decoder_output.squeeze(1)]
            stop_tokens += [stop_token]
            alignments += [alignment]

            if stop_token > 0.7:
                break
            if len(outputs) == self.max_decoder_steps:
                print("   | > Decoder stopped with 'max_decoder_steps")
                break

            self.memory_truncated = decoder_output
            t += 1

        outputs, stop_tokens, alignments = self._parse_outputs(outputs, stop_tokens, alignments)

        return outputs, alignments, stop_tokens

    def inference_step(self, inputs, t, memory=None):
        """
        For debug purposes
        """
        if t == 0:
            memory = self.get_go_frame(inputs)
            self._init_states(inputs, mask=None)

        memory = self.prenet(memory)
        decoder_output, stop_token, alignment = self.decode(memory)
        stop_token = torch.sigmoid(stop_token.data)
        memory = decoder_output
        return decoder_output, stop_token, alignment


================================================
FILE: TTS/tts/layers/vits/discriminator.py
================================================
import torch
from torch import nn
from torch.nn.modules.conv import Conv1d

from TTS.vocoder.models.hifigan_discriminator import DiscriminatorP, MultiPeriodDiscriminator


class DiscriminatorS(torch.nn.Module):
    """HiFiGAN Scale Discriminator. Channel sizes are different from the original HiFiGAN.

    Args:
        use_spectral_norm (bool): if `True` swith to spectral norm instead of weight norm.
    """

    def __init__(self, use_spectral_norm=False):
        super().__init__()
        norm_f = nn.utils.spectral_norm if use_spectral_norm else nn.utils.weight_norm
        self.convs = nn.ModuleList(
            [
                norm_f(Conv1d(1, 16, 15, 1, padding=7)),
                norm_f(Conv1d(16, 64, 41, 4, groups=4, padding=20)),
                norm_f(Conv1d(64, 256, 41, 4, groups=16, padding=20)),
                norm_f(Conv1d(256, 1024, 41, 4, groups=64, padding=20)),
                norm_f(Conv1d(1024, 1024, 41, 4, groups=256, padding=20)),
                norm_f(Conv1d(1024, 1024, 5, 1, padding=2)),
            ]
        )
        self.conv_post = norm_f(Conv1d(1024, 1, 3, 1, padding=1))

    def forward(self, x):
        """
        Args:
            x (Tensor): input waveform.

        Returns:
            Tensor: discriminator scores.
            List[Tensor]: list of features from the convolutiona layers.
        """
        feat = []
        for l in self.convs:
            x = l(x)
            x = torch.nn.functional.leaky_relu(x, 0.1)
            feat.append(x)
        x = self.conv_post(x)
        feat.append(x)
        x = torch.flatten(x, 1, -1)
        return x, feat


class VitsDiscriminator(nn.Module):
    """VITS discriminator wrapping one Scale Discriminator and a stack of Period Discriminator.

    ::
        waveform -> ScaleDiscriminator() -> scores_sd, feats_sd --> append() -> scores, feats
               |--> MultiPeriodDiscriminator() -> scores_mpd, feats_mpd ^

    Args:
        use_spectral_norm (bool): if `True` swith to spectral norm instead of weight norm.
    """

    def __init__(self, periods=(2, 3, 5, 7, 11), use_spectral_norm=False):
        super().__init__()
        self.nets = nn.ModuleList()
        self.nets.append(DiscriminatorS(use_spectral_norm=use_spectral_norm))
        self.nets.extend([DiscriminatorP(i, use_spectral_norm=use_spectral_norm) for i in periods])

    def forward(self, x, x_hat=None):
        """
        Args:
            x (Tensor): ground truth waveform.
            x_hat (Tensor): predicted waveform.

        Returns:
            List[Tensor]: discriminator scores.
            List[List[Tensor]]: list of list of features from each layers of each discriminator.
        """
        x_scores = []
        x_hat_scores = [] if x_hat is not None else None
        x_feats = []
        x_hat_feats = [] if x_hat is not None else None
        for net in self.nets:
            x_score, x_feat = net(x)
            x_scores.append(x_score)
            x_feats.append(x_feat)
            if x_hat is not None:
                x_hat_score, x_hat_feat = net(x_hat)
                x_hat_scores.append(x_hat_score)
                x_hat_feats.append(x_hat_feat)
        return x_scores, x_feats, x_hat_scores, x_hat_feats


================================================
FILE: TTS/tts/layers/vits/networks.py
================================================
import math

import torch
from torch import nn

from TTS.tts.layers.glow_tts.glow import WN
from TTS.tts.layers.glow_tts.transformer import RelativePositionTransformer
from TTS.tts.utils.helpers import sequence_mask

LRELU_SLOPE = 0.1


def convert_pad_shape(pad_shape):
    l = pad_shape[::-1]
    pad_shape = [item for sublist in l for item in sublist]
    return pad_shape


def init_weights(m, mean=0.0, std=0.01):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        m.weight.data.normal_(mean, std)


def get_padding(kernel_size, dilation=1):
    return int((kernel_size * dilation - dilation) / 2)


class TextEncoder(nn.Module):
    def __init__(
        self,
        n_vocab: int,
        out_channels: int,
        hidden_channels: int,
        hidden_channels_ffn: int,
        num_heads: int,
        num_layers: int,
        kernel_size: int,
        dropout_p: float,
        language_emb_dim: int = None,
    ):
        """Text Encoder for VITS model.

        Args:
            n_vocab (int): Number of characters for the embedding layer.
            out_channels (int): Number of channels for the output.
            hidden_channels (int): Number of channels for the hidden layers.
            hidden_channels_ffn (int): Number of channels for the convolutional layers.
            num_heads (int): Number of attention heads for the Transformer layers.
            num_layers (int): Number of Transformer layers.
            kernel_size (int): Kernel size for the FFN layers in Transformer network.
            dropout_p (float): Dropout rate for the Transformer layers.
        """
        super().__init__()
        self.out_channels = out_channels
        self.hidden_channels = hidden_channels

        self.emb = nn.Embedding(n_vocab, hidden_channels)

        nn.init.normal_(self.emb.weight, 0.0, hidden_channels**-0.5)

        if language_emb_dim:
            hidden_channels += language_emb_dim

        self.encoder = RelativePositionTransformer(
            in_channels=hidden_channels,
            out_channels=hidden_channels,
            hidden_channels=hidden_channels,
            hidden_channels_ffn=hidden_channels_ffn,
            num_heads=num_heads,
            num_layers=num_layers,
            kernel_size=kernel_size,
            dropout_p=dropout_p,
            layer_norm_type="2",
            rel_attn_window_size=4,
        )

        self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)

    def forward(self, x, x_lengths, lang_emb=None):
        """
        Shapes:
            - x: :math:`[B, T]`
            - x_length: :math:`[B]`
        """
        assert x.shape[0] == x_lengths.shape[0]
        x = self.emb(x) * math.sqrt(self.hidden_channels)  # [b, t, h]

        # concat the lang emb in embedding chars
        if lang_emb is not None:
            x = torch.cat((x, lang_emb.transpose(2, 1).expand(x.size(0), x.size(1), -1)), dim=-1)

        x = torch.transpose(x, 1, -1)  # [b, h, t]
        x_mask = torch.unsqueeze(sequence_mask(x_lengths, x.size(2)), 1).to(x.dtype)  # [b, 1, t]

        x = self.encoder(x * x_mask, x_mask)
        stats = self.proj(x) * x_mask

        m, logs = torch.split(stats, self.out_channels, dim=1)
        return x, m, logs, x_mask


class ResidualCouplingBlock(nn.Module):
    def __init__(
        self,
        channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        num_layers,
        dropout_p=0,
        cond_channels=0,
        mean_only=False,
    ):
        assert channels % 2 == 0, "channels should be divisible by 2"
        super().__init__()
        self.half_channels = channels // 2
        self.mean_only = mean_only
        # input layer
        self.pre = nn.Conv1d(self.half_channels, hidden_channels, 1)
        # coupling layers
        self.enc = WN(
            hidden_channels,
            hidden_channels,
            kernel_size,
            dilation_rate,
            num_layers,
            dropout_p=dropout_p,
            c_in_channels=cond_channels,
        )
        # output layer
        # Initializing last layer to 0 makes the affine coupling layers
        # do nothing at first.  This helps with training stability
        self.post = nn.Conv1d(hidden_channels, self.half_channels * (2 - mean_only), 1)
        self.post.weight.data.zero_()
        self.post.bias.data.zero_()

    def forward(self, x, x_mask, g=None, reverse=False):
        """
        Note:
            Set `reverse` to True for inference.

        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
            - g: :math:`[B, C, 1]`
        """
        x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
        h = self.pre(x0) * x_mask
        h = self.enc(h, x_mask, g=g)
        stats = self.post(h) * x_mask
        if not self.mean_only:
            m, log_scale = torch.split(stats, [self.half_channels] * 2, 1)
        else:
            m = stats
            log_scale = torch.zeros_like(m)

        if not reverse:
            x1 = m + x1 * torch.exp(log_scale) * x_mask
            x = torch.cat([x0, x1], 1)
            logdet = torch.sum(log_scale, [1, 2])
            return x, logdet
        else:
            x1 = (x1 - m) * torch.exp(-log_scale) * x_mask
            x = torch.cat([x0, x1], 1)
            return x


class ResidualCouplingBlocks(nn.Module):
    def __init__(
        self,
        channels: int,
        hidden_channels: int,
        kernel_size: int,
        dilation_rate: int,
        num_layers: int,
        num_flows=4,
        cond_channels=0,
    ):
        """Redisual Coupling blocks for VITS flow layers.

        Args:
            channels (int): Number of input and output tensor channels.
            hidden_channels (int): Number of hidden network channels.
            kernel_size (int): Kernel size of the WaveNet layers.
            dilation_rate (int): Dilation rate of the WaveNet layers.
            num_layers (int): Number of the WaveNet layers.
            num_flows (int, optional): Number of Residual Coupling blocks. Defaults to 4.
            cond_channels (int, optional): Number of channels of the conditioning tensor. Defaults to 0.
        """
        super().__init__()
        self.channels = channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.num_layers = num_layers
        self.num_flows = num_flows
        self.cond_channels = cond_channels

        self.flows = nn.ModuleList()
        for _ in range(num_flows):
            self.flows.append(
                ResidualCouplingBlock(
                    channels,
                    hidden_channels,
                    kernel_size,
                    dilation_rate,
                    num_layers,
                    cond_channels=cond_channels,
                    mean_only=True,
                )
            )

    def forward(self, x, x_mask, g=None, reverse=False):
        """
        Note:
            Set `reverse` to True for inference.

        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
            - g: :math:`[B, C, 1]`
        """
        if not reverse:
            for flow in self.flows:
                x, _ = flow(x, x_mask, g=g, reverse=reverse)
                x = torch.flip(x, [1])
        else:
            for flow in reversed(self.flows):
                x = torch.flip(x, [1])
                x = flow(x, x_mask, g=g, reverse=reverse)
        return x


class PosteriorEncoder(nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        hidden_channels: int,
        kernel_size: int,
        dilation_rate: int,
        num_layers: int,
        cond_channels=0,
    ):
        """Posterior Encoder of VITS model.

        ::
            x -> conv1x1() -> WaveNet() (non-causal) -> conv1x1() -> split() -> [m, s] -> sample(m, s) -> z

        Args:
            in_channels (int): Number of input tensor channels.
            out_channels (int): Number of output tensor channels.
            hidden_channels (int): Number of hidden channels.
            kernel_size (int): Kernel size of the WaveNet convolution layers.
            dilation_rate (int): Dilation rate of the WaveNet layers.
            num_layers (int): Number of the WaveNet layers.
            cond_channels (int, optional): Number of conditioning tensor channels. Defaults to 0.
        """
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.num_layers = num_layers
        self.cond_channels = cond_channels

        self.pre = nn.Conv1d(in_channels, hidden_channels, 1)
        self.enc = WN(
            hidden_channels, hidden_channels, kernel_size, dilation_rate, num_layers, c_in_channels=cond_channels
        )
        self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)

    def forward(self, x, x_lengths, g=None):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_lengths: :math:`[B, 1]`
            - g: :math:`[B, C, 1]`
        """
        x_mask = torch.unsqueeze(sequence_mask(x_lengths, x.size(2)), 1).to(x.dtype)
        x = self.pre(x) * x_mask
        x = self.enc(x, x_mask, g=g)
        stats = self.proj(x) * x_mask
        mean, log_scale = torch.split(stats, self.out_channels, dim=1)
        z = (mean + torch.randn_like(mean) * torch.exp(log_scale)) * x_mask
        return z, mean, log_scale, x_mask


================================================
FILE: TTS/tts/layers/vits/stochastic_duration_predictor.py
================================================
import math

import torch
from torch import nn
from torch.nn import functional as F

from TTS.tts.layers.generic.normalization import LayerNorm2
from TTS.tts.layers.vits.transforms import piecewise_rational_quadratic_transform


class DilatedDepthSeparableConv(nn.Module):
    def __init__(self, channels, kernel_size, num_layers, dropout_p=0.0) -> torch.tensor:
        """Dilated Depth-wise Separable Convolution module.

        ::
            x |-> DDSConv(x) -> LayerNorm(x) -> GeLU(x) -> Conv1x1(x) -> LayerNorm(x) -> GeLU(x) -> + -> o
              |-------------------------------------------------------------------------------------^

        Args:
            channels ([type]): [description]
            kernel_size ([type]): [description]
            num_layers ([type]): [description]
            dropout_p (float, optional): [description]. Defaults to 0.0.

        Returns:
            torch.tensor: Network output masked by the input sequence mask.
        """
        super().__init__()
        self.num_layers = num_layers

        self.convs_sep = nn.ModuleList()
        self.convs_1x1 = nn.ModuleList()
        self.norms_1 = nn.ModuleList()
        self.norms_2 = nn.ModuleList()
        for i in range(num_layers):
            dilation = kernel_size**i
            padding = (kernel_size * dilation - dilation) // 2
            self.convs_sep.append(
                nn.Conv1d(channels, channels, kernel_size, groups=channels, dilation=dilation, padding=padding)
            )
            self.convs_1x1.append(nn.Conv1d(channels, channels, 1))
            self.norms_1.append(LayerNorm2(channels))
            self.norms_2.append(LayerNorm2(channels))
        self.dropout = nn.Dropout(dropout_p)

    def forward(self, x, x_mask, g=None):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
        """
        if g is not None:
            x = x + g
        for i in range(self.num_layers):
            y = self.convs_sep[i](x * x_mask)
            y = self.norms_1[i](y)
            y = F.gelu(y)
            y = self.convs_1x1[i](y)
            y = self.norms_2[i](y)
            y = F.gelu(y)
            y = self.dropout(y)
            x = x + y
        return x * x_mask


class ElementwiseAffine(nn.Module):
    """Element-wise affine transform like no-population stats BatchNorm alternative.

    Args:
        channels (int): Number of input tensor channels.
    """

    def __init__(self, channels):
        super().__init__()
        self.translation = nn.Parameter(torch.zeros(channels, 1))
        self.log_scale = nn.Parameter(torch.zeros(channels, 1))

    def forward(self, x, x_mask, reverse=False, **kwargs):  # pylint: disable=unused-argument
        if not reverse:
            y = (x * torch.exp(self.log_scale) + self.translation) * x_mask
            logdet = torch.sum(self.log_scale * x_mask, [1, 2])
            return y, logdet
        x = (x - self.translation) * torch.exp(-self.log_scale) * x_mask
        return x


class ConvFlow(nn.Module):
    """Dilated depth separable convolutional based spline flow.

    Args:
        in_channels (int): Number of input tensor channels.
        hidden_channels (int): Number of in network channels.
        kernel_size (int): Convolutional kernel size.
        num_layers (int): Number of convolutional layers.
        num_bins (int, optional): Number of spline bins. Defaults to 10.
        tail_bound (float, optional): Tail bound for PRQT. Defaults to 5.0.
    """

    def __init__(
        self,
        in_channels: int,
        hidden_channels: int,
        kernel_size: int,
        num_layers: int,
        num_bins=10,
        tail_bound=5.0,
    ):
        super().__init__()
        self.num_bins = num_bins
        self.tail_bound = tail_bound
        self.hidden_channels = hidden_channels
        self.half_channels = in_channels // 2

        self.pre = nn.Conv1d(self.half_channels, hidden_channels, 1)
        self.convs = DilatedDepthSeparableConv(hidden_channels, kernel_size, num_layers, dropout_p=0.0)
        self.proj = nn.Conv1d(hidden_channels, self.half_channels * (num_bins * 3 - 1), 1)
        self.proj.weight.data.zero_()
        self.proj.bias.data.zero_()

    def forward(self, x, x_mask, g=None, reverse=False):
        x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
        h = self.pre(x0)
        h = self.convs(h, x_mask, g=g)
        h = self.proj(h) * x_mask

        b, c, t = x0.shape
        h = h.reshape(b, c, -1, t).permute(0, 1, 3, 2)  # [b, cx?, t] -> [b, c, t, ?]

        unnormalized_widths = h[..., : self.num_bins] / math.sqrt(self.hidden_channels)
        unnormalized_heights = h[..., self.num_bins : 2 * self.num_bins] / math.sqrt(self.hidden_channels)
        unnormalized_derivatives = h[..., 2 * self.num_bins :]

        x1, logabsdet = piecewise_rational_quadratic_transform(
            x1,
            unnormalized_widths,
            unnormalized_heights,
            unnormalized_derivatives,
            inverse=reverse,
            tails="linear",
            tail_bound=self.tail_bound,
        )

        x = torch.cat([x0, x1], 1) * x_mask
        logdet = torch.sum(logabsdet * x_mask, [1, 2])
        if not reverse:
            return x, logdet
        return x


class StochasticDurationPredictor(nn.Module):
    """Stochastic duration predictor with Spline Flows.

    It applies Variational Dequantization and Variationsl Data Augmentation.

    Paper:
        SDP: https://arxiv.org/pdf/2106.06103.pdf
        Spline Flow: https://arxiv.org/abs/1906.04032

    ::
        ## Inference

        x -> TextCondEncoder() -> Flow() -> dr_hat
        noise ----------------------^

        ## Training
                                                                              |---------------------|
        x -> TextCondEncoder() -> + -> PosteriorEncoder() -> split() -> z_u, z_v -> (d - z_u) -> concat() -> Flow() -> noise
        d -> DurCondEncoder()  -> ^                                                    |
        |------------------------------------------------------------------------------|

    Args:
        in_channels (int): Number of input tensor channels.
        hidden_channels (int): Number of hidden channels.
        kernel_size (int): Kernel size of convolutional layers.
        dropout_p (float): Dropout rate.
        num_flows (int, optional): Number of flow blocks. Defaults to 4.
        cond_channels (int, optional): Number of channels of conditioning tensor. Defaults to 0.
    """

    def __init__(
        self,
        in_channels: int,
        hidden_channels: int,
        kernel_size: int,
        dropout_p: float,
        num_flows=4,
        cond_channels=0,
        language_emb_dim=0,
    ):
        super().__init__()

        # add language embedding dim in the input
        if language_emb_dim:
            in_channels += language_emb_dim

        # condition encoder text
        self.pre = nn.Conv1d(in_channels, hidden_channels, 1)
        self.convs = DilatedDepthSeparableConv(hidden_channels, kernel_size, num_layers=3, dropout_p=dropout_p)
        self.proj = nn.Conv1d(hidden_channels, hidden_channels, 1)

        # posterior encoder
        self.flows = nn.ModuleList()
        self.flows.append(ElementwiseAffine(2))
        self.flows += [ConvFlow(2, hidden_channels, kernel_size, num_layers=3) for _ in range(num_flows)]

        # condition encoder duration
        self.post_pre = nn.Conv1d(1, hidden_channels, 1)
        self.post_convs = DilatedDepthSeparableConv(hidden_channels, kernel_size, num_layers=3, dropout_p=dropout_p)
        self.post_proj = nn.Conv1d(hidden_channels, hidden_channels, 1)

        # flow layers
        self.post_flows = nn.ModuleList()
        self.post_flows.append(ElementwiseAffine(2))
        self.post_flows += [ConvFlow(2, hidden_channels, kernel_size, num_layers=3) for _ in range(num_flows)]

        if cond_channels != 0 and cond_channels is not None:
            self.cond = nn.Conv1d(cond_channels, hidden_channels, 1)

        if language_emb_dim != 0 and language_emb_dim is not None:
            self.cond_lang = nn.Conv1d(language_emb_dim, hidden_channels, 1)

    def forward(self, x, x_mask, dr=None, g=None, lang_emb=None, reverse=False, noise_scale=1.0):
        """
        Shapes:
            - x: :math:`[B, C, T]`
            - x_mask: :math:`[B, 1, T]`
            - dr: :math:`[B, 1, T]`
            - g: :math:`[B, C]`
        """
        # condition encoder text
        x = self.pre(x)
        if g is not None:
            x = x + self.cond(g)

        if lang_emb is not None:
            x = x + self.cond_lang(lang_emb)

        x = self.convs(x, x_mask)
        x = self.proj(x) * x_mask

        if not reverse:
            flows = self.flows
            assert dr is not None

            # condition encoder duration
            h = self.post_pre(dr)
            h = self.post_convs(h, x_mask)
            h = self.post_proj(h) * x_mask
            noise = torch.randn(dr.size(0), 2, dr.size(2)).to(device=x.device, dtype=x.dtype) * x_mask
            z_q = noise

            # posterior encoder
            logdet_tot_q = 0.0
            for idx, flow in enumerate(self.post_flows):
                z_q, logdet_q = flow(z_q, x_mask, g=(x + h))
                logdet_tot_q = logdet_tot_q + logdet_q
                if idx > 0:
                    z_q = torch.flip(z_q, [1])

            z_u, z_v = torch.split(z_q, [1, 1], 1)
            u = torch.sigmoid(z_u) * x_mask
            z0 = (dr - u) * x_mask

            # posterior encoder - neg log likelihood
            logdet_tot_q += torch.sum((F.logsigmoid(z_u) + F.logsigmoid(-z_u)) * x_mask, [1, 2])
            nll_posterior_encoder = (
                torch.sum(-0.5 * (math.log(2 * math.pi) + (noise**2)) * x_mask, [1, 2]) - logdet_tot_q
            )

            z0 = torch.log(torch.clamp_min(z0, 1e-5)) * x_mask
            logdet_tot = torch.sum(-z0, [1, 2])
            z = torch.cat([z0, z_v], 1)

            # flow layers
            for idx, flow in enumerate(flows):
                z, logdet = flow(z, x_mask, g=x, reverse=reverse)
                logdet_tot = logdet_tot + logdet
                if idx > 0:
                    z = torch.flip(z, [1])

            # flow layers - neg log likelihood
            nll_flow_layers = torch.sum(0.5 * (math.log(2 * math.pi) + (z**2)) * x_mask, [1, 2]) - logdet_tot
            return nll_flow_layers + nll_posterior_encoder

        flows = list(reversed(self.flows))
        flows = flows[:-2] + [flows[-1]]  # remove a useless vflow
        z = torch.randn(x.size(0), 2, x.size(2)).to(device=x.device, dtype=x.dtype) * noise_scale
        for flow in flows:
            z = torch.flip(z, [1])
            z = flow(z, x_mask, g=x, reverse=reverse)

        z0, _ = torch.split(z, [1, 1], 1)
        logw = z0
        return logw


================================================
FILE: TTS/tts/layers/vits/transforms.py
================================================
# adopted from https://github.com/bayesiains/nflows

import numpy as np
import torch
from torch.nn import functional as F

DEFAULT_MIN_BIN_WIDTH = 1e-3
DEFAULT_MIN_BIN_HEIGHT = 1e-3
DEFAULT_MIN_DERIVATIVE = 1e-3


def piecewise_rational_quadratic_transform(
    inputs,
    unnormalized_widths,
    unnormalized_heights,
    unnormalized_derivatives,
    inverse=False,
    tails=None,
    tail_bound=1.0,
    min_bin_width=DEFAULT_MIN_BIN_WIDTH,
    min_bin_height=DEFAULT_MIN_BIN_HEIGHT,
    min_derivative=DEFAULT_MIN_DERIVATIVE,
):
    if tails is None:
        spline_fn = rational_quadratic_spline
        spline_kwargs = {}
    else:
        spline_fn = unconstrained_rational_quadratic_spline
        spline_kwargs = {"tails": tails, "tail_bound": tail_bound}

    outputs, logabsdet = spline_fn(
        inputs=inputs,
        unnormalized_widths=unnormalized_widths,
        unnormalized_heights=unnormalized_heights,
        unnormalized_derivatives=unnormalized_derivatives,
        inverse=inverse,
        min_bin_width=min_bin_width,
        min_bin_height=min_bin_height,
        min_derivative=min_derivative,
        **spline_kwargs,
    )
    return outputs, logabsdet


def searchsorted(bin_locations, inputs, eps=1e-6):
    bin_locations[..., -1] += eps
    return torch.sum(inputs[..., None] >= bin_locations, dim=-1) - 1


def unconstrained_rational_quadratic_spline(
    inputs,
    unnormalized_widths,
    unnormalized_heights,
    unnormalized_derivatives,
    inverse=False,
    tails="linear",
    tail_bound=1.0,
    min_bin_width=DEFAULT_MIN_BIN_WIDTH,
    min_bin_height=DEFAULT_MIN_BIN_HEIGHT,
    min_derivative=DEFAULT_MIN_DERIVATIVE,
):
    inside_interval_mask = (inputs >= -tail_bound) & (inputs <= tail_bound)
    outside_interval_mask = ~inside_interval_mask

    outputs = torch.zeros_like(inputs)
    logabsdet = torch.zeros_like(inputs)

    if tails == "linear":
        unnormalized_derivatives = F.pad(unnormalized_derivatives, pad=(1, 1))
        constant = np.log(np.exp(1 - min_derivative) - 1)
        unnormalized_derivatives[..., 0] = constant
        unnormalized_derivatives[..., -1] = constant

        outputs[outside_interval_mask] = inputs[outside_interval_mask]
        logabsdet[outside_interval_mask] = 0
    else:
        raise RuntimeError("{} tails are not implemented.".format(tails))

    outputs[inside_interval_mask], logabsdet[inside_interval_mask] = rational_quadratic_spline(
        inputs=inputs[inside_interval_mask],
        unnormalized_widths=unnormalized_widths[inside_interval_mask, :],
        unnormalized_heights=unnormalized_heights[inside_interval_mask, :],
        unnormalized_derivatives=unnormalized_derivatives[inside_interval_mask, :],
        inverse=inverse,
        left=-tail_bound,
        right=tail_bound,
        bottom=-tail_bound,
        top=tail_bound,
        min_bin_width=min_bin_width,
        min_bin_height=min_bin_height,
        min_derivative=min_derivative,
    )

    return outputs, logabsdet


def rational_quadratic_spline(
    inputs,
    unnormalized_widths,
    unnormalized_heights,
    unnormalized_derivatives,
    inverse=False,
    left=0.0,
    right=1.0,
    bottom=0.0,
    top=1.0,
    min_bin_width=DEFAULT_MIN_BIN_WIDTH,
    min_bin_height=DEFAULT_MIN_BIN_HEIGHT,
    min_derivative=DEFAULT_MIN_DERIVATIVE,
):
    if torch.min(inputs) < left or torch.max(inputs) > right:
        raise ValueError("Input to a transform is not within its domain")

    num_bins = unnormalized_widths.shape[-1]

    if min_bin_width * num_bins > 1.0:
        raise ValueError("Minimal bin width too large for the number of bins")
    if min_bin_height * num_bins > 1.0:
        raise ValueError("Minimal bin height too large for the number of bins")

    widths = F.softmax(unnormalized_widths, dim=-1)
    widths = min_bin_width + (1 - min_bin_width * num_bins) * widths
    cumwidths = torch.cumsum(widths, dim=-1)
    cumwidths = F.pad(cumwidths, pad=(1, 0), mode="constant", value=0.0)
    cumwidths = (right - left) * cumwidths + left
    cumwidths[..., 0] = left
    cumwidths[..., -1] = right
    widths = cumwidths[..., 1:] - cumwidths[..., :-1]

    derivatives = min_derivative + F.softplus(unnormalized_derivatives)

    heights = F.softmax(unnormalized_heights, dim=-1)
    heights = min_bin_height + (1 - min_bin_height * num_bins) * heights
    cumheights = torch.cumsum(heights, dim=-1)
    cumheights = F.pad(cumheights, pad=(1, 0), mode="constant", value=0.0)
    cumheights = (top - bottom) * cumheights + bottom
    cumheights[..., 0] = bottom
    cumheights[..., -1] = top
    heights = cumheights[..., 1:] - cumheights[..., :-1]

    if inverse:
        bin_idx = searchsorted(cumheights, inputs)[..., None]
    else:
        bin_idx = searchsorted(cumwidths, inputs)[..., None]

    input_cumwidths = cumwidths.gather(-1, bin_idx)[..., 0]
    input_bin_widths = widths.gather(-1, bin_idx)[..., 0]

    input_cumheights = cumheights.gather(-1, bin_idx)[..., 0]
    delta = heights / widths
    input_delta = delta.gather(-1, bin_idx)[..., 0]

    input_derivatives = derivatives.gather(-1, bin_idx)[..., 0]
    input_derivatives_plus_one = derivatives[..., 1:].gather(-1, bin_idx)[..., 0]

    input_heights = heights.gather(-1, bin_idx)[..., 0]

    if inverse:
        a = (inputs - input_cumheights) * (
            input_derivatives + input_derivatives_plus_one - 2 * input_delta
        ) + input_heights * (input_delta - input_derivatives)
        b = input_heights * input_derivatives - (inputs - input_cumheights) * (
            input_derivatives + input_derivatives_plus_one - 2 * input_delta
        )
        c = -input_delta * (inputs - input_cumheights)

        discriminant = b.pow(2) - 4 * a * c
        assert (discriminant >= 0).all()

        root = (2 * c) / (-b - torch.sqrt(discriminant))
        outputs = root * input_bin_widths + input_cumwidths

        theta_one_minus_theta = root * (1 - root)
        denominator = input_delta + (
            (input_derivatives + input_derivatives_plus_one - 2 * input_delta) * theta_one_minus_theta
        )
        derivative_numerator = input_delta.pow(2) * (
            input_derivatives_plus_one * root.pow(2)
            + 2 * input_delta * theta_one_minus_theta
            + input_derivatives * (1 - root).pow(2)
        )
        logabsdet = torch.log(derivative_numerator) - 2 * torch.log(denominator)

        return outputs, -logabsdet
    else:
        theta = (inputs - input_cumwidths) / input_bin_widths
        theta_one_minus_theta = theta * (1 - theta)

        numerator = input_heights * (input_delta * theta.pow(2) + input_derivatives * theta_one_minus_theta)
        denominator = input_delta + (
            (input_derivatives + input_derivatives_plus_one - 2 * input_delta) * theta_one_minus_theta
        )
        outputs = input_cumheights + numerator / denominator

        derivative_numerator = input_delta.pow(2) * (
            input_derivatives_plus_one * theta.pow(2)
            + 2 * input_delta * theta_one_minus_theta
            + input_derivatives * (1 - theta).pow(2)
        )
        logabsdet = torch.log(derivative_numerator) - 2 * torch.log(denominator)

        return outputs, logabsdet


================================================
FILE: TTS/tts/models/__init__.py
================================================
from typing import Dict, List, Union

from TTS.utils.generic_utils import find_module


def setup_model(config: "Coqpit", samples: Union[List[List], List[Dict]] = None) -> "BaseTTS":
    print(" > Using model: {}".format(config.model))
    # fetch the right model implementation.
    if "base_model" in config and config["base_model"] is not None:
        MyModel = find_module("TTS.tts.models", config.base_model.lower())
    else:
        MyModel = find_module("TTS.tts.models", config.model.lower())
    model = MyModel.init_from_config(config, samples)
    return model


================================================
FILE: TTS/tts/models/align_tts.py
================================================
from dataclasses import dataclass, field
from typing import Dict, List, Union

import torch
from coqpit import Coqpit
from torch import nn

from TTS.tts.layers.align_tts.mdn import MDNBlock
from TTS.tts.layers.feed_forward.decoder import Decoder
from TTS.tts.layers.feed_forward.duration_predictor import DurationPredictor
from TTS.tts.layers.feed_forward.encoder import Encoder
from TTS.tts.layers.generic.pos_encoding import PositionalEncoding
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.helpers import generate_path, maximum_path, sequence_mask
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram
from TTS.utils.io import load_fsspec


@dataclass
class AlignTTSArgs(Coqpit):
    """
    Args:
        num_chars (int):
            number of unique input to characters
        out_channels (int):
            number of output tensor channels. It is equal to the expected spectrogram size.
        hidden_channels (int):
            number of channels in all the model layers.
        hidden_channels_ffn (int):
            number of channels in transformer's conv layers.
        hidden_channels_dp (int):
            number of channels in duration predictor network.
        num_heads (int):
            number of attention heads in transformer networks.
        num_transformer_layers (int):
            number of layers in encoder and decoder transformer blocks.
        dropout_p (int):
            dropout rate in transformer layers.
        length_scale (int, optional):
            coefficient to set the speech speed. <1 slower, >1 faster. Defaults to 1.
        num_speakers (int, optional):
            number of speakers for multi-speaker training. Defaults to 0.
        external_c (bool, optional):
            enable external speaker embeddings. Defaults to False.
        c_in_channels (int, optional):
            number of channels in speaker embedding vectors. Defaults to 0.
    """

    num_chars: int = None
    out_channels: int = 80
    hidden_channels: int = 256
    hidden_channels_dp: int = 256
    encoder_type: str = "fftransformer"
    encoder_params: dict = field(
        default_factory=lambda: {"hidden_channels_ffn": 1024, "num_heads": 2, "num_layers": 6, "dropout_p": 0.1}
    )
    decoder_type: str = "fftransformer"
    decoder_params: dict = field(
        default_factory=lambda: {"hidden_channels_ffn": 1024, "num_heads": 2, "num_layers": 6, "dropout_p": 0.1}
    )
    length_scale: float = 1.0
    num_speakers: int = 0
    use_speaker_embedding: bool = False
    use_d_vector_file: bool = False
    d_vector_dim: int = 0


class AlignTTS(BaseTTS):
    """AlignTTS with modified duration predictor.
    https://arxiv.org/pdf/2003.01950.pdf

    Encoder -> DurationPredictor -> Decoder

    Check :class:`AlignTTSArgs` for the class arguments.

    Paper Abstract:
        Targeting at both high efficiency and performance, we propose AlignTTS to predict the
        mel-spectrum in parallel. AlignTTS is based on a Feed-Forward Transformer which generates mel-spectrum from a
        sequence of characters, and the duration of each character is determined by a duration predictor.Instead of
        adopting the attention mechanism in Transformer TTS to align text to mel-spectrum, the alignment loss is presented
        to consider all possible alignments in training by use of dynamic programming. Experiments on the LJSpeech dataset s
        how that our model achieves not only state-of-the-art performance which outperforms Transformer TTS by 0.03 in mean
        option score (MOS), but also a high efficiency which is more than 50 times faster than real-time.

    Note:
        Original model uses a separate character embedding layer for duration predictor. However, it causes the
        duration predictor to overfit and prevents learning higher level interactions among characters. Therefore,
        we predict durations based on encoder outputs which has higher level information about input characters. This
        enables training without phases as in the original paper.

        Original model uses Transormers in encoder and decoder layers. However, here you can set the architecture
        differently based on your requirements using ```encoder_type``` and ```decoder_type``` parameters.

    Examples:
        >>> from TTS.tts.configs.align_tts_config import AlignTTSConfig
        >>> config = AlignTTSConfig()
        >>> model = AlignTTS(config)

    """

    # pylint: disable=dangerous-default-value

    def __init__(
        self,
        config: "AlignTTSConfig",
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)
        self.speaker_manager = speaker_manager
        self.phase = -1
        self.length_scale = (
            float(config.model_args.length_scale)
            if isinstance(config.model_args.length_scale, int)
            else config.model_args.length_scale
        )

        self.emb = nn.Embedding(self.config.model_args.num_chars, self.config.model_args.hidden_channels)

        self.embedded_speaker_dim = 0
        self.init_multispeaker(config)

        self.pos_encoder = PositionalEncoding(config.model_args.hidden_channels)
        self.encoder = Encoder(
            config.model_args.hidden_channels,
            config.model_args.hidden_channels,
            config.model_args.encoder_type,
            config.model_args.encoder_params,
            self.embedded_speaker_dim,
        )
        self.decoder = Decoder(
            config.model_args.out_channels,
            config.model_args.hidden_channels,
            config.model_args.decoder_type,
            config.model_args.decoder_params,
        )
        self.duration_predictor = DurationPredictor(config.model_args.hidden_channels_dp)

        self.mod_layer = nn.Conv1d(config.model_args.hidden_channels, config.model_args.hidden_channels, 1)

        self.mdn_block = MDNBlock(config.model_args.hidden_channels, 2 * config.model_args.out_channels)

        if self.embedded_speaker_dim > 0 and self.embedded_speaker_dim != config.model_args.hidden_channels:
            self.proj_g = nn.Conv1d(self.embedded_speaker_dim, config.model_args.hidden_channels, 1)

    @staticmethod
    def compute_log_probs(mu, log_sigma, y):
        # pylint: disable=protected-access, c-extension-no-member
        y = y.transpose(1, 2).unsqueeze(1)  # [B, 1, T1, D]
        mu = mu.transpose(1, 2).unsqueeze(2)  # [B, T2, 1, D]
        log_sigma = log_sigma.transpose(1, 2).unsqueeze(2)  # [B, T2, 1, D]
        expanded_y, expanded_mu = torch.broadcast_tensors(y, mu)
        exponential = -0.5 * torch.mean(
            torch._C._nn.mse_loss(expanded_y, expanded_mu, 0) / torch.pow(log_sigma.exp(), 2), dim=-1
        )  # B, L, T
        logp = exponential - 0.5 * log_sigma.mean(dim=-1)
        return logp

    def compute_align_path(self, mu, log_sigma, y, x_mask, y_mask):
        # find the max alignment path
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        log_p = self.compute_log_probs(mu, log_sigma, y)
        # [B, T_en, T_dec]
        attn = maximum_path(log_p, attn_mask.squeeze(1)).unsqueeze(1)
        dr_mas = torch.sum(attn, -1)
        return dr_mas.squeeze(1), log_p

    @staticmethod
    def generate_attn(dr, x_mask, y_mask=None):
        # compute decode mask from the durations
        if y_mask is None:
            y_lengths = dr.sum(1).long()
            y_lengths[y_lengths < 1] = 1
            y_mask = torch.unsqueeze(sequence_mask(y_lengths, None), 1).to(dr.dtype)
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        attn = generate_path(dr, attn_mask.squeeze(1)).to(dr.dtype)
        return attn

    def expand_encoder_outputs(self, en, dr, x_mask, y_mask):
        """Generate attention alignment map from durations and
        expand encoder outputs

        Examples::
            - encoder output: [a,b,c,d]
            - durations: [1, 3, 2, 1]

            - expanded: [a, b, b, b, c, c, d]
            - attention map: [[0, 0, 0, 0, 0, 0, 1],
                             [0, 0, 0, 0, 1, 1, 0],
                             [0, 1, 1, 1, 0, 0, 0],
                             [1, 0, 0, 0, 0, 0, 0]]
        """
        attn = self.generate_attn(dr, x_mask, y_mask)
        o_en_ex = torch.matmul(attn.squeeze(1).transpose(1, 2), en.transpose(1, 2)).transpose(1, 2)
        return o_en_ex, attn

    def format_durations(self, o_dr_log, x_mask):
        o_dr = (torch.exp(o_dr_log) - 1) * x_mask * self.length_scale
        o_dr[o_dr < 1] = 1.0
        o_dr = torch.round(o_dr)
        return o_dr

    @staticmethod
    def _concat_speaker_embedding(o_en, g):
        g_exp = g.expand(-1, -1, o_en.size(-1))  # [B, C, T_en]
        o_en = torch.cat([o_en, g_exp], 1)
        return o_en

    def _sum_speaker_embedding(self, x, g):
        # project g to decoder dim.
        if hasattr(self, "proj_g"):
            g = self.proj_g(g)

        return x + g

    def _forward_encoder(self, x, x_lengths, g=None):
        if hasattr(self, "emb_g"):
            g = nn.functional.normalize(self.speaker_embedding(g))  # [B, C, 1]

        if g is not None:
            g = g.unsqueeze(-1)

        # [B, T, C]
        x_emb = self.emb(x)
        # [B, C, T]
        x_emb = torch.transpose(x_emb, 1, -1)

        # compute sequence masks
        x_mask = torch.unsqueeze(sequence_mask(x_lengths, x.shape[1]), 1).to(x.dtype)

        # encoder pass
        o_en = self.encoder(x_emb, x_mask)

        # speaker conditioning for duration predictor
        if g is not None:
            o_en_dp = self._concat_speaker_embedding(o_en, g)
        else:
            o_en_dp = o_en
        return o_en, o_en_dp, x_mask, g

    def _forward_decoder(self, o_en, o_en_dp, dr, x_mask, y_lengths, g):
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, None), 1).to(o_en_dp.dtype)
        # expand o_en with durations
        o_en_ex, attn = self.expand_encoder_outputs(o_en, dr, x_mask, y_mask)
        # positional encoding
        if hasattr(self, "pos_encoder"):
            o_en_ex = self.pos_encoder(o_en_ex, y_mask)
        # speaker embedding
        if g is not None:
            o_en_ex = self._sum_speaker_embedding(o_en_ex, g)
        # decoder pass
        o_de = self.decoder(o_en_ex, y_mask, g=g)
        return o_de, attn.transpose(1, 2)

    def _forward_mdn(self, o_en, y, y_lengths, x_mask):
        # MAS potentials and alignment
        mu, log_sigma = self.mdn_block(o_en)
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, None), 1).to(o_en.dtype)
        dr_mas, logp = self.compute_align_path(mu, log_sigma, y, x_mask, y_mask)
        return dr_mas, mu, log_sigma, logp

    def forward(
        self, x, x_lengths, y, y_lengths, aux_input={"d_vectors": None}, phase=None
    ):  # pylint: disable=unused-argument
        """
        Shapes:
            - x: :math:`[B, T_max]`
            - x_lengths: :math:`[B]`
            - y_lengths: :math:`[B]`
            - dr: :math:`[B, T_max]`
            - g: :math:`[B, C]`
        """
        y = y.transpose(1, 2)
        g = aux_input["d_vectors"] if "d_vectors" in aux_input else None
        o_de, o_dr_log, dr_mas_log, attn, mu, log_sigma, logp = None, None, None, None, None, None, None
        if phase == 0:
            # train encoder and MDN
            o_en, o_en_dp, x_mask, g = self._forward_encoder(x, x_lengths, g)
            dr_mas, mu, log_sigma, logp = self._forward_mdn(o_en, y, y_lengths, x_mask)
            y_mask = torch.unsqueeze(sequence_mask(y_lengths, None), 1).to(o_en_dp.dtype)
            attn = self.generate_attn(dr_mas, x_mask, y_mask)
        elif phase == 1:
            # train decoder
            o_en, o_en_dp, x_mask, g = self._forward_encoder(x, x_lengths, g)
            dr_mas, _, _, _ = self._forward_mdn(o_en, y, y_lengths, x_mask)
            o_de, attn = self._forward_decoder(o_en.detach(), o_en_dp.detach(), dr_mas.detach(), x_mask, y_lengths, g=g)
        elif phase == 2:
            # train the whole except duration predictor
            o_en, o_en_dp, x_mask, g = self._forward_encoder(x, x_lengths, g)
            dr_mas, mu, log_sigma, logp = self._forward_mdn(o_en, y, y_lengths, x_mask)
            o_de, attn = self._forward_decoder(o_en, o_en_dp, dr_mas, x_mask, y_lengths, g=g)
        elif phase == 3:
            # train duration predictor
            o_en, o_en_dp, x_mask, g = self._forward_encoder(x, x_lengths, g)
            o_dr_log = self.duration_predictor(x, x_mask)
            dr_mas, mu, log_sigma, logp = self._forward_mdn(o_en, y, y_lengths, x_mask)
            o_de, attn = self._forward_decoder(o_en, o_en_dp, dr_mas, x_mask, y_lengths, g=g)
            o_dr_log = o_dr_log.squeeze(1)
        else:
            o_en, o_en_dp, x_mask, g = self._forward_encoder(x, x_lengths, g)
            o_dr_log = self.duration_predictor(o_en_dp.detach(), x_mask)
            dr_mas, mu, log_sigma, logp = self._forward_mdn(o_en, y, y_lengths, x_mask)
            o_de, attn = self._forward_decoder(o_en, o_en_dp, dr_mas, x_mask, y_lengths, g=g)
            o_dr_log = o_dr_log.squeeze(1)
        dr_mas_log = torch.log(dr_mas + 1).squeeze(1)
        outputs = {
            "model_outputs": o_de.transpose(1, 2),
            "alignments": attn,
            "durations_log": o_dr_log,
            "durations_mas_log": dr_mas_log,
            "mu": mu,
            "log_sigma": log_sigma,
            "logp": logp,
        }
        return outputs

    @torch.no_grad()
    def inference(self, x, aux_input={"d_vectors": None}):  # pylint: disable=unused-argument
        """
        Shapes:
            - x: :math:`[B, T_max]`
            - x_lengths: :math:`[B]`
            - g: :math:`[B, C]`
        """
        g = aux_input["d_vectors"] if "d_vectors" in aux_input else None
        x_lengths = torch.tensor(x.shape[1:2]).to(x.device)
        # pad input to prevent dropping the last word
        # x = torch.nn.functional.pad(x, pad=(0, 5), mode='constant', value=0)
        o_en, o_en_dp, x_mask, g = self._forward_encoder(x, x_lengths, g)
        # o_dr_log = self.duration_predictor(x, x_mask)
        o_dr_log = self.duration_predictor(o_en_dp, x_mask)
        # duration predictor pass
        o_dr = self.format_durations(o_dr_log, x_mask).squeeze(1)
        y_lengths = o_dr.sum(1)
        o_de, attn = self._forward_decoder(o_en, o_en_dp, o_dr, x_mask, y_lengths, g=g)
        outputs = {"model_outputs": o_de.transpose(1, 2), "alignments": attn}
        return outputs

    def train_step(self, batch: dict, criterion: nn.Module):
        text_input = batch["text_input"]
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        mel_lengths = batch["mel_lengths"]
        d_vectors = batch["d_vectors"]
        speaker_ids = batch["speaker_ids"]

        aux_input = {"d_vectors": d_vectors, "speaker_ids": speaker_ids}
        outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input, self.phase)
        loss_dict = criterion(
            outputs["logp"],
            outputs["model_outputs"],
            mel_input,
            mel_lengths,
            outputs["durations_log"],
            outputs["durations_mas_log"],
            text_lengths,
            phase=self.phase,
        )

        return outputs, loss_dict

    def _create_logs(self, batch, outputs, ap):  # pylint: disable=no-self-use
        model_outputs = outputs["model_outputs"]
        alignments = outputs["alignments"]
        mel_input = batch["mel_input"]

        pred_spec = model_outputs[0].data.cpu().numpy()
        gt_spec = mel_input[0].data.cpu().numpy()
        align_img = alignments[0].data.cpu().numpy()

        figures = {
            "prediction": plot_spectrogram(pred_spec, ap, output_fig=False),
            "ground_truth": plot_spectrogram(gt_spec, ap, output_fig=False),
            "alignment": plot_alignment(align_img, output_fig=False),
        }

        # Sample audio
        train_audio = ap.inv_melspectrogram(pred_spec.T)
        return figures, {"audio": train_audio}

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ) -> None:  # pylint: disable=no-self-use
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    def eval_step(self, batch: dict, criterion: nn.Module):
        return self.train_step(batch, criterion)

    def eval_log(self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int) -> None:
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training

    def get_criterion(self):
        from TTS.tts.layers.losses import AlignTTSLoss  # pylint: disable=import-outside-toplevel

        return AlignTTSLoss(self.config)

    @staticmethod
    def _set_phase(config, global_step):
        """Decide AlignTTS training phase"""
        if isinstance(config.phase_start_steps, list):
            vals = [i < global_step for i in config.phase_start_steps]
            if not True in vals:
                phase = 0
            else:
                phase = (
                    len(config.phase_start_steps)
                    - [i < global_step for i in config.phase_start_steps][::-1].index(True)
                    - 1
                )
        else:
            phase = None
        return phase

    def on_epoch_start(self, trainer):
        """Set AlignTTS training phase on epoch start."""
        self.phase = self._set_phase(trainer.config, trainer.total_steps_done)

    @staticmethod
    def init_from_config(config: "AlignTTSConfig", samples: Union[List[List], List[Dict]] = None):
        """Initiate model from config

        Args:
            config (AlignTTSConfig): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
        """
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config, samples)
        return AlignTTS(new_config, ap, tokenizer, speaker_manager)


================================================
FILE: TTS/tts/models/base_tacotron.py
================================================
import copy
from abc import abstractmethod
from typing import Dict, Tuple

import torch
from coqpit import Coqpit
from torch import nn

from TTS.tts.layers.losses import TacotronLoss
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.helpers import sequence_mask
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.synthesis import synthesis
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram
from TTS.utils.generic_utils import format_aux_input
from TTS.utils.io import load_fsspec
from TTS.utils.training import gradual_training_scheduler


class BaseTacotron(BaseTTS):
    """Base class shared by Tacotron and Tacotron2"""

    def __init__(
        self,
        config: "TacotronConfig",
        ap: "AudioProcessor",
        tokenizer: "TTSTokenizer",
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)

        # pass all config fields as class attributes
        for key in config:
            setattr(self, key, config[key])

        # layers
        self.embedding = None
        self.encoder = None
        self.decoder = None
        self.postnet = None

        # init tensors
        self.embedded_speakers = None
        self.embedded_speakers_projected = None

        # global style token
        if self.gst and self.use_gst:
            self.decoder_in_features += self.gst.gst_embedding_dim  # add gst embedding dim
            self.gst_layer = None

        # Capacitron
        if self.capacitron_vae and self.use_capacitron_vae:
            self.decoder_in_features += self.capacitron_vae.capacitron_VAE_embedding_dim  # add capacitron embedding dim
            self.capacitron_vae_layer = None

        # additional layers
        self.decoder_backward = None
        self.coarse_decoder = None

    @staticmethod
    def _format_aux_input(aux_input: Dict) -> Dict:
        """Set missing fields to their default values"""
        if aux_input:
            return format_aux_input({"d_vectors": None, "speaker_ids": None}, aux_input)
        return None

    #############################
    # INIT FUNCTIONS
    #############################

    def _init_backward_decoder(self):
        """Init the backward decoder for Forward-Backward decoding."""
        self.decoder_backward = copy.deepcopy(self.decoder)

    def _init_coarse_decoder(self):
        """Init the coarse decoder for Double-Decoder Consistency."""
        self.coarse_decoder = copy.deepcopy(self.decoder)
        self.coarse_decoder.r_init = self.ddc_r
        self.coarse_decoder.set_r(self.ddc_r)

    #############################
    # CORE FUNCTIONS
    #############################

    @abstractmethod
    def forward(self):
        pass

    @abstractmethod
    def inference(self):
        pass

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        """Load model checkpoint and set up internals.

        Args:
            config (Coqpi): model configuration.
            checkpoint_path (str): path to checkpoint file.
            eval (bool, optional): whether to load model for evaluation.
            cache (bool, optional): If True, cache the file locally for subsequent calls. It is cached under `get_user_data_dir()/tts_cache`. Defaults to False.
        """
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        # TODO: set r in run-time by taking it from the new config
        if "r" in state:
            # set r from the state (for compatibility with older checkpoints)
            self.decoder.set_r(state["r"])
        elif "config" in state:
            # set r from config used at training time (for inference)
            self.decoder.set_r(state["config"]["r"])
        else:
            # set r from the new config (for new-models)
            self.decoder.set_r(config.r)
        if eval:
            self.eval()
            print(f" > Model's reduction rate `r` is set to: {self.decoder.r}")
            assert not self.training

    def get_criterion(self) -> nn.Module:
        """Get the model criterion used in training."""
        return TacotronLoss(self.config)

    @staticmethod
    def init_from_config(config: Coqpit):
        """Initialize model from config."""
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config)
        tokenizer = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config)
        return BaseTacotron(config, ap, tokenizer, speaker_manager)

    ##########################
    # TEST AND LOG FUNCTIONS #
    ##########################

    def test_run(self, assets: Dict) -> Tuple[Dict, Dict]:
        """Generic test run for `tts` models used by `Trainer`.

        You can override this for a different behaviour.

        Args:
            assets (dict): A dict of training assets. For `tts` models, it must include `{'audio_processor': ap}`.

        Returns:
            Tuple[Dict, Dict]: Test figures and audios to be projected to Tensorboard.
        """
        print(" | > Synthesizing test sentences.")
        test_audios = {}
        test_figures = {}
        test_sentences = self.config.test_sentences
        aux_inputs = self._get_test_aux_input()
        for idx, sen in enumerate(test_sentences):
            outputs_dict = synthesis(
                self,
                sen,
                self.config,
                "cuda" in str(next(self.parameters()).device),
                speaker_id=aux_inputs["speaker_id"],
                d_vector=aux_inputs["d_vector"],
                style_wav=aux_inputs["style_wav"],
                use_griffin_lim=True,
                do_trim_silence=False,
            )
            test_audios["{}-audio".format(idx)] = outputs_dict["wav"]
            test_figures["{}-prediction".format(idx)] = plot_spectrogram(
                outputs_dict["outputs"]["model_outputs"], self.ap, output_fig=False
            )
            test_figures["{}-alignment".format(idx)] = plot_alignment(
                outputs_dict["outputs"]["alignments"], output_fig=False
            )
        return {"figures": test_figures, "audios": test_audios}

    def test_log(
        self, outputs: dict, logger: "Logger", assets: dict, steps: int  # pylint: disable=unused-argument
    ) -> None:
        logger.test_audios(steps, outputs["audios"], self.ap.sample_rate)
        logger.test_figures(steps, outputs["figures"])

    #############################
    # COMMON COMPUTE FUNCTIONS
    #############################

    def compute_masks(self, text_lengths, mel_lengths):
        """Compute masks  against sequence paddings."""
        # B x T_in_max (boolean)
        input_mask = sequence_mask(text_lengths)
        output_mask = None
        if mel_lengths is not None:
            max_len = mel_lengths.max()
            r = self.decoder.r
            max_len = max_len + (r - (max_len % r)) if max_len % r > 0 else max_len
            output_mask = sequence_mask(mel_lengths, max_len=max_len)
        return input_mask, output_mask

    def _backward_pass(self, mel_specs, encoder_outputs, mask):
        """Run backwards decoder"""
        decoder_outputs_b, alignments_b, _ = self.decoder_backward(
            encoder_outputs, torch.flip(mel_specs, dims=(1,)), mask
        )
        decoder_outputs_b = decoder_outputs_b.transpose(1, 2).contiguous()
        return decoder_outputs_b, alignments_b

    def _coarse_decoder_pass(self, mel_specs, encoder_outputs, alignments, input_mask):
        """Double Decoder Consistency"""
        T = mel_specs.shape[1]
        if T % self.coarse_decoder.r > 0:
            padding_size = self.coarse_decoder.r - (T % self.coarse_decoder.r)
            mel_specs = torch.nn.functional.pad(mel_specs, (0, 0, 0, padding_size, 0, 0))
        decoder_outputs_backward, alignments_backward, _ = self.coarse_decoder(
            encoder_outputs.detach(), mel_specs, input_mask
        )
        # scale_factor = self.decoder.r_init / self.decoder.r
        alignments_backward = torch.nn.functional.interpolate(
            alignments_backward.transpose(1, 2),
            size=alignments.shape[1],
            mode="nearest",
        ).transpose(1, 2)
        decoder_outputs_backward = decoder_outputs_backward.transpose(1, 2)
        decoder_outputs_backward = decoder_outputs_backward[:, :T, :]
        return decoder_outputs_backward, alignments_backward

    #############################
    # EMBEDDING FUNCTIONS
    #############################

    def compute_gst(self, inputs, style_input, speaker_embedding=None):
        """Compute global style token"""
        if isinstance(style_input, dict):
            # multiply each style token with a weight
            query = torch.zeros(1, 1, self.gst.gst_embedding_dim // 2).type_as(inputs)
            if speaker_embedding is not None:
                query = torch.cat([query, speaker_embedding.reshape(1, 1, -1)], dim=-1)

            _GST = torch.tanh(self.gst_layer.style_token_layer.style_tokens)
            gst_outputs = torch.zeros(1, 1, self.gst.gst_embedding_dim).type_as(inputs)
            for k_token, v_amplifier in style_input.items():
                key = _GST[int(k_token)].unsqueeze(0).expand(1, -1, -1)
                gst_outputs_att = self.gst_layer.style_token_layer.attention(query, key)
                gst_outputs = gst_outputs + gst_outputs_att * v_amplifier
        elif style_input is None:
            # ignore style token and return zero tensor
            gst_outputs = torch.zeros(1, 1, self.gst.gst_embedding_dim).type_as(inputs)
        else:
            # compute style tokens
            gst_outputs = self.gst_layer(style_input, speaker_embedding)  # pylint: disable=not-callable
        inputs = self._concat_speaker_embedding(inputs, gst_outputs)
        return inputs

    def compute_capacitron_VAE_embedding(self, inputs, reference_mel_info, text_info=None, speaker_embedding=None):
        """Capacitron Variational Autoencoder"""
        (
            VAE_outputs,
            posterior_distribution,
            prior_distribution,
            capacitron_beta,
        ) = self.capacitron_vae_layer(
            reference_mel_info,
            text_info,
            speaker_embedding,  # pylint: disable=not-callable
        )

        VAE_outputs = VAE_outputs.to(inputs.device)
        encoder_output = self._concat_speaker_embedding(
            inputs, VAE_outputs
        )  # concatenate to the output of the basic tacotron encoder
        return (
            encoder_output,
            posterior_distribution,
            prior_distribution,
            capacitron_beta,
        )

    @staticmethod
    def _add_speaker_embedding(outputs, embedded_speakers):
        embedded_speakers_ = embedded_speakers.expand(outputs.size(0), outputs.size(1), -1)
        outputs = outputs + embedded_speakers_
        return outputs

    @staticmethod
    def _concat_speaker_embedding(outputs, embedded_speakers):
        embedded_speakers_ = embedded_speakers.expand(outputs.size(0), outputs.size(1), -1)
        outputs = torch.cat([outputs, embedded_speakers_], dim=-1)
        return outputs

    #############################
    # CALLBACKS
    #############################

    def on_epoch_start(self, trainer):
        """Callback for setting values wrt gradual training schedule.

        Args:
            trainer (TrainerTTS): TTS trainer object that is used to train this model.
        """
        if self.gradual_training:
            r, trainer.config.batch_size = gradual_training_scheduler(trainer.total_steps_done, trainer.config)
            trainer.config.r = r
            self.decoder.set_r(r)
            if trainer.config.bidirectional_decoder:
                trainer.model.decoder_backward.set_r(r)
            print(f"\n > Number of output frames: {self.decoder.r}")


================================================
FILE: TTS/tts/models/base_tts.py
================================================
import os
import random
from typing import Dict, List, Tuple, Union

import torch
import torch.distributed as dist
from coqpit import Coqpit
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data.sampler import WeightedRandomSampler
from trainer.torch import DistributedSampler, DistributedSamplerWrapper

from TTS.model import BaseTrainerModel
from TTS.tts.datasets.dataset import TTSDataset
from TTS.tts.utils.data import get_length_balancer_weights
from TTS.tts.utils.languages import LanguageManager, get_language_balancer_weights
from TTS.tts.utils.speakers import SpeakerManager, get_speaker_balancer_weights, get_speaker_manager
from TTS.tts.utils.synthesis import synthesis
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram

# pylint: skip-file


class BaseTTS(BaseTrainerModel):
    """Base `tts` class. Every new `tts` model must inherit this.

    It defines common `tts` specific functions on top of `Model` implementation.
    """

    MODEL_TYPE = "tts"

    def __init__(
        self,
        config: Coqpit,
        ap: "AudioProcessor",
        tokenizer: "TTSTokenizer",
        speaker_manager: SpeakerManager = None,
        language_manager: LanguageManager = None,
    ):
        super().__init__()
        self.config = config
        self.ap = ap
        self.tokenizer = tokenizer
        self.speaker_manager = speaker_manager
        self.language_manager = language_manager
        self._set_model_args(config)

    def _set_model_args(self, config: Coqpit):
        """Setup model args based on the config type (`ModelConfig` or `ModelArgs`).

        `ModelArgs` has all the fields reuqired to initialize the model architecture.

        `ModelConfig` has all the fields required for training, inference and containes `ModelArgs`.

        If the config is for training with a name like "*Config", then the model args are embeded in the
        config.model_args

        If the config is for the model with a name like "*Args", then we assign the directly.
        """
        # don't use isintance not to import recursively
        if "Config" in config.__class__.__name__:
            config_num_chars = (
                self.config.model_args.num_chars if hasattr(self.config, "model_args") else self.config.num_chars
            )
            num_chars = config_num_chars if self.tokenizer is None else self.tokenizer.characters.num_chars
            if "characters" in config:
                self.config.num_chars = num_chars
                if hasattr(self.config, "model_args"):
                    config.model_args.num_chars = num_chars
                    self.args = self.config.model_args
            else:
                self.config = config
                self.args = config.model_args
        elif "Args" in config.__class__.__name__:
            self.args = config
        else:
            raise ValueError("config must be either a *Config or *Args")

    def init_multispeaker(self, config: Coqpit, data: List = None):
        """Initialize a speaker embedding layer if needen and define expected embedding channel size for defining
        `in_channels` size of the connected layers.

        This implementation yields 3 possible outcomes:

        1. If `config.use_speaker_embedding` and `config.use_d_vector_file are False, do nothing.
        2. If `config.use_d_vector_file` is True, set expected embedding channel size to `config.d_vector_dim` or 512.
        3. If `config.use_speaker_embedding`, initialize a speaker embedding layer with channel size of
        `config.d_vector_dim` or 512.

        You can override this function for new models.

        Args:
            config (Coqpit): Model configuration.
        """
        # set number of speakers
        if self.speaker_manager is not None:
            self.num_speakers = self.speaker_manager.num_speakers
        elif hasattr(config, "num_speakers"):
            self.num_speakers = config.num_speakers

        # set ultimate speaker embedding size
        if config.use_speaker_embedding or config.use_d_vector_file:
            self.embedded_speaker_dim = (
                config.d_vector_dim if "d_vector_dim" in config and config.d_vector_dim is not None else 512
            )
        # init speaker embedding layer
        if config.use_speaker_embedding and not config.use_d_vector_file:
            print(" > Init speaker_embedding layer.")
            self.speaker_embedding = nn.Embedding(self.num_speakers, self.embedded_speaker_dim)
            self.speaker_embedding.weight.data.normal_(0, 0.3)

    def get_aux_input(self, **kwargs) -> Dict:
        """Prepare and return `aux_input` used by `forward()`"""
        return {"speaker_id": None, "style_wav": None, "d_vector": None, "language_id": None}

    def get_aux_input_from_test_sentences(self, sentence_info):
        if hasattr(self.config, "model_args"):
            config = self.config.model_args
        else:
            config = self.config

        # extract speaker and language info
        text, speaker_name, style_wav, language_name = None, None, None, None

        if isinstance(sentence_info, list):
            if len(sentence_info) == 1:
                text = sentence_info[0]
            elif len(sentence_info) == 2:
                text, speaker_name = sentence_info
            elif len(sentence_info) == 3:
                text, speaker_name, style_wav = sentence_info
            elif len(sentence_info) == 4:
                text, speaker_name, style_wav, language_name = sentence_info
        else:
            text = sentence_info

        # get speaker  id/d_vector
        speaker_id, d_vector, language_id = None, None, None
        if self.speaker_manager is not None:
            if config.use_d_vector_file:
                if speaker_name is None:
                    d_vector = self.speaker_manager.get_random_embedding()
                else:
                    d_vector = self.speaker_manager.get_d_vector_by_name(speaker_name)
            elif config.use_speaker_embedding:
                if speaker_name is None:
                    speaker_id = self.speaker_manager.get_random_id()
                else:
                    speaker_id = self.speaker_manager.name_to_id[speaker_name]

        # get language id
        if self.language_manager is not None and config.use_language_embedding and language_name is not None:
            language_id = self.language_manager.name_to_id[language_name]

        return {
            "text": text,
            "speaker_id": speaker_id,
            "style_wav": style_wav,
            "d_vector": d_vector,
            "language_id": language_id,
        }

    def format_batch(self, batch: Dict) -> Dict:
        """Generic batch formatting for `TTSDataset`.

        You must override this if you use a custom dataset.

        Args:
            batch (Dict): [description]

        Returns:
            Dict: [description]
        """
        # setup input batch
        text_input = batch["token_id"]
        text_lengths = batch["token_id_lengths"]
        speaker_names = batch["speaker_names"]
        linear_input = batch["linear"]
        mel_input = batch["mel"]
        mel_lengths = batch["mel_lengths"]
        stop_targets = batch["stop_targets"]
        item_idx = batch["item_idxs"]
        d_vectors = batch["d_vectors"]
        speaker_ids = batch["speaker_ids"]
        attn_mask = batch["attns"]
        waveform = batch["waveform"]
        pitch = batch["pitch"]
        energy = batch["energy"]
        language_ids = batch["language_ids"]
        max_text_length = torch.max(text_lengths.float())
        max_spec_length = torch.max(mel_lengths.float())

        # compute durations from attention masks
        durations = None
        if attn_mask is not None:
            durations = torch.zeros(attn_mask.shape[0], attn_mask.shape[2])
            for idx, am in enumerate(attn_mask):
                # compute raw durations
                c_idxs = am[:, : text_lengths[idx], : mel_lengths[idx]].max(1)[1]
                # c_idxs, counts = torch.unique_consecutive(c_idxs, return_counts=True)
                c_idxs, counts = torch.unique(c_idxs, return_counts=True)
                dur = torch.ones([text_lengths[idx]]).to(counts.dtype)
                dur[c_idxs] = counts
                # smooth the durations and set any 0 duration to 1
                # by cutting off from the largest duration indeces.
                extra_frames = dur.sum() - mel_lengths[idx]
                largest_idxs = torch.argsort(-dur)[:extra_frames]
                dur[largest_idxs] -= 1
                assert (
                    dur.sum() == mel_lengths[idx]
                ), f" [!] total duration {dur.sum()} vs spectrogram length {mel_lengths[idx]}"
                durations[idx, : text_lengths[idx]] = dur

        # set stop targets wrt reduction factor
        stop_targets = stop_targets.view(text_input.shape[0], stop_targets.size(1) // self.config.r, -1)
        stop_targets = (stop_targets.sum(2) > 0.0).unsqueeze(2).float().squeeze(2)
        stop_target_lengths = torch.divide(mel_lengths, self.config.r).ceil_()

        return {
            "text_input": text_input,
            "text_lengths": text_lengths,
            "speaker_names": speaker_names,
            "mel_input": mel_input,
            "mel_lengths": mel_lengths,
            "linear_input": linear_input,
            "stop_targets": stop_targets,
            "stop_target_lengths": stop_target_lengths,
            "attn_mask": attn_mask,
            "durations": durations,
            "speaker_ids": speaker_ids,
            "d_vectors": d_vectors,
            "max_text_length": float(max_text_length),
            "max_spec_length": float(max_spec_length),
            "item_idx": item_idx,
            "waveform": waveform,
            "pitch": pitch,
            "energy": energy,
            "language_ids": language_ids,
            "audio_unique_names": batch["audio_unique_names"],
        }

    def get_sampler(self, config: Coqpit, dataset: TTSDataset, num_gpus=1):
        weights = None
        data_items = dataset.samples

        if getattr(config, "use_language_weighted_sampler", False):
            alpha = getattr(config, "language_weighted_sampler_alpha", 1.0)
            print(" > Using Language weighted sampler with alpha:", alpha)
            weights = get_language_balancer_weights(data_items) * alpha

        if getattr(config, "use_speaker_weighted_sampler", False):
            alpha = getattr(config, "speaker_weighted_sampler_alpha", 1.0)
            print(" > Using Speaker weighted sampler with alpha:", alpha)
            if weights is not None:
                weights += get_speaker_balancer_weights(data_items) * alpha
            else:
                weights = get_speaker_balancer_weights(data_items) * alpha

        if getattr(config, "use_length_weighted_sampler", False):
            alpha = getattr(config, "length_weighted_sampler_alpha", 1.0)
            print(" > Using Length weighted sampler with alpha:", alpha)
            if weights is not None:
                weights += get_length_balancer_weights(data_items) * alpha
            else:
                weights = get_length_balancer_weights(data_items) * alpha

        if weights is not None:
            sampler = WeightedRandomSampler(weights, len(weights))
        else:
            sampler = None

        # sampler for DDP
        if sampler is None:
            sampler = DistributedSampler(dataset) if num_gpus > 1 else None
        else:  # If a sampler is already defined use this sampler and DDP sampler together
            sampler = DistributedSamplerWrapper(sampler) if num_gpus > 1 else sampler

        return sampler

    def get_data_loader(
        self,
        config: Coqpit,
        assets: Dict,
        is_eval: bool,
        samples: Union[List[Dict], List[List]],
        verbose: bool,
        num_gpus: int,
        rank: int = None,
    ) -> "DataLoader":
        if is_eval and not config.run_eval:
            loader = None
        else:
            # setup multi-speaker attributes
            if self.speaker_manager is not None:
                if hasattr(config, "model_args"):
                    speaker_id_mapping = (
                        self.speaker_manager.name_to_id if config.model_args.use_speaker_embedding else None
                    )
                    d_vector_mapping = self.speaker_manager.embeddings if config.model_args.use_d_vector_file else None
                    config.use_d_vector_file = config.model_args.use_d_vector_file
                else:
                    speaker_id_mapping = self.speaker_manager.name_to_id if config.use_speaker_embedding else None
                    d_vector_mapping = self.speaker_manager.embeddings if config.use_d_vector_file else None
            else:
                speaker_id_mapping = None
                d_vector_mapping = None

            # setup multi-lingual attributes
            if self.language_manager is not None:
                language_id_mapping = self.language_manager.name_to_id if self.args.use_language_embedding else None
            else:
                language_id_mapping = None

            # init dataloader
            dataset = TTSDataset(
                outputs_per_step=config.r if "r" in config else 1,
                compute_linear_spec=config.model.lower() == "tacotron" or config.compute_linear_spec,
                compute_f0=config.get("compute_f0", False),
                f0_cache_path=config.get("f0_cache_path", None),
                compute_energy=config.get("compute_energy", False),
                energy_cache_path=config.get("energy_cache_path", None),
                samples=samples,
                ap=self.ap,
                return_wav=config.return_wav if "return_wav" in config else False,
                batch_group_size=0 if is_eval else config.batch_group_size * config.batch_size,
                min_text_len=config.min_text_len,
                max_text_len=config.max_text_len,
                min_audio_len=config.min_audio_len,
                max_audio_len=config.max_audio_len,
                phoneme_cache_path=config.phoneme_cache_path,
                precompute_num_workers=config.precompute_num_workers,
                use_noise_augment=False if is_eval else config.use_noise_augment,
                verbose=verbose,
                speaker_id_mapping=speaker_id_mapping,
                d_vector_mapping=d_vector_mapping if config.use_d_vector_file else None,
                tokenizer=self.tokenizer,
                start_by_longest=config.start_by_longest,
                language_id_mapping=language_id_mapping,
            )

            # wait all the DDP process to be ready
            if num_gpus > 1:
                dist.barrier()

            # sort input sequences from short to long
            dataset.preprocess_samples()

            # get samplers
            sampler = self.get_sampler(config, dataset, num_gpus)

            loader = DataLoader(
                dataset,
                batch_size=config.eval_batch_size if is_eval else config.batch_size,
                shuffle=config.shuffle if sampler is None else False,  # if there is no other sampler
                collate_fn=dataset.collate_fn,
                drop_last=config.drop_last,  # setting this False might cause issues in AMP training.
                sampler=sampler,
                num_workers=config.num_eval_loader_workers if is_eval else config.num_loader_workers,
                pin_memory=False,
            )
        return loader

    def _get_test_aux_input(
        self,
    ) -> Dict:
        d_vector = None
        if self.config.use_d_vector_file:
            d_vector = [self.speaker_manager.embeddings[name]["embedding"] for name in self.speaker_manager.embeddings]
            d_vector = (random.sample(sorted(d_vector), 1),)

        aux_inputs = {
            "speaker_id": None
            if not self.config.use_speaker_embedding
            else random.sample(sorted(self.speaker_manager.name_to_id.values()), 1),
            "d_vector": d_vector,
            "style_wav": None,  # TODO: handle GST style input
        }
        return aux_inputs

    def test_run(self, assets: Dict) -> Tuple[Dict, Dict]:
        """Generic test run for `tts` models used by `Trainer`.

        You can override this for a different behaviour.

        Args:
            assets (dict): A dict of training assets. For `tts` models, it must include `{'audio_processor': ap}`.

        Returns:
            Tuple[Dict, Dict]: Test figures and audios to be projected to Tensorboard.
        """
        print(" | > Synthesizing test sentences.")
        test_audios = {}
        test_figures = {}
        test_sentences = self.config.test_sentences
        aux_inputs = self._get_test_aux_input()
        for idx, sen in enumerate(test_sentences):
            if isinstance(sen, list):
                aux_inputs = self.get_aux_input_from_test_sentences(sen)
                sen = aux_inputs["text"]
            outputs_dict = synthesis(
                self,
                sen,
                self.config,
                "cuda" in str(next(self.parameters()).device),
                speaker_id=aux_inputs["speaker_id"],
                d_vector=aux_inputs["d_vector"],
                style_wav=aux_inputs["style_wav"],
                use_griffin_lim=True,
                do_trim_silence=False,
            )
            test_audios["{}-audio".format(idx)] = outputs_dict["wav"]
            test_figures["{}-prediction".format(idx)] = plot_spectrogram(
                outputs_dict["outputs"]["model_outputs"], self.ap, output_fig=False
            )
            test_figures["{}-alignment".format(idx)] = plot_alignment(
                outputs_dict["outputs"]["alignments"], output_fig=False
            )
        return test_figures, test_audios

    def on_init_start(self, trainer):
        """Save the speaker.pth and language_ids.json at the beginning of the training. Also update both paths."""
        if self.speaker_manager is not None:
            output_path = os.path.join(trainer.output_path, "speakers.pth")
            self.speaker_manager.save_ids_to_file(output_path)
            trainer.config.speakers_file = output_path
            # some models don't have `model_args` set
            if hasattr(trainer.config, "model_args"):
                trainer.config.model_args.speakers_file = output_path
            trainer.config.save_json(os.path.join(trainer.output_path, "config.json"))
            print(f" > `speakers.pth` is saved to {output_path}.")
            print(" > `speakers_file` is updated in the config.json.")

        if self.language_manager is not None:
            output_path = os.path.join(trainer.output_path, "language_ids.json")
            self.language_manager.save_ids_to_file(output_path)
            trainer.config.language_ids_file = output_path
            if hasattr(trainer.config, "model_args"):
                trainer.config.model_args.language_ids_file = output_path
            trainer.config.save_json(os.path.join(trainer.output_path, "config.json"))
            print(f" > `language_ids.json` is saved to {output_path}.")
            print(" > `language_ids_file` is updated in the config.json.")


================================================
FILE: TTS/tts/models/forward_tts.py
================================================
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

import torch
from coqpit import Coqpit
from torch import nn
from torch.cuda.amp.autocast_mode import autocast

from TTS.tts.layers.feed_forward.decoder import Decoder
from TTS.tts.layers.feed_forward.encoder import Encoder
from TTS.tts.layers.generic.aligner import AlignmentNetwork
from TTS.tts.layers.generic.pos_encoding import PositionalEncoding
from TTS.tts.layers.glow_tts.duration_predictor import DurationPredictor
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.helpers import average_over_durations, generate_path, maximum_path, sequence_mask
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_avg_energy, plot_avg_pitch, plot_spectrogram
from TTS.utils.io import load_fsspec


@dataclass
class ForwardTTSArgs(Coqpit):
    """ForwardTTS Model arguments.

    Args:

        num_chars (int):
            Number of characters in the vocabulary. Defaults to 100.

        out_channels (int):
            Number of output channels. Defaults to 80.

        hidden_channels (int):
            Number of base hidden channels of the model. Defaults to 512.

        use_aligner (bool):
            Whether to use aligner network to learn the text to speech alignment or use pre-computed durations.
            If set False, durations should be computed by `TTS/bin/compute_attention_masks.py` and path to the
            pre-computed durations must be provided to `config.datasets[0].meta_file_attn_mask`. Defaults to True.

        use_pitch (bool):
            Use pitch predictor to learn the pitch. Defaults to True.

        use_energy (bool):
            Use energy predictor to learn the energy. Defaults to True.

        duration_predictor_hidden_channels (int):
            Number of hidden channels in the duration predictor. Defaults to 256.

        duration_predictor_dropout_p (float):
            Dropout rate for the duration predictor. Defaults to 0.1.

        duration_predictor_kernel_size (int):
            Kernel size of conv layers in the duration predictor. Defaults to 3.

        pitch_predictor_hidden_channels (int):
            Number of hidden channels in the pitch predictor. Defaults to 256.

        pitch_predictor_dropout_p (float):
            Dropout rate for the pitch predictor. Defaults to 0.1.

        pitch_predictor_kernel_size (int):
            Kernel size of conv layers in the pitch predictor. Defaults to 3.

        pitch_embedding_kernel_size (int):
            Kernel size of the projection layer in the pitch predictor. Defaults to 3.

        energy_predictor_hidden_channels (int):
            Number of hidden channels in the energy predictor. Defaults to 256.

        energy_predictor_dropout_p (float):
            Dropout rate for the energy predictor. Defaults to 0.1.

        energy_predictor_kernel_size (int):
            Kernel size of conv layers in the energy predictor. Defaults to 3.

        energy_embedding_kernel_size (int):
            Kernel size of the projection layer in the energy predictor. Defaults to 3.

        positional_encoding (bool):
            Whether to use positional encoding. Defaults to True.

        positional_encoding_use_scale (bool):
            Whether to use a learnable scale coeff in the positional encoding. Defaults to True.

        length_scale (int):
            Length scale that multiplies the predicted durations. Larger values result slower speech. Defaults to 1.0.

        encoder_type (str):
            Type of the encoder module. One of the encoders available in :class:`TTS.tts.layers.feed_forward.encoder`.
            Defaults to `fftransformer` as in the paper.

        encoder_params (dict):
            Parameters of the encoder module. Defaults to ```{"hidden_channels_ffn": 1024, "num_heads": 1, "num_layers": 6, "dropout_p": 0.1}```

        decoder_type (str):
            Type of the decoder module. One of the decoders available in :class:`TTS.tts.layers.feed_forward.decoder`.
            Defaults to `fftransformer` as in the paper.

        decoder_params (str):
            Parameters of the decoder module. Defaults to ```{"hidden_channels_ffn": 1024, "num_heads": 1, "num_layers": 6, "dropout_p": 0.1}```

        detach_duration_predictor (bool):
            Detach the input to the duration predictor from the earlier computation graph so that the duraiton loss
            does not pass to the earlier layers. Defaults to True.

        max_duration (int):
            Maximum duration accepted by the model. Defaults to 75.

        num_speakers (int):
            Number of speakers for the speaker embedding layer. Defaults to 0.

        speakers_file (str):
            Path to the speaker mapping file for the Speaker Manager. Defaults to None.

        speaker_embedding_channels (int):
            Number of speaker embedding channels. Defaults to 256.

        use_d_vector_file (bool):
            Enable/Disable the use of d-vectors for multi-speaker training. Defaults to False.

        d_vector_dim (int):
            Number of d-vector channels. Defaults to 0.

    """

    num_chars: int = None
    out_channels: int = 80
    hidden_channels: int = 384
    use_aligner: bool = True
    # pitch params
    use_pitch: bool = True
    pitch_predictor_hidden_channels: int = 256
    pitch_predictor_kernel_size: int = 3
    pitch_predictor_dropout_p: float = 0.1
    pitch_embedding_kernel_size: int = 3

    # energy params
    use_energy: bool = False
    energy_predictor_hidden_channels: int = 256
    energy_predictor_kernel_size: int = 3
    energy_predictor_dropout_p: float = 0.1
    energy_embedding_kernel_size: int = 3

    # duration params
    duration_predictor_hidden_channels: int = 256
    duration_predictor_kernel_size: int = 3
    duration_predictor_dropout_p: float = 0.1

    positional_encoding: bool = True
    poisitonal_encoding_use_scale: bool = True
    length_scale: int = 1
    encoder_type: str = "fftransformer"
    encoder_params: dict = field(
        default_factory=lambda: {"hidden_channels_ffn": 1024, "num_heads": 1, "num_layers": 6, "dropout_p": 0.1}
    )
    decoder_type: str = "fftransformer"
    decoder_params: dict = field(
        default_factory=lambda: {"hidden_channels_ffn": 1024, "num_heads": 1, "num_layers": 6, "dropout_p": 0.1}
    )
    detach_duration_predictor: bool = False
    max_duration: int = 75
    num_speakers: int = 1
    use_speaker_embedding: bool = False
    speakers_file: str = None
    use_d_vector_file: bool = False
    d_vector_dim: int = None
    d_vector_file: str = None


class ForwardTTS(BaseTTS):
    """General forward TTS model implementation that uses an encoder-decoder architecture with an optional alignment
    network and a pitch predictor.

    If the alignment network is used, the model learns the text-to-speech alignment
    from the data instead of using pre-computed durations.

    If the pitch predictor is used, the model trains a pitch predictor that predicts average pitch value for each
    input character as in the FastPitch model.

    `ForwardTTS` can be configured to one of these architectures,

        - FastPitch
        - SpeedySpeech
        - FastSpeech
        - FastSpeech2 (requires average speech energy predictor)

    Args:
        config (Coqpit): Model coqpit class.
        speaker_manager (SpeakerManager): Speaker manager for multi-speaker training. Only used for multi-speaker models.
            Defaults to None.

    Examples:
        >>> from TTS.tts.models.fast_pitch import ForwardTTS, ForwardTTSArgs
        >>> config = ForwardTTSArgs()
        >>> model = ForwardTTS(config)
    """

    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        config: Coqpit,
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)
        self._set_model_args(config)

        self.init_multispeaker(config)

        self.max_duration = self.args.max_duration
        self.use_aligner = self.args.use_aligner
        self.use_pitch = self.args.use_pitch
        self.use_energy = self.args.use_energy
        self.binary_loss_weight = 0.0

        self.length_scale = (
            float(self.args.length_scale) if isinstance(self.args.length_scale, int) else self.args.length_scale
        )

        self.emb = nn.Embedding(self.args.num_chars, self.args.hidden_channels)

        self.encoder = Encoder(
            self.args.hidden_channels,
            self.args.hidden_channels,
            self.args.encoder_type,
            self.args.encoder_params,
            self.embedded_speaker_dim,
        )

        if self.args.positional_encoding:
            self.pos_encoder = PositionalEncoding(self.args.hidden_channels)

        self.decoder = Decoder(
            self.args.out_channels,
            self.args.hidden_channels,
            self.args.decoder_type,
            self.args.decoder_params,
        )

        self.duration_predictor = DurationPredictor(
            self.args.hidden_channels + self.embedded_speaker_dim,
            self.args.duration_predictor_hidden_channels,
            self.args.duration_predictor_kernel_size,
            self.args.duration_predictor_dropout_p,
        )

        if self.args.use_pitch:
            self.pitch_predictor = DurationPredictor(
                self.args.hidden_channels + self.embedded_speaker_dim,
                self.args.pitch_predictor_hidden_channels,
                self.args.pitch_predictor_kernel_size,
                self.args.pitch_predictor_dropout_p,
            )
            self.pitch_emb = nn.Conv1d(
                1,
                self.args.hidden_channels,
                kernel_size=self.args.pitch_embedding_kernel_size,
                padding=int((self.args.pitch_embedding_kernel_size - 1) / 2),
            )

        if self.args.use_energy:
            self.energy_predictor = DurationPredictor(
                self.args.hidden_channels + self.embedded_speaker_dim,
                self.args.energy_predictor_hidden_channels,
                self.args.energy_predictor_kernel_size,
                self.args.energy_predictor_dropout_p,
            )
            self.energy_emb = nn.Conv1d(
                1,
                self.args.hidden_channels,
                kernel_size=self.args.energy_embedding_kernel_size,
                padding=int((self.args.energy_embedding_kernel_size - 1) / 2),
            )

        if self.args.use_aligner:
            self.aligner = AlignmentNetwork(
                in_query_channels=self.args.out_channels, in_key_channels=self.args.hidden_channels
            )

    def init_multispeaker(self, config: Coqpit):
        """Init for multi-speaker training.

        Args:
            config (Coqpit): Model configuration.
        """
        self.embedded_speaker_dim = 0
        # init speaker manager
        if self.speaker_manager is None and (config.use_d_vector_file or config.use_speaker_embedding):
            raise ValueError(
                " > SpeakerManager is not provided. You must provide the SpeakerManager before initializing a multi-speaker model."
            )
        # set number of speakers
        if self.speaker_manager is not None:
            self.num_speakers = self.speaker_manager.num_speakers
        # init d-vector embedding
        if config.use_d_vector_file:
            self.embedded_speaker_dim = config.d_vector_dim
            if self.args.d_vector_dim != self.args.hidden_channels:
                self.proj_g = nn.Conv1d(self.args.d_vector_dim, self.args.hidden_channels, 1)
        # init speaker embedding layer
        if config.use_speaker_embedding and not config.use_d_vector_file:
            print(" > Init speaker_embedding layer.")
            self.emb_g = nn.Embedding(self.num_speakers, self.args.hidden_channels)
            nn.init.uniform_(self.emb_g.weight, -0.1, 0.1)

    @staticmethod
    def generate_attn(dr, x_mask, y_mask=None):
        """Generate an attention mask from the durations.

        Shapes
           - dr: :math:`(B, T_{en})`
           - x_mask: :math:`(B, T_{en})`
           - y_mask: :math:`(B, T_{de})`
        """
        # compute decode mask from the durations
        if y_mask is None:
            y_lengths = dr.sum(1).long()
            y_lengths[y_lengths < 1] = 1
            y_mask = torch.unsqueeze(sequence_mask(y_lengths, None), 1).to(dr.dtype)
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        attn = generate_path(dr, attn_mask.squeeze(1)).to(dr.dtype)
        return attn

    def expand_encoder_outputs(self, en, dr, x_mask, y_mask):
        """Generate attention alignment map from durations and
        expand encoder outputs

        Shapes:
            - en: :math:`(B, D_{en}, T_{en})`
            - dr: :math:`(B, T_{en})`
            - x_mask: :math:`(B, T_{en})`
            - y_mask: :math:`(B, T_{de})`

        Examples::

            encoder output: [a,b,c,d]
            durations: [1, 3, 2, 1]

            expanded: [a, b, b, b, c, c, d]
            attention map: [[0, 0, 0, 0, 0, 0, 1],
                            [0, 0, 0, 0, 1, 1, 0],
                            [0, 1, 1, 1, 0, 0, 0],
                            [1, 0, 0, 0, 0, 0, 0]]
        """
        attn = self.generate_attn(dr, x_mask, y_mask)
        o_en_ex = torch.matmul(attn.squeeze(1).transpose(1, 2).to(en.dtype), en.transpose(1, 2)).transpose(1, 2)
        return o_en_ex, attn

    def format_durations(self, o_dr_log, x_mask):
        """Format predicted durations.
        1. Convert to linear scale from log scale
        2. Apply the length scale for speed adjustment
        3. Apply masking.
        4. Cast 0 durations to 1.
        5. Round the duration values.

        Args:
            o_dr_log: Log scale durations.
            x_mask: Input text mask.

        Shapes:
            - o_dr_log: :math:`(B, T_{de})`
            - x_mask: :math:`(B, T_{en})`
        """
        o_dr = (torch.exp(o_dr_log) - 1) * x_mask * self.length_scale
        o_dr[o_dr < 1] = 1.0
        o_dr = torch.round(o_dr)
        return o_dr

    def _forward_encoder(
        self, x: torch.LongTensor, x_mask: torch.FloatTensor, g: torch.FloatTensor = None
    ) -> Tuple[torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor]:
        """Encoding forward pass.

        1. Embed speaker IDs if multi-speaker mode.
        2. Embed character sequences.
        3. Run the encoder network.
        4. Sum encoder outputs and speaker embeddings

        Args:
            x (torch.LongTensor): Input sequence IDs.
            x_mask (torch.FloatTensor): Input squence mask.
            g (torch.FloatTensor, optional): Conditioning vectors. In general speaker embeddings. Defaults to None.

        Returns:
            Tuple[torch.tensor, torch.tensor, torch.tensor, torch.tensor, torch.tensor]:
                encoder output, encoder output for the duration predictor, input sequence mask, speaker embeddings,
                character embeddings

        Shapes:
            - x: :math:`(B, T_{en})`
            - x_mask: :math:`(B, 1, T_{en})`
            - g: :math:`(B, C)`
        """
        if hasattr(self, "emb_g"):
            g = self.emb_g(g)  # [B, C, 1]
        if g is not None:
            g = g.unsqueeze(-1)
        # [B, T, C]
        x_emb = self.emb(x)
        # encoder pass
        o_en = self.encoder(torch.transpose(x_emb, 1, -1), x_mask)
        # speaker conditioning
        # TODO: try different ways of conditioning
        if g is not None:
            o_en = o_en + g
        return o_en, x_mask, g, x_emb

    def _forward_decoder(
        self,
        o_en: torch.FloatTensor,
        dr: torch.IntTensor,
        x_mask: torch.FloatTensor,
        y_lengths: torch.IntTensor,
        g: torch.FloatTensor,
    ) -> Tuple[torch.FloatTensor, torch.FloatTensor]:
        """Decoding forward pass.

        1. Compute the decoder output mask
        2. Expand encoder output with the durations.
        3. Apply position encoding.
        4. Add speaker embeddings if multi-speaker mode.
        5. Run the decoder.

        Args:
            o_en (torch.FloatTensor): Encoder output.
            dr (torch.IntTensor): Ground truth durations or alignment network durations.
            x_mask (torch.IntTensor): Input sequence mask.
            y_lengths (torch.IntTensor): Output sequence lengths.
            g (torch.FloatTensor): Conditioning vectors. In general speaker embeddings.

        Returns:
            Tuple[torch.FloatTensor, torch.FloatTensor]: Decoder output, attention map from durations.
        """
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, None), 1).to(o_en.dtype)
        # expand o_en with durations
        o_en_ex, attn = self.expand_encoder_outputs(o_en, dr, x_mask, y_mask)
        # positional encoding
        if hasattr(self, "pos_encoder"):
            o_en_ex = self.pos_encoder(o_en_ex, y_mask)
        # decoder pass
        o_de = self.decoder(o_en_ex, y_mask, g=g)
        return o_de.transpose(1, 2), attn.transpose(1, 2)

    def _forward_pitch_predictor(
        self,
        o_en: torch.FloatTensor,
        x_mask: torch.IntTensor,
        pitch: torch.FloatTensor = None,
        dr: torch.IntTensor = None,
    ) -> Tuple[torch.FloatTensor, torch.FloatTensor]:
        """Pitch predictor forward pass.

        1. Predict pitch from encoder outputs.
        2. In training - Compute average pitch values for each input character from the ground truth pitch values.
        3. Embed average pitch values.

        Args:
            o_en (torch.FloatTensor): Encoder output.
            x_mask (torch.IntTensor): Input sequence mask.
            pitch (torch.FloatTensor, optional): Ground truth pitch values. Defaults to None.
            dr (torch.IntTensor, optional): Ground truth durations. Defaults to None.

        Returns:
            Tuple[torch.FloatTensor, torch.FloatTensor]: Pitch embedding, pitch prediction.

        Shapes:
            - o_en: :math:`(B, C, T_{en})`
            - x_mask: :math:`(B, 1, T_{en})`
            - pitch: :math:`(B, 1, T_{de})`
            - dr: :math:`(B, T_{en})`
        """
        o_pitch = self.pitch_predictor(o_en, x_mask)
        if pitch is not None:
            avg_pitch = average_over_durations(pitch, dr)
            o_pitch_emb = self.pitch_emb(avg_pitch)
            return o_pitch_emb, o_pitch, avg_pitch
        o_pitch_emb = self.pitch_emb(o_pitch)
        return o_pitch_emb, o_pitch

    def _forward_energy_predictor(
        self,
        o_en: torch.FloatTensor,
        x_mask: torch.IntTensor,
        energy: torch.FloatTensor = None,
        dr: torch.IntTensor = None,
    ) -> Tuple[torch.FloatTensor, torch.FloatTensor]:
        """Energy predictor forward pass.

        1. Predict energy from encoder outputs.
        2. In training - Compute average pitch values for each input character from the ground truth pitch values.
        3. Embed average energy values.

        Args:
            o_en (torch.FloatTensor): Encoder output.
            x_mask (torch.IntTensor): Input sequence mask.
            energy (torch.FloatTensor, optional): Ground truth energy values. Defaults to None.
            dr (torch.IntTensor, optional): Ground truth durations. Defaults to None.

        Returns:
            Tuple[torch.FloatTensor, torch.FloatTensor]: Energy embedding, energy prediction.

        Shapes:
            - o_en: :math:`(B, C, T_{en})`
            - x_mask: :math:`(B, 1, T_{en})`
            - pitch: :math:`(B, 1, T_{de})`
            - dr: :math:`(B, T_{en})`
        """
        o_energy = self.energy_predictor(o_en, x_mask)
        if energy is not None:
            avg_energy = average_over_durations(energy, dr)
            o_energy_emb = self.energy_emb(avg_energy)
            return o_energy_emb, o_energy, avg_energy
        o_energy_emb = self.energy_emb(o_energy)
        return o_energy_emb, o_energy

    def _forward_aligner(
        self, x: torch.FloatTensor, y: torch.FloatTensor, x_mask: torch.IntTensor, y_mask: torch.IntTensor
    ) -> Tuple[torch.IntTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor]:
        """Aligner forward pass.

        1. Compute a mask to apply to the attention map.
        2. Run the alignment network.
        3. Apply MAS to compute the hard alignment map.
        4. Compute the durations from the hard alignment map.

        Args:
            x (torch.FloatTensor): Input sequence.
            y (torch.FloatTensor): Output sequence.
            x_mask (torch.IntTensor): Input sequence mask.
            y_mask (torch.IntTensor): Output sequence mask.

        Returns:
            Tuple[torch.IntTensor, torch.FloatTensor, torch.FloatTensor, torch.FloatTensor]:
                Durations from the hard alignment map, soft alignment potentials, log scale alignment potentials,
                hard alignment map.

        Shapes:
            - x: :math:`[B, T_en, C_en]`
            - y: :math:`[B, T_de, C_de]`
            - x_mask: :math:`[B, 1, T_en]`
            - y_mask: :math:`[B, 1, T_de]`

            - o_alignment_dur: :math:`[B, T_en]`
            - alignment_soft: :math:`[B, T_en, T_de]`
            - alignment_logprob: :math:`[B, 1, T_de, T_en]`
            - alignment_mas: :math:`[B, T_en, T_de]`
        """
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        alignment_soft, alignment_logprob = self.aligner(y.transpose(1, 2), x.transpose(1, 2), x_mask, None)
        alignment_mas = maximum_path(
            alignment_soft.squeeze(1).transpose(1, 2).contiguous(), attn_mask.squeeze(1).contiguous()
        )
        o_alignment_dur = torch.sum(alignment_mas, -1).int()
        alignment_soft = alignment_soft.squeeze(1).transpose(1, 2)
        return o_alignment_dur, alignment_soft, alignment_logprob, alignment_mas

    def _set_speaker_input(self, aux_input: Dict):
        d_vectors = aux_input.get("d_vectors", None)
        speaker_ids = aux_input.get("speaker_ids", None)

        if d_vectors is not None and speaker_ids is not None:
            raise ValueError("[!] Cannot use d-vectors and speaker-ids together.")

        if speaker_ids is not None and not hasattr(self, "emb_g"):
            raise ValueError("[!] Cannot use speaker-ids without enabling speaker embedding.")

        g = speaker_ids if speaker_ids is not None else d_vectors
        return g

    def forward(
        self,
        x: torch.LongTensor,
        x_lengths: torch.LongTensor,
        y_lengths: torch.LongTensor,
        y: torch.FloatTensor = None,
        dr: torch.IntTensor = None,
        pitch: torch.FloatTensor = None,
        energy: torch.FloatTensor = None,
        aux_input: Dict = {"d_vectors": None, "speaker_ids": None},  # pylint: disable=unused-argument
    ) -> Dict:
        """Model's forward pass.

        Args:
            x (torch.LongTensor): Input character sequences.
            x_lengths (torch.LongTensor): Input sequence lengths.
            y_lengths (torch.LongTensor): Output sequnce lengths. Defaults to None.
            y (torch.FloatTensor): Spectrogram frames. Only used when the alignment network is on. Defaults to None.
            dr (torch.IntTensor): Character durations over the spectrogram frames. Only used when the alignment network is off. Defaults to None.
            pitch (torch.FloatTensor): Pitch values for each spectrogram frame. Only used when the pitch predictor is on. Defaults to None.
            energy (torch.FloatTensor): energy values for each spectrogram frame. Only used when the energy predictor is on. Defaults to None.
            aux_input (Dict): Auxiliary model inputs for multi-speaker training. Defaults to `{"d_vectors": 0, "speaker_ids": None}`.

        Shapes:
            - x: :math:`[B, T_max]`
            - x_lengths: :math:`[B]`
            - y_lengths: :math:`[B]`
            - y: :math:`[B, T_max2]`
            - dr: :math:`[B, T_max]`
            - g: :math:`[B, C]`
            - pitch: :math:`[B, 1, T]`
        """
        g = self._set_speaker_input(aux_input)
        # compute sequence masks
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, None), 1).float()
        x_mask = torch.unsqueeze(sequence_mask(x_lengths, x.shape[1]), 1).float()
        # encoder pass
        o_en, x_mask, g, x_emb = self._forward_encoder(x, x_mask, g)
        # duration predictor pass
        if self.args.detach_duration_predictor:
            o_dr_log = self.duration_predictor(o_en.detach(), x_mask)
        else:
            o_dr_log = self.duration_predictor(o_en, x_mask)
        o_dr = torch.clamp(torch.exp(o_dr_log) - 1, 0, self.max_duration)
        # generate attn mask from predicted durations
        o_attn = self.generate_attn(o_dr.squeeze(1), x_mask)
        # aligner
        o_alignment_dur = None
        alignment_soft = None
        alignment_logprob = None
        alignment_mas = None
        if self.use_aligner:
            o_alignment_dur, alignment_soft, alignment_logprob, alignment_mas = self._forward_aligner(
                x_emb, y, x_mask, y_mask
            )
            alignment_soft = alignment_soft.transpose(1, 2)
            alignment_mas = alignment_mas.transpose(1, 2)
            dr = o_alignment_dur
        # pitch predictor pass
        o_pitch = None
        avg_pitch = None
        if self.args.use_pitch:
            o_pitch_emb, o_pitch, avg_pitch = self._forward_pitch_predictor(o_en, x_mask, pitch, dr)
            o_en = o_en + o_pitch_emb
        # energy predictor pass
        o_energy = None
        avg_energy = None
        if self.args.use_energy:
            o_energy_emb, o_energy, avg_energy = self._forward_energy_predictor(o_en, x_mask, energy, dr)
            o_en = o_en + o_energy_emb
        # decoder pass
        o_de, attn = self._forward_decoder(
            o_en, dr, x_mask, y_lengths, g=None
        )  # TODO: maybe pass speaker embedding (g) too
        outputs = {
            "model_outputs": o_de,  # [B, T, C]
            "durations_log": o_dr_log.squeeze(1),  # [B, T]
            "durations": o_dr.squeeze(1),  # [B, T]
            "attn_durations": o_attn,  # for visualization [B, T_en, T_de']
            "pitch_avg": o_pitch,
            "pitch_avg_gt": avg_pitch,
            "energy_avg": o_energy,
            "energy_avg_gt": avg_energy,
            "alignments": attn,  # [B, T_de, T_en]
            "alignment_soft": alignment_soft,
            "alignment_mas": alignment_mas,
            "o_alignment_dur": o_alignment_dur,
            "alignment_logprob": alignment_logprob,
            "x_mask": x_mask,
            "y_mask": y_mask,
        }
        return outputs

    @torch.no_grad()
    def inference(self, x, aux_input={"d_vectors": None, "speaker_ids": None}):  # pylint: disable=unused-argument
        """Model's inference pass.

        Args:
            x (torch.LongTensor): Input character sequence.
            aux_input (Dict): Auxiliary model inputs. Defaults to `{"d_vectors": None, "speaker_ids": None}`.

        Shapes:
            - x: [B, T_max]
            - x_lengths: [B]
            - g: [B, C]
        """
        g = self._set_speaker_input(aux_input)
        x_lengths = torch.tensor(x.shape[1:2]).to(x.device)
        x_mask = torch.unsqueeze(sequence_mask(x_lengths, x.shape[1]), 1).to(x.dtype).float()
        # encoder pass
        o_en, x_mask, g, _ = self._forward_encoder(x, x_mask, g)
        # duration predictor pass
        o_dr_log = self.duration_predictor(o_en, x_mask)
        o_dr = self.format_durations(o_dr_log, x_mask).squeeze(1)
        y_lengths = o_dr.sum(1)
        # pitch predictor pass
        o_pitch = None
        if self.args.use_pitch:
            o_pitch_emb, o_pitch = self._forward_pitch_predictor(o_en, x_mask)
            o_en = o_en + o_pitch_emb
        # energy predictor pass
        o_energy = None
        if self.args.use_energy:
            o_energy_emb, o_energy = self._forward_energy_predictor(o_en, x_mask)
            o_en = o_en + o_energy_emb
        # decoder pass
        o_de, attn = self._forward_decoder(o_en, o_dr, x_mask, y_lengths, g=None)
        outputs = {
            "model_outputs": o_de,
            "alignments": attn,
            "pitch": o_pitch,
            "energy": o_energy,
            "durations_log": o_dr_log,
        }
        return outputs

    def train_step(self, batch: dict, criterion: nn.Module):
        text_input = batch["text_input"]
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        mel_lengths = batch["mel_lengths"]
        pitch = batch["pitch"] if self.args.use_pitch else None
        energy = batch["energy"] if self.args.use_energy else None
        d_vectors = batch["d_vectors"]
        speaker_ids = batch["speaker_ids"]
        durations = batch["durations"]
        aux_input = {"d_vectors": d_vectors, "speaker_ids": speaker_ids}

        # forward pass
        outputs = self.forward(
            text_input,
            text_lengths,
            mel_lengths,
            y=mel_input,
            dr=durations,
            pitch=pitch,
            energy=energy,
            aux_input=aux_input,
        )
        # use aligner's output as the duration target
        if self.use_aligner:
            durations = outputs["o_alignment_dur"]
        # use float32 in AMP
        with autocast(enabled=False):
            # compute loss
            loss_dict = criterion(
                decoder_output=outputs["model_outputs"],
                decoder_target=mel_input,
                decoder_output_lens=mel_lengths,
                dur_output=outputs["durations_log"],
                dur_target=durations,
                pitch_output=outputs["pitch_avg"] if self.use_pitch else None,
                pitch_target=outputs["pitch_avg_gt"] if self.use_pitch else None,
                energy_output=outputs["energy_avg"] if self.use_energy else None,
                energy_target=outputs["energy_avg_gt"] if self.use_energy else None,
                input_lens=text_lengths,
                alignment_logprob=outputs["alignment_logprob"] if self.use_aligner else None,
                alignment_soft=outputs["alignment_soft"],
                alignment_hard=outputs["alignment_mas"],
                binary_loss_weight=self.binary_loss_weight,
            )
            # compute duration error
            durations_pred = outputs["durations"]
            duration_error = torch.abs(durations - durations_pred).sum() / text_lengths.sum()
            loss_dict["duration_error"] = duration_error

        return outputs, loss_dict

    def _create_logs(self, batch, outputs, ap):
        """Create common logger outputs."""
        model_outputs = outputs["model_outputs"]
        alignments = outputs["alignments"]
        mel_input = batch["mel_input"]

        pred_spec = model_outputs[0].data.cpu().numpy()
        gt_spec = mel_input[0].data.cpu().numpy()
        align_img = alignments[0].data.cpu().numpy()

        figures = {
            "prediction": plot_spectrogram(pred_spec, ap, output_fig=False),
            "ground_truth": plot_spectrogram(gt_spec, ap, output_fig=False),
            "alignment": plot_alignment(align_img, output_fig=False),
        }

        # plot pitch figures
        if self.args.use_pitch:
            pitch_avg = abs(outputs["pitch_avg_gt"][0, 0].data.cpu().numpy())
            pitch_avg_hat = abs(outputs["pitch_avg"][0, 0].data.cpu().numpy())
            chars = self.tokenizer.decode(batch["text_input"][0].data.cpu().numpy())
            pitch_figures = {
                "pitch_ground_truth": plot_avg_pitch(pitch_avg, chars, output_fig=False),
                "pitch_avg_predicted": plot_avg_pitch(pitch_avg_hat, chars, output_fig=False),
            }
            figures.update(pitch_figures)

        # plot energy figures
        if self.args.use_energy:
            energy_avg = abs(outputs["energy_avg_gt"][0, 0].data.cpu().numpy())
            energy_avg_hat = abs(outputs["energy_avg"][0, 0].data.cpu().numpy())
            chars = self.tokenizer.decode(batch["text_input"][0].data.cpu().numpy())
            energy_figures = {
                "energy_ground_truth": plot_avg_energy(energy_avg, chars, output_fig=False),
                "energy_avg_predicted": plot_avg_energy(energy_avg_hat, chars, output_fig=False),
            }
            figures.update(energy_figures)

        # plot the attention mask computed from the predicted durations
        if "attn_durations" in outputs:
            alignments_hat = outputs["attn_durations"][0].data.cpu().numpy()
            figures["alignment_hat"] = plot_alignment(alignments_hat.T, output_fig=False)

        # Sample audio
        train_audio = ap.inv_melspectrogram(pred_spec.T)
        return figures, {"audio": train_audio}

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ) -> None:  # pylint: disable=no-self-use
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    def eval_step(self, batch: dict, criterion: nn.Module):
        return self.train_step(batch, criterion)

    def eval_log(self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int) -> None:
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training

    def get_criterion(self):
        from TTS.tts.layers.losses import ForwardTTSLoss  # pylint: disable=import-outside-toplevel

        return ForwardTTSLoss(self.config)

    def on_train_step_start(self, trainer):
        """Schedule binary loss weight."""
        self.binary_loss_weight = min(trainer.epochs_done / self.config.binary_loss_warmup_epochs, 1.0) * 1.0

    @staticmethod
    def init_from_config(config: "ForwardTTSConfig", samples: Union[List[List], List[Dict]] = None):
        """Initiate model from config

        Args:
            config (ForwardTTSConfig): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
        """
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config, samples)
        return ForwardTTS(new_config, ap, tokenizer, speaker_manager)


================================================
FILE: TTS/tts/models/glow_tts.py
================================================
import math
from typing import Dict, List, Tuple, Union

import torch
from coqpit import Coqpit
from torch import nn
from torch.cuda.amp.autocast_mode import autocast
from torch.nn import functional as F

from TTS.tts.configs.glow_tts_config import GlowTTSConfig
from TTS.tts.layers.glow_tts.decoder import Decoder
from TTS.tts.layers.glow_tts.encoder import Encoder
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.helpers import generate_path, maximum_path, sequence_mask
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.synthesis import synthesis
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram
from TTS.utils.io import load_fsspec


class GlowTTS(BaseTTS):
    """GlowTTS model.

    Paper::
        https://arxiv.org/abs/2005.11129

    Paper abstract::
        Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been proposed to generate
        mel-spectrograms from text in parallel. Despite the advantage, the parallel TTS models cannot be trained
        without guidance from autoregressive TTS models as their external aligners. In this work, we propose Glow-TTS,
        a flow-based generative model for parallel TTS that does not require any external aligner. By combining the
        properties of flows and dynamic programming, the proposed model searches for the most probable monotonic
        alignment between text and the latent representation of speech on its own. We demonstrate that enforcing hard
        monotonic alignments enables robust TTS, which generalizes to long utterances, and employing generative flows
        enables fast, diverse, and controllable speech synthesis. Glow-TTS obtains an order-of-magnitude speed-up over
        the autoregressive model, Tacotron 2, at synthesis with comparable speech quality. We further show that our
        model can be easily extended to a multi-speaker setting.

    Check :class:`TTS.tts.configs.glow_tts_config.GlowTTSConfig` for class arguments.

    Examples:
        Init only model layers.

        >>> from TTS.tts.configs.glow_tts_config import GlowTTSConfig
        >>> from TTS.tts.models.glow_tts import GlowTTS
        >>> config = GlowTTSConfig(num_chars=2)
        >>> model = GlowTTS(config)

        Fully init a model ready for action. All the class attributes and class members
        (e.g Tokenizer, AudioProcessor, etc.). are initialized internally based on config values.

        >>> from TTS.tts.configs.glow_tts_config import GlowTTSConfig
        >>> from TTS.tts.models.glow_tts import GlowTTS
        >>> config = GlowTTSConfig()
        >>> model = GlowTTS.init_from_config(config, verbose=False)
    """

    def __init__(
        self,
        config: GlowTTSConfig,
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)

        # pass all config fields to `self`
        # for fewer code change
        self.config = config
        for key in config:
            setattr(self, key, config[key])

        self.decoder_output_dim = config.out_channels

        # init multi-speaker layers if necessary
        self.init_multispeaker(config)

        self.run_data_dep_init = config.data_dep_init_steps > 0
        self.encoder = Encoder(
            self.num_chars,
            out_channels=self.out_channels,
            hidden_channels=self.hidden_channels_enc,
            hidden_channels_dp=self.hidden_channels_dp,
            encoder_type=self.encoder_type,
            encoder_params=self.encoder_params,
            mean_only=self.mean_only,
            use_prenet=self.use_encoder_prenet,
            dropout_p_dp=self.dropout_p_dp,
            c_in_channels=self.c_in_channels,
        )

        self.decoder = Decoder(
            self.out_channels,
            self.hidden_channels_dec,
            self.kernel_size_dec,
            self.dilation_rate,
            self.num_flow_blocks_dec,
            self.num_block_layers,
            dropout_p=self.dropout_p_dec,
            num_splits=self.num_splits,
            num_squeeze=self.num_squeeze,
            sigmoid_scale=self.sigmoid_scale,
            c_in_channels=self.c_in_channels,
        )

    def init_multispeaker(self, config: Coqpit):
        """Init speaker embedding layer if `use_speaker_embedding` is True and set the expected speaker embedding
        vector dimension to the encoder layer channel size. If model uses d-vectors, then it only sets
        speaker embedding vector dimension to the d-vector dimension from the config.

        Args:
            config (Coqpit): Model configuration.
        """
        self.embedded_speaker_dim = 0
        # set number of speakers - if num_speakers is set in config, use it, otherwise use speaker_manager
        if self.speaker_manager is not None:
            self.num_speakers = self.speaker_manager.num_speakers
        # set ultimate speaker embedding size
        if config.use_d_vector_file:
            self.embedded_speaker_dim = (
                config.d_vector_dim if "d_vector_dim" in config and config.d_vector_dim is not None else 512
            )
            if self.speaker_manager is not None:
                assert (
                    config.d_vector_dim == self.speaker_manager.embedding_dim
                ), " [!] d-vector dimension mismatch b/w config and speaker manager."
        # init speaker embedding layer
        if config.use_speaker_embedding and not config.use_d_vector_file:
            print(" > Init speaker_embedding layer.")
            self.embedded_speaker_dim = self.hidden_channels_enc
            self.emb_g = nn.Embedding(self.num_speakers, self.hidden_channels_enc)
            nn.init.uniform_(self.emb_g.weight, -0.1, 0.1)
        # set conditioning dimensions
        self.c_in_channels = self.embedded_speaker_dim

    @staticmethod
    def compute_outputs(attn, o_mean, o_log_scale, x_mask):
        """Compute and format the mode outputs with the given alignment map"""
        y_mean = torch.matmul(attn.squeeze(1).transpose(1, 2), o_mean.transpose(1, 2)).transpose(
            1, 2
        )  # [b, t', t], [b, t, d] -> [b, d, t']
        y_log_scale = torch.matmul(attn.squeeze(1).transpose(1, 2), o_log_scale.transpose(1, 2)).transpose(
            1, 2
        )  # [b, t', t], [b, t, d] -> [b, d, t']
        # compute total duration with adjustment
        o_attn_dur = torch.log(1 + torch.sum(attn, -1)) * x_mask
        return y_mean, y_log_scale, o_attn_dur

    def unlock_act_norm_layers(self):
        """Unlock activation normalization layers for data depended initalization."""
        for f in self.decoder.flows:
            if getattr(f, "set_ddi", False):
                f.set_ddi(True)

    def lock_act_norm_layers(self):
        """Lock activation normalization layers."""
        for f in self.decoder.flows:
            if getattr(f, "set_ddi", False):
                f.set_ddi(False)

    def _set_speaker_input(self, aux_input: Dict):
        if aux_input is None:
            d_vectors = None
            speaker_ids = None
        else:
            d_vectors = aux_input.get("d_vectors", None)
            speaker_ids = aux_input.get("speaker_ids", None)

        if d_vectors is not None and speaker_ids is not None:
            raise ValueError("[!] Cannot use d-vectors and speaker-ids together.")

        if speaker_ids is not None and not hasattr(self, "emb_g"):
            raise ValueError("[!] Cannot use speaker-ids without enabling speaker embedding.")

        g = speaker_ids if speaker_ids is not None else d_vectors
        return g

    def _speaker_embedding(self, aux_input: Dict) -> Union[torch.tensor, None]:
        g = self._set_speaker_input(aux_input)
        # speaker embedding
        if g is not None:
            if hasattr(self, "emb_g"):
                # use speaker embedding layer
                if not g.size():  # if is a scalar
                    g = g.unsqueeze(0)  # unsqueeze
                g = F.normalize(self.emb_g(g)).unsqueeze(-1)  # [b, h, 1]
            else:
                # use d-vector
                g = F.normalize(g).unsqueeze(-1)  # [b, h, 1]
        return g

    def forward(
        self, x, x_lengths, y, y_lengths=None, aux_input={"d_vectors": None, "speaker_ids": None}
    ):  # pylint: disable=dangerous-default-value
        """
        Args:
            x (torch.Tensor):
                Input text sequence ids. :math:`[B, T_en]`

            x_lengths (torch.Tensor):
                Lengths of input text sequences. :math:`[B]`

            y (torch.Tensor):
                Target mel-spectrogram frames. :math:`[B, T_de, C_mel]`

            y_lengths (torch.Tensor):
                Lengths of target mel-spectrogram frames. :math:`[B]`

            aux_input (Dict):
                Auxiliary inputs. `d_vectors` is speaker embedding vectors for a multi-speaker model.
                :math:`[B, D_vec]`. `speaker_ids` is speaker ids for a multi-speaker model usind speaker-embedding
                layer. :math:`B`

        Returns:
            Dict:
                - z: :math: `[B, T_de, C]`
                - logdet: :math:`B`
                - y_mean: :math:`[B, T_de, C]`
                - y_log_scale: :math:`[B, T_de, C]`
                - alignments: :math:`[B, T_en, T_de]`
                - durations_log: :math:`[B, T_en, 1]`
                - total_durations_log: :math:`[B, T_en, 1]`
        """
        # [B, T, C] -> [B, C, T]
        y = y.transpose(1, 2)
        y_max_length = y.size(2)
        # norm speaker embeddings
        g = self._speaker_embedding(aux_input)
        # embedding pass
        o_mean, o_log_scale, o_dur_log, x_mask = self.encoder(x, x_lengths, g=g)
        # drop redisual frames wrt num_squeeze and set y_lengths.
        y, y_lengths, y_max_length, attn = self.preprocess(y, y_lengths, y_max_length, None)
        # create masks
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, y_max_length), 1).to(x_mask.dtype)
        # [B, 1, T_en, T_de]
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        # decoder pass
        z, logdet = self.decoder(y, y_mask, g=g, reverse=False)
        # find the alignment path
        with torch.no_grad():
            o_scale = torch.exp(-2 * o_log_scale)
            logp1 = torch.sum(-0.5 * math.log(2 * math.pi) - o_log_scale, [1]).unsqueeze(-1)  # [b, t, 1]
            logp2 = torch.matmul(o_scale.transpose(1, 2), -0.5 * (z**2))  # [b, t, d] x [b, d, t'] = [b, t, t']
            logp3 = torch.matmul((o_mean * o_scale).transpose(1, 2), z)  # [b, t, d] x [b, d, t'] = [b, t, t']
            logp4 = torch.sum(-0.5 * (o_mean**2) * o_scale, [1]).unsqueeze(-1)  # [b, t, 1]
            logp = logp1 + logp2 + logp3 + logp4  # [b, t, t']
            attn = maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach()
        y_mean, y_log_scale, o_attn_dur = self.compute_outputs(attn, o_mean, o_log_scale, x_mask)
        attn = attn.squeeze(1).permute(0, 2, 1)
        outputs = {
            "z": z.transpose(1, 2),
            "logdet": logdet,
            "y_mean": y_mean.transpose(1, 2),
            "y_log_scale": y_log_scale.transpose(1, 2),
            "alignments": attn,
            "durations_log": o_dur_log.transpose(1, 2),
            "total_durations_log": o_attn_dur.transpose(1, 2),
        }
        return outputs

    @torch.no_grad()
    def inference_with_MAS(
        self, x, x_lengths, y=None, y_lengths=None, aux_input={"d_vectors": None, "speaker_ids": None}
    ):  # pylint: disable=dangerous-default-value
        """
        It's similar to the teacher forcing in Tacotron.
        It was proposed in: https://arxiv.org/abs/2104.05557

        Shapes:
            - x: :math:`[B, T]`
            - x_lenghts: :math:`B`
            - y: :math:`[B, T, C]`
            - y_lengths: :math:`B`
            - g: :math:`[B, C] or B`
        """
        y = y.transpose(1, 2)
        y_max_length = y.size(2)
        # norm speaker embeddings
        g = self._speaker_embedding(aux_input)
        # embedding pass
        o_mean, o_log_scale, o_dur_log, x_mask = self.encoder(x, x_lengths, g=g)
        # drop redisual frames wrt num_squeeze and set y_lengths.
        y, y_lengths, y_max_length, attn = self.preprocess(y, y_lengths, y_max_length, None)
        # create masks
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, y_max_length), 1).to(x_mask.dtype)
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        # decoder pass
        z, logdet = self.decoder(y, y_mask, g=g, reverse=False)
        # find the alignment path between z and encoder output
        o_scale = torch.exp(-2 * o_log_scale)
        logp1 = torch.sum(-0.5 * math.log(2 * math.pi) - o_log_scale, [1]).unsqueeze(-1)  # [b, t, 1]
        logp2 = torch.matmul(o_scale.transpose(1, 2), -0.5 * (z**2))  # [b, t, d] x [b, d, t'] = [b, t, t']
        logp3 = torch.matmul((o_mean * o_scale).transpose(1, 2), z)  # [b, t, d] x [b, d, t'] = [b, t, t']
        logp4 = torch.sum(-0.5 * (o_mean**2) * o_scale, [1]).unsqueeze(-1)  # [b, t, 1]
        logp = logp1 + logp2 + logp3 + logp4  # [b, t, t']
        attn = maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach()

        y_mean, y_log_scale, o_attn_dur = self.compute_outputs(attn, o_mean, o_log_scale, x_mask)
        attn = attn.squeeze(1).permute(0, 2, 1)

        # get predited aligned distribution
        z = y_mean * y_mask

        # reverse the decoder and predict using the aligned distribution
        y, logdet = self.decoder(z, y_mask, g=g, reverse=True)
        outputs = {
            "model_outputs": z.transpose(1, 2),
            "logdet": logdet,
            "y_mean": y_mean.transpose(1, 2),
            "y_log_scale": y_log_scale.transpose(1, 2),
            "alignments": attn,
            "durations_log": o_dur_log.transpose(1, 2),
            "total_durations_log": o_attn_dur.transpose(1, 2),
        }
        return outputs

    @torch.no_grad()
    def decoder_inference(
        self, y, y_lengths=None, aux_input={"d_vectors": None, "speaker_ids": None}
    ):  # pylint: disable=dangerous-default-value
        """
        Shapes:
            - y: :math:`[B, T, C]`
            - y_lengths: :math:`B`
            - g: :math:`[B, C] or B`
        """
        y = y.transpose(1, 2)
        y_max_length = y.size(2)
        g = self._speaker_embedding(aux_input)
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, y_max_length), 1).to(y.dtype)
        # decoder pass
        z, logdet = self.decoder(y, y_mask, g=g, reverse=False)
        # reverse decoder and predict
        y, logdet = self.decoder(z, y_mask, g=g, reverse=True)
        outputs = {}
        outputs["model_outputs"] = y.transpose(1, 2)
        outputs["logdet"] = logdet
        return outputs

    @torch.no_grad()
    def inference(
        self, x, aux_input={"x_lengths": None, "d_vectors": None, "speaker_ids": None}
    ):  # pylint: disable=dangerous-default-value
        x_lengths = aux_input["x_lengths"]
        g = self._speaker_embedding(aux_input)
        # embedding pass
        o_mean, o_log_scale, o_dur_log, x_mask = self.encoder(x, x_lengths, g=g)
        # compute output durations
        w = (torch.exp(o_dur_log) - 1) * x_mask * self.length_scale
        w_ceil = torch.clamp_min(torch.ceil(w), 1)
        y_lengths = torch.clamp_min(torch.sum(w_ceil, [1, 2]), 1).long()
        y_max_length = None
        # compute masks
        y_mask = torch.unsqueeze(sequence_mask(y_lengths, y_max_length), 1).to(x_mask.dtype)
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        # compute attention mask
        attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1)).unsqueeze(1)
        y_mean, y_log_scale, o_attn_dur = self.compute_outputs(attn, o_mean, o_log_scale, x_mask)

        z = (y_mean + torch.exp(y_log_scale) * torch.randn_like(y_mean) * self.inference_noise_scale) * y_mask
        # decoder pass
        y, logdet = self.decoder(z, y_mask, g=g, reverse=True)
        attn = attn.squeeze(1).permute(0, 2, 1)
        outputs = {
            "model_outputs": y.transpose(1, 2),
            "logdet": logdet,
            "y_mean": y_mean.transpose(1, 2),
            "y_log_scale": y_log_scale.transpose(1, 2),
            "alignments": attn,
            "durations_log": o_dur_log.transpose(1, 2),
            "total_durations_log": o_attn_dur.transpose(1, 2),
        }
        return outputs

    def train_step(self, batch: dict, criterion: nn.Module):
        """A single training step. Forward pass and loss computation. Run data depended initialization for the
        first `config.data_dep_init_steps` steps.

        Args:
            batch (dict): [description]
            criterion (nn.Module): [description]
        """
        text_input = batch["text_input"]
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        mel_lengths = batch["mel_lengths"]
        d_vectors = batch["d_vectors"]
        speaker_ids = batch["speaker_ids"]

        if self.run_data_dep_init and self.training:
            # compute data-dependent initialization of activation norm layers
            self.unlock_act_norm_layers()
            with torch.no_grad():
                _ = self.forward(
                    text_input,
                    text_lengths,
                    mel_input,
                    mel_lengths,
                    aux_input={"d_vectors": d_vectors, "speaker_ids": speaker_ids},
                )
            outputs = None
            loss_dict = None
            self.lock_act_norm_layers()
        else:
            # normal training step
            outputs = self.forward(
                text_input,
                text_lengths,
                mel_input,
                mel_lengths,
                aux_input={"d_vectors": d_vectors, "speaker_ids": speaker_ids},
            )

            with autocast(enabled=False):  # avoid mixed_precision in criterion
                loss_dict = criterion(
                    outputs["z"].float(),
                    outputs["y_mean"].float(),
                    outputs["y_log_scale"].float(),
                    outputs["logdet"].float(),
                    mel_lengths,
                    outputs["durations_log"].float(),
                    outputs["total_durations_log"].float(),
                    text_lengths,
                )
        return outputs, loss_dict

    def _create_logs(self, batch, outputs, ap):
        alignments = outputs["alignments"]
        text_input = batch["text_input"][:1] if batch["text_input"] is not None else None
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        d_vectors = batch["d_vectors"][:1] if batch["d_vectors"] is not None else None
        speaker_ids = batch["speaker_ids"][:1] if batch["speaker_ids"] is not None else None

        # model runs reverse flow to predict spectrograms
        pred_outputs = self.inference(
            text_input,
            aux_input={"x_lengths": text_lengths[:1], "d_vectors": d_vectors, "speaker_ids": speaker_ids},
        )
        model_outputs = pred_outputs["model_outputs"]

        pred_spec = model_outputs[0].data.cpu().numpy()
        gt_spec = mel_input[0].data.cpu().numpy()
        align_img = alignments[0].data.cpu().numpy()

        figures = {
            "prediction": plot_spectrogram(pred_spec, ap, output_fig=False),
            "ground_truth": plot_spectrogram(gt_spec, ap, output_fig=False),
            "alignment": plot_alignment(align_img, output_fig=False),
        }

        # Sample audio
        train_audio = ap.inv_melspectrogram(pred_spec.T)
        return figures, {"audio": train_audio}

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ) -> None:  # pylint: disable=no-self-use
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    @torch.no_grad()
    def eval_step(self, batch: dict, criterion: nn.Module):
        return self.train_step(batch, criterion)

    def eval_log(self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int) -> None:
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    @torch.no_grad()
    def test_run(self, assets: Dict) -> Tuple[Dict, Dict]:
        """Generic test run for `tts` models used by `Trainer`.

        You can override this for a different behaviour.

        Returns:
            Tuple[Dict, Dict]: Test figures and audios to be projected to Tensorboard.
        """
        print(" | > Synthesizing test sentences.")
        test_audios = {}
        test_figures = {}
        test_sentences = self.config.test_sentences
        aux_inputs = self._get_test_aux_input()
        if len(test_sentences) == 0:
            print(" | [!] No test sentences provided.")
        else:
            for idx, sen in enumerate(test_sentences):
                outputs = synthesis(
                    self,
                    sen,
                    self.config,
                    "cuda" in str(next(self.parameters()).device),
                    speaker_id=aux_inputs["speaker_id"],
                    d_vector=aux_inputs["d_vector"],
                    style_wav=aux_inputs["style_wav"],
                    use_griffin_lim=True,
                    do_trim_silence=False,
                )

                test_audios["{}-audio".format(idx)] = outputs["wav"]
                test_figures["{}-prediction".format(idx)] = plot_spectrogram(
                    outputs["outputs"]["model_outputs"], self.ap, output_fig=False
                )
                test_figures["{}-alignment".format(idx)] = plot_alignment(outputs["alignments"], output_fig=False)
        return test_figures, test_audios

    def preprocess(self, y, y_lengths, y_max_length, attn=None):
        if y_max_length is not None:
            y_max_length = (y_max_length // self.num_squeeze) * self.num_squeeze
            y = y[:, :, :y_max_length]
            if attn is not None:
                attn = attn[:, :, :, :y_max_length]
        y_lengths = torch.div(y_lengths, self.num_squeeze, rounding_mode="floor") * self.num_squeeze
        return y, y_lengths, y_max_length, attn

    def store_inverse(self):
        self.decoder.store_inverse()

    def load_checkpoint(
        self, config, checkpoint_path, eval=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"))
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            self.store_inverse()
            assert not self.training

    @staticmethod
    def get_criterion():
        from TTS.tts.layers.losses import GlowTTSLoss  # pylint: disable=import-outside-toplevel

        return GlowTTSLoss()

    def on_train_step_start(self, trainer):
        """Decide on every training step wheter enable/disable data depended initialization."""
        self.run_data_dep_init = trainer.total_steps_done < self.data_dep_init_steps

    @staticmethod
    def init_from_config(config: "GlowTTSConfig", samples: Union[List[List], List[Dict]] = None, verbose=True):
        """Initiate model from config

        Args:
            config (VitsConfig): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
            verbose (bool): If True, print init messages. Defaults to True.
        """
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config, verbose)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config, samples)
        return GlowTTS(new_config, ap, tokenizer, speaker_manager)


================================================
FILE: TTS/tts/models/neuralhmm_tts.py
================================================
import os
from typing import Dict, List, Union

import torch
from coqpit import Coqpit
from torch import nn
from trainer.logging.tensorboard_logger import TensorboardLogger

from TTS.tts.layers.overflow.common_layers import Encoder, OverflowUtils
from TTS.tts.layers.overflow.neural_hmm import NeuralHMM
from TTS.tts.layers.overflow.plotting_utils import (
    get_spec_from_most_probable_state,
    plot_transition_probabilities_to_numpy,
)
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram
from TTS.utils.generic_utils import format_aux_input
from TTS.utils.io import load_fsspec


class NeuralhmmTTS(BaseTTS):
    """Neural HMM TTS model.

    Paper::
        https://arxiv.org/abs/2108.13320

    Paper abstract::
        Neural sequence-to-sequence TTS has achieved significantly better output quality
    than statistical speech synthesis using HMMs.However, neural TTS is generally not probabilistic
    and uses non-monotonic attention. Attention failures increase training time and can make
    synthesis babble incoherently. This paper describes how the old and new paradigms can be
    combined to obtain the advantages of both worlds, by replacing attention in neural TTS with
    an autoregressive left-right no-skip hidden Markov model defined by a neural network.
    Based on this proposal, we modify Tacotron 2 to obtain an HMM-based neural TTS model with
    monotonic alignment, trained to maximise the full sequence likelihood without approximation.
    We also describe how to combine ideas from classical and contemporary TTS for best results.
    The resulting example system is smaller and simpler than Tacotron 2, and learns to speak with
    fewer iterations and less data, whilst achieving comparable naturalness prior to the post-net.
    Our approach also allows easy control over speaking rate. Audio examples and code
    are available at https://shivammehta25.github.io/Neural-HMM/ .

    Note:
        - This is a parameter efficient version of OverFlow (15.3M vs 28.6M). Since it has half the
        number of parameters as OverFlow the synthesis output quality is suboptimal (but comparable to Tacotron2
        without Postnet), but it learns to speak with even lesser amount of data and is still significantly faster
        than other attention-based methods.

        - Neural HMMs uses flat start initialization i.e it computes the means and std and transition probabilities
        of the dataset and uses them to initialize the model. This benefits the model and helps with faster learning
        If you change the dataset or want to regenerate the parameters change the `force_generate_statistics` and
        `mel_statistics_parameter_path` accordingly.

        - To enable multi-GPU training, set the `use_grad_checkpointing=False` in config.
        This will significantly increase the memory usage.  This is because to compute
        the actual data likelihood (not an approximation using MAS/Viterbi) we must use
        all the states at the previous time step during the forward pass to decide the
        probability distribution at the current step i.e the difference between the forward
        algorithm and viterbi approximation.

    Check :class:`TTS.tts.configs.neuralhmm_tts_config.NeuralhmmTTSConfig` for class arguments.
    """

    def __init__(
        self,
        config: "NeuralhmmTTSConfig",
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)

        # pass all config fields to `self`
        # for fewer code change
        self.config = config
        for key in config:
            setattr(self, key, config[key])

        self.encoder = Encoder(config.num_chars, config.state_per_phone, config.encoder_in_out_features)
        self.neural_hmm = NeuralHMM(
            frame_channels=self.out_channels,
            ar_order=self.ar_order,
            deterministic_transition=self.deterministic_transition,
            encoder_dim=self.encoder_in_out_features,
            prenet_type=self.prenet_type,
            prenet_dim=self.prenet_dim,
            prenet_n_layers=self.prenet_n_layers,
            prenet_dropout=self.prenet_dropout,
            prenet_dropout_at_inference=self.prenet_dropout_at_inference,
            memory_rnn_dim=self.memory_rnn_dim,
            outputnet_size=self.outputnet_size,
            flat_start_params=self.flat_start_params,
            std_floor=self.std_floor,
            use_grad_checkpointing=self.use_grad_checkpointing,
        )

        self.register_buffer("mean", torch.tensor(0))
        self.register_buffer("std", torch.tensor(1))

    def update_mean_std(self, statistics_dict: Dict):
        self.mean.data = torch.tensor(statistics_dict["mean"])
        self.std.data = torch.tensor(statistics_dict["std"])

    def preprocess_batch(self, text, text_len, mels, mel_len):
        if self.mean.item() == 0 or self.std.item() == 1:
            statistics_dict = torch.load(self.mel_statistics_parameter_path)
            self.update_mean_std(statistics_dict)

        mels = self.normalize(mels)
        return text, text_len, mels, mel_len

    def normalize(self, x):
        return x.sub(self.mean).div(self.std)

    def inverse_normalize(self, x):
        return x.mul(self.std).add(self.mean)

    def forward(self, text, text_len, mels, mel_len):
        """
        Forward pass for training and computing the log likelihood of a given batch.

        Shapes:
            Shapes:
            text: :math:`[B, T_in]`
            text_len: :math:`[B]`
            mels: :math:`[B, T_out, C]`
            mel_len: :math:`[B]`
        """
        text, text_len, mels, mel_len = self.preprocess_batch(text, text_len, mels, mel_len)
        encoder_outputs, encoder_output_len = self.encoder(text, text_len)

        log_probs, fwd_alignments, transition_vectors, means = self.neural_hmm(
            encoder_outputs, encoder_output_len, mels.transpose(1, 2), mel_len
        )

        outputs = {
            "log_probs": log_probs,
            "alignments": fwd_alignments,
            "transition_vectors": transition_vectors,
            "means": means,
        }

        return outputs

    @staticmethod
    def _training_stats(batch):
        stats = {}
        stats["avg_text_length"] = batch["text_lengths"].float().mean()
        stats["avg_spec_length"] = batch["mel_lengths"].float().mean()
        stats["avg_text_batch_occupancy"] = (batch["text_lengths"].float() / batch["text_lengths"].float().max()).mean()
        stats["avg_spec_batch_occupancy"] = (batch["mel_lengths"].float() / batch["mel_lengths"].float().max()).mean()
        return stats

    def train_step(self, batch: dict, criterion: nn.Module):
        text_input = batch["text_input"]
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        mel_lengths = batch["mel_lengths"]

        outputs = self.forward(
            text=text_input,
            text_len=text_lengths,
            mels=mel_input,
            mel_len=mel_lengths,
        )
        loss_dict = criterion(outputs["log_probs"] / (mel_lengths.sum() + text_lengths.sum()))

        # for printing useful statistics on terminal
        loss_dict.update(self._training_stats(batch))
        return outputs, loss_dict

    def eval_step(self, batch: Dict, criterion: nn.Module):
        return self.train_step(batch, criterion)

    def _format_aux_input(self, aux_input: Dict, default_input_dict):
        """Set missing fields to their default value.

        Args:
            aux_inputs (Dict): Dictionary containing the auxiliary inputs.
        """
        default_input_dict = default_input_dict.copy()
        default_input_dict.update(
            {
                "sampling_temp": self.sampling_temp,
                "max_sampling_time": self.max_sampling_time,
                "duration_threshold": self.duration_threshold,
            }
        )
        if aux_input:
            return format_aux_input(default_input_dict, aux_input)
        return default_input_dict

    @torch.no_grad()
    def inference(
        self,
        text: torch.Tensor,
        aux_input={"x_lengths": None, "sampling_temp": None, "max_sampling_time": None, "duration_threshold": None},
    ):  # pylint: disable=dangerous-default-value
        """Sampling from the model

        Args:
            text (torch.Tensor): :math:`[B, T_in]`
            aux_inputs (_type_, optional): _description_. Defaults to None.

        Returns:
            outputs: Dictionary containing the following
                - mel (torch.Tensor): :math:`[B, T_out, C]`
                - hmm_outputs_len (torch.Tensor): :math:`[B]`
                - state_travelled (List[List[int]]): List of lists containing the state travelled for each sample in the batch.
                - input_parameters (list[torch.FloatTensor]): Input parameters to the neural HMM.
                - output_parameters (list[torch.FloatTensor]): Output parameters to the neural HMM.
        """
        default_input_dict = {
            "x_lengths": torch.sum(text != 0, dim=1),
        }
        aux_input = self._format_aux_input(aux_input, default_input_dict)
        encoder_outputs, encoder_output_len = self.encoder.inference(text, aux_input["x_lengths"])
        outputs = self.neural_hmm.inference(
            encoder_outputs,
            encoder_output_len,
            sampling_temp=aux_input["sampling_temp"],
            max_sampling_time=aux_input["max_sampling_time"],
            duration_threshold=aux_input["duration_threshold"],
        )
        mels, mel_outputs_len = outputs["hmm_outputs"], outputs["hmm_outputs_len"]

        mels = self.inverse_normalize(mels)
        outputs.update({"model_outputs": mels, "model_outputs_len": mel_outputs_len})
        outputs["alignments"] = OverflowUtils.double_pad(outputs["alignments"])
        return outputs

    @staticmethod
    def get_criterion():
        return NLLLoss()

    @staticmethod
    def init_from_config(config: "NeuralhmmTTSConfig", samples: Union[List[List], List[Dict]] = None, verbose=True):
        """Initiate model from config

        Args:
            config (VitsConfig): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
            verbose (bool): If True, print init messages. Defaults to True.
        """
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config, verbose)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config, samples)
        return NeuralhmmTTS(new_config, ap, tokenizer, speaker_manager)

    def load_checkpoint(
        self, config: Coqpit, checkpoint_path: str, eval: bool = False, strict: bool = True, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"))
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training

    def on_init_start(self, trainer):
        """If the current dataset does not have normalisation statistics and initialisation transition_probability it computes them otherwise loads."""
        if not os.path.isfile(trainer.config.mel_statistics_parameter_path) or trainer.config.force_generate_statistics:
            dataloader = trainer.get_train_dataloader(
                training_assets=None, samples=trainer.train_samples, verbose=False
            )
            print(
                f" | > Data parameters not found for: {trainer.config.mel_statistics_parameter_path}. Computing mel normalization parameters..."
            )
            data_mean, data_std, init_transition_prob = OverflowUtils.get_data_parameters_for_flat_start(
                dataloader, trainer.config.out_channels, trainer.config.state_per_phone
            )
            print(
                f" | > Saving data parameters to: {trainer.config.mel_statistics_parameter_path}: value: {data_mean, data_std, init_transition_prob}"
            )
            statistics = {
                "mean": data_mean.item(),
                "std": data_std.item(),
                "init_transition_prob": init_transition_prob.item(),
            }
            torch.save(statistics, trainer.config.mel_statistics_parameter_path)

        else:
            print(
                f" | > Data parameters found for: {trainer.config.mel_statistics_parameter_path}. Loading mel normalization parameters..."
            )
            statistics = torch.load(trainer.config.mel_statistics_parameter_path)
            data_mean, data_std, init_transition_prob = (
                statistics["mean"],
                statistics["std"],
                statistics["init_transition_prob"],
            )
            print(f" | > Data parameters loaded with value: {data_mean, data_std, init_transition_prob}")

        trainer.config.flat_start_params["transition_p"] = (
            init_transition_prob.item() if torch.is_tensor(init_transition_prob) else init_transition_prob
        )
        OverflowUtils.update_flat_start_transition(trainer.model, init_transition_prob)
        trainer.model.update_mean_std(statistics)

    @torch.inference_mode()
    def _create_logs(self, batch, outputs, ap):  # pylint: disable=no-self-use, unused-argument
        alignments, transition_vectors = outputs["alignments"], outputs["transition_vectors"]
        means = torch.stack(outputs["means"], dim=1)

        figures = {
            "alignment": plot_alignment(alignments[0].exp(), title="Forward alignment", fig_size=(20, 20)),
            "log_alignment": plot_alignment(
                alignments[0].exp(), title="Forward log alignment", plot_log=True, fig_size=(20, 20)
            ),
            "transition_vectors": plot_alignment(transition_vectors[0], title="Transition vectors", fig_size=(20, 20)),
            "mel_from_most_probable_state": plot_spectrogram(
                get_spec_from_most_probable_state(alignments[0], means[0]), fig_size=(12, 3)
            ),
            "mel_target": plot_spectrogram(batch["mel_input"][0], fig_size=(12, 3)),
        }

        # sample one item from the batch -1 will give the smalles item
        print(" | > Synthesising audio from the model...")
        inference_output = self.inference(
            batch["text_input"][-1].unsqueeze(0), aux_input={"x_lengths": batch["text_lengths"][-1].unsqueeze(0)}
        )
        figures["synthesised"] = plot_spectrogram(inference_output["model_outputs"][0], fig_size=(12, 3))

        states = [p[1] for p in inference_output["input_parameters"][0]]
        transition_probability_synthesising = [p[2].cpu().numpy() for p in inference_output["output_parameters"][0]]

        for i in range((len(transition_probability_synthesising) // 200) + 1):
            start = i * 200
            end = (i + 1) * 200
            figures[f"synthesised_transition_probabilities/{i}"] = plot_transition_probabilities_to_numpy(
                states[start:end], transition_probability_synthesising[start:end]
            )

        audio = ap.inv_melspectrogram(inference_output["model_outputs"][0].T.cpu().numpy())
        return figures, {"audios": audio}

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ):  # pylint: disable=unused-argument
        """Log training progress."""
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    def eval_log(
        self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int
    ):  # pylint: disable=unused-argument
        """Compute and log evaluation metrics."""
        # Plot model parameters histograms
        if isinstance(logger, TensorboardLogger):
            # I don't know if any other loggers supports this
            for tag, value in self.named_parameters():
                tag = tag.replace(".", "/")
                logger.writer.add_histogram(tag, value.data.cpu().numpy(), steps)

        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    def test_log(
        self, outputs: dict, logger: "Logger", assets: dict, steps: int  # pylint: disable=unused-argument
    ) -> None:
        logger.test_audios(steps, outputs[1], self.ap.sample_rate)
        logger.test_figures(steps, outputs[0])


class NLLLoss(nn.Module):
    """Negative log likelihood loss."""

    def forward(self, log_prob: torch.Tensor) -> dict:  # pylint: disable=no-self-use
        """Compute the loss.

        Args:
            logits (Tensor): [B, T, D]

        Returns:
            Tensor: [1]

        """
        return_dict = {}
        return_dict["loss"] = -log_prob.mean()
        return return_dict


================================================
FILE: TTS/tts/models/overflow.py
================================================
import os
from typing import Dict, List, Union

import torch
from coqpit import Coqpit
from torch import nn
from trainer.logging.tensorboard_logger import TensorboardLogger

from TTS.tts.layers.overflow.common_layers import Encoder, OverflowUtils
from TTS.tts.layers.overflow.decoder import Decoder
from TTS.tts.layers.overflow.neural_hmm import NeuralHMM
from TTS.tts.layers.overflow.plotting_utils import (
    get_spec_from_most_probable_state,
    plot_transition_probabilities_to_numpy,
)
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram
from TTS.utils.generic_utils import format_aux_input
from TTS.utils.io import load_fsspec


class Overflow(BaseTTS):
    """OverFlow TTS model.

    Paper::
        https://arxiv.org/abs/2211.06892

    Paper abstract::
        Neural HMMs are a type of neural transducer recently proposed for
    sequence-to-sequence modelling in text-to-speech. They combine the best features
    of classic statistical speech synthesis and modern neural TTS, requiring less
    data and fewer training updates, and are less prone to gibberish output caused
    by neural attention failures. In this paper, we combine neural HMM TTS with
    normalising flows for describing the highly non-Gaussian distribution of speech
    acoustics. The result is a powerful, fully probabilistic model of durations and
    acoustics that can be trained using exact maximum likelihood. Compared to
    dominant flow-based acoustic models, our approach integrates autoregression for
    improved modelling of long-range dependences such as utterance-level prosody.
    Experiments show that a system based on our proposal gives more accurate
    pronunciations and better subjective speech quality than comparable methods,
    whilst retaining the original advantages of neural HMMs. Audio examples and code
    are available at https://shivammehta25.github.io/OverFlow/.

    Note:
        - Neural HMMs uses flat start initialization i.e it computes the means and std and transition probabilities
        of the dataset and uses them to initialize the model. This benefits the model and helps with faster learning
        If you change the dataset or want to regenerate the parameters change the `force_generate_statistics` and
        `mel_statistics_parameter_path` accordingly.

        - To enable multi-GPU training, set the `use_grad_checkpointing=False` in config.
        This will significantly increase the memory usage.  This is because to compute
        the actual data likelihood (not an approximation using MAS/Viterbi) we must use
        all the states at the previous time step during the forward pass to decide the
        probability distribution at the current step i.e the difference between the forward
        algorithm and viterbi approximation.

    Check :class:`TTS.tts.configs.overflow.OverFlowConfig` for class arguments.
    """

    def __init__(
        self,
        config: "OverFlowConfig",
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)

        # pass all config fields to `self`
        # for fewer code change
        self.config = config
        for key in config:
            setattr(self, key, config[key])

        self.decoder_output_dim = config.out_channels

        self.encoder = Encoder(config.num_chars, config.state_per_phone, config.encoder_in_out_features)
        self.neural_hmm = NeuralHMM(
            frame_channels=self.out_channels,
            ar_order=self.ar_order,
            deterministic_transition=self.deterministic_transition,
            encoder_dim=self.encoder_in_out_features,
            prenet_type=self.prenet_type,
            prenet_dim=self.prenet_dim,
            prenet_n_layers=self.prenet_n_layers,
            prenet_dropout=self.prenet_dropout,
            prenet_dropout_at_inference=self.prenet_dropout_at_inference,
            memory_rnn_dim=self.memory_rnn_dim,
            outputnet_size=self.outputnet_size,
            flat_start_params=self.flat_start_params,
            std_floor=self.std_floor,
            use_grad_checkpointing=self.use_grad_checkpointing,
        )

        self.decoder = Decoder(
            self.out_channels,
            self.hidden_channels_dec,
            self.kernel_size_dec,
            self.dilation_rate,
            self.num_flow_blocks_dec,
            self.num_block_layers,
            dropout_p=self.dropout_p_dec,
            num_splits=self.num_splits,
            num_squeeze=self.num_squeeze,
            sigmoid_scale=self.sigmoid_scale,
            c_in_channels=self.c_in_channels,
        )

        self.register_buffer("mean", torch.tensor(0))
        self.register_buffer("std", torch.tensor(1))

    def update_mean_std(self, statistics_dict: Dict):
        self.mean.data = torch.tensor(statistics_dict["mean"])
        self.std.data = torch.tensor(statistics_dict["std"])

    def preprocess_batch(self, text, text_len, mels, mel_len):
        if self.mean.item() == 0 or self.std.item() == 1:
            statistics_dict = torch.load(self.mel_statistics_parameter_path)
            self.update_mean_std(statistics_dict)

        mels = self.normalize(mels)
        return text, text_len, mels, mel_len

    def normalize(self, x):
        return x.sub(self.mean).div(self.std)

    def inverse_normalize(self, x):
        return x.mul(self.std).add(self.mean)

    def forward(self, text, text_len, mels, mel_len):
        """
        Forward pass for training and computing the log likelihood of a given batch.

        Shapes:
            Shapes:
            text: :math:`[B, T_in]`
            text_len: :math:`[B]`
            mels: :math:`[B, T_out, C]`
            mel_len: :math:`[B]`
        """
        text, text_len, mels, mel_len = self.preprocess_batch(text, text_len, mels, mel_len)
        encoder_outputs, encoder_output_len = self.encoder(text, text_len)
        z, z_lengths, logdet = self.decoder(mels.transpose(1, 2), mel_len)
        log_probs, fwd_alignments, transition_vectors, means = self.neural_hmm(
            encoder_outputs, encoder_output_len, z, z_lengths
        )

        outputs = {
            "log_probs": log_probs + logdet,
            "alignments": fwd_alignments,
            "transition_vectors": transition_vectors,
            "means": means,
        }

        return outputs

    @staticmethod
    def _training_stats(batch):
        stats = {}
        stats["avg_text_length"] = batch["text_lengths"].float().mean()
        stats["avg_spec_length"] = batch["mel_lengths"].float().mean()
        stats["avg_text_batch_occupancy"] = (batch["text_lengths"].float() / batch["text_lengths"].float().max()).mean()
        stats["avg_spec_batch_occupancy"] = (batch["mel_lengths"].float() / batch["mel_lengths"].float().max()).mean()
        return stats

    def train_step(self, batch: dict, criterion: nn.Module):
        text_input = batch["text_input"]
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        mel_lengths = batch["mel_lengths"]

        outputs = self.forward(
            text=text_input,
            text_len=text_lengths,
            mels=mel_input,
            mel_len=mel_lengths,
        )
        loss_dict = criterion(outputs["log_probs"] / (mel_lengths.sum() + text_lengths.sum()))

        # for printing useful statistics on terminal
        loss_dict.update(self._training_stats(batch))
        return outputs, loss_dict

    def eval_step(self, batch: Dict, criterion: nn.Module):
        return self.train_step(batch, criterion)

    def _format_aux_input(self, aux_input: Dict, default_input_dict):
        """Set missing fields to their default value.

        Args:
            aux_inputs (Dict): Dictionary containing the auxiliary inputs.
        """
        default_input_dict = default_input_dict.copy()
        default_input_dict.update(
            {
                "sampling_temp": self.sampling_temp,
                "max_sampling_time": self.max_sampling_time,
                "duration_threshold": self.duration_threshold,
            }
        )
        if aux_input:
            return format_aux_input(default_input_dict, aux_input)
        return default_input_dict

    @torch.no_grad()
    def inference(
        self,
        text: torch.Tensor,
        aux_input={"x_lengths": None, "sampling_temp": None, "max_sampling_time": None, "duration_threshold": None},
    ):  # pylint: disable=dangerous-default-value
        """Sampling from the model

        Args:
            text (torch.Tensor): :math:`[B, T_in]`
            aux_inputs (_type_, optional): _description_. Defaults to None.

        Returns:
            outputs: Dictionary containing the following
                - mel (torch.Tensor): :math:`[B, T_out, C]`
                - hmm_outputs_len (torch.Tensor): :math:`[B]`
                - state_travelled (List[List[int]]): List of lists containing the state travelled for each sample in the batch.
                - input_parameters (list[torch.FloatTensor]): Input parameters to the neural HMM.
                - output_parameters (list[torch.FloatTensor]): Output parameters to the neural HMM.
        """
        default_input_dict = {
            "x_lengths": torch.sum(text != 0, dim=1),
        }
        aux_input = self._format_aux_input(aux_input, default_input_dict)
        encoder_outputs, encoder_output_len = self.encoder.inference(text, aux_input["x_lengths"])
        outputs = self.neural_hmm.inference(
            encoder_outputs,
            encoder_output_len,
            sampling_temp=aux_input["sampling_temp"],
            max_sampling_time=aux_input["max_sampling_time"],
            duration_threshold=aux_input["duration_threshold"],
        )

        mels, mel_outputs_len, _ = self.decoder(
            outputs["hmm_outputs"].transpose(1, 2), outputs["hmm_outputs_len"], reverse=True
        )
        mels = self.inverse_normalize(mels.transpose(1, 2))
        outputs.update({"model_outputs": mels, "model_outputs_len": mel_outputs_len})
        outputs["alignments"] = OverflowUtils.double_pad(outputs["alignments"])
        return outputs

    @staticmethod
    def get_criterion():
        return NLLLoss()

    @staticmethod
    def init_from_config(config: "OverFlowConfig", samples: Union[List[List], List[Dict]] = None, verbose=True):
        """Initiate model from config

        Args:
            config (VitsConfig): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
            verbose (bool): If True, print init messages. Defaults to True.
        """
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config, verbose)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config, samples)
        return Overflow(new_config, ap, tokenizer, speaker_manager)

    def load_checkpoint(
        self, config: Coqpit, checkpoint_path: str, eval: bool = False, strict: bool = True, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"))
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            self.decoder.store_inverse()
            assert not self.training

    def on_init_start(self, trainer):
        """If the current dataset does not have normalisation statistics and initialisation transition_probability it computes them otherwise loads."""
        if not os.path.isfile(trainer.config.mel_statistics_parameter_path) or trainer.config.force_generate_statistics:
            dataloader = trainer.get_train_dataloader(
                training_assets=None, samples=trainer.train_samples, verbose=False
            )
            print(
                f" | > Data parameters not found for: {trainer.config.mel_statistics_parameter_path}. Computing mel normalization parameters..."
            )
            data_mean, data_std, init_transition_prob = OverflowUtils.get_data_parameters_for_flat_start(
                dataloader, trainer.config.out_channels, trainer.config.state_per_phone
            )
            print(
                f" | > Saving data parameters to: {trainer.config.mel_statistics_parameter_path}: value: {data_mean, data_std, init_transition_prob}"
            )
            statistics = {
                "mean": data_mean.item(),
                "std": data_std.item(),
                "init_transition_prob": init_transition_prob.item(),
            }
            torch.save(statistics, trainer.config.mel_statistics_parameter_path)

        else:
            print(
                f" | > Data parameters found for: {trainer.config.mel_statistics_parameter_path}. Loading mel normalization parameters..."
            )
            statistics = torch.load(trainer.config.mel_statistics_parameter_path)
            data_mean, data_std, init_transition_prob = (
                statistics["mean"],
                statistics["std"],
                statistics["init_transition_prob"],
            )
            print(f" | > Data parameters loaded with value: {data_mean, data_std, init_transition_prob}")

        trainer.config.flat_start_params["transition_p"] = (
            init_transition_prob.item() if torch.is_tensor(init_transition_prob) else init_transition_prob
        )
        OverflowUtils.update_flat_start_transition(trainer.model, init_transition_prob)
        trainer.model.update_mean_std(statistics)

    @torch.inference_mode()
    def _create_logs(self, batch, outputs, ap):  # pylint: disable=no-self-use, unused-argument
        alignments, transition_vectors = outputs["alignments"], outputs["transition_vectors"]
        means = torch.stack(outputs["means"], dim=1)

        figures = {
            "alignment": plot_alignment(alignments[0].exp(), title="Forward alignment", fig_size=(20, 20)),
            "log_alignment": plot_alignment(
                alignments[0].exp(), title="Forward log alignment", plot_log=True, fig_size=(20, 20)
            ),
            "transition_vectors": plot_alignment(transition_vectors[0], title="Transition vectors", fig_size=(20, 20)),
            "mel_from_most_probable_state": plot_spectrogram(
                get_spec_from_most_probable_state(alignments[0], means[0], self.decoder), fig_size=(12, 3)
            ),
            "mel_target": plot_spectrogram(batch["mel_input"][0], fig_size=(12, 3)),
        }

        # sample one item from the batch -1 will give the smalles item
        print(" | > Synthesising audio from the model...")
        inference_output = self.inference(
            batch["text_input"][-1].unsqueeze(0), aux_input={"x_lengths": batch["text_lengths"][-1].unsqueeze(0)}
        )
        figures["synthesised"] = plot_spectrogram(inference_output["model_outputs"][0], fig_size=(12, 3))

        states = [p[1] for p in inference_output["input_parameters"][0]]
        transition_probability_synthesising = [p[2].cpu().numpy() for p in inference_output["output_parameters"][0]]

        for i in range((len(transition_probability_synthesising) // 200) + 1):
            start = i * 200
            end = (i + 1) * 200
            figures[f"synthesised_transition_probabilities/{i}"] = plot_transition_probabilities_to_numpy(
                states[start:end], transition_probability_synthesising[start:end]
            )

        audio = ap.inv_melspectrogram(inference_output["model_outputs"][0].T.cpu().numpy())
        return figures, {"audios": audio}

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ):  # pylint: disable=unused-argument
        """Log training progress."""
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    def eval_log(
        self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int
    ):  # pylint: disable=unused-argument
        """Compute and log evaluation metrics."""
        # Plot model parameters histograms
        if isinstance(logger, TensorboardLogger):
            # I don't know if any other loggers supports this
            for tag, value in self.named_parameters():
                tag = tag.replace(".", "/")
                logger.writer.add_histogram(tag, value.data.cpu().numpy(), steps)

        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    def test_log(
        self, outputs: dict, logger: "Logger", assets: dict, steps: int  # pylint: disable=unused-argument
    ) -> None:
        logger.test_audios(steps, outputs[1], self.ap.sample_rate)
        logger.test_figures(steps, outputs[0])


class NLLLoss(nn.Module):
    """Negative log likelihood loss."""

    def forward(self, log_prob: torch.Tensor) -> dict:  # pylint: disable=no-self-use
        """Compute the loss.

        Args:
            logits (Tensor): [B, T, D]

        Returns:
            Tensor: [1]

        """
        return_dict = {}
        return_dict["loss"] = -log_prob.mean()
        return return_dict


================================================
FILE: TTS/tts/models/tacotron.py
================================================
# coding: utf-8

from typing import Dict, List, Tuple, Union

import torch
from torch import nn
from torch.cuda.amp.autocast_mode import autocast
from trainer.trainer_utils import get_optimizer, get_scheduler

from TTS.tts.layers.tacotron.capacitron_layers import CapacitronVAE
from TTS.tts.layers.tacotron.gst_layers import GST
from TTS.tts.layers.tacotron.tacotron import Decoder, Encoder, PostCBHG
from TTS.tts.models.base_tacotron import BaseTacotron
from TTS.tts.utils.measures import alignment_diagonal_score
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram
from TTS.utils.capacitron_optimizer import CapacitronOptimizer


class Tacotron(BaseTacotron):
    """Tacotron as in https://arxiv.org/abs/1703.10135
    It's an autoregressive encoder-attention-decoder-postnet architecture.
    Check `TacotronConfig` for the arguments.

    Args:
        config (TacotronConfig): Configuration for the Tacotron model.
        speaker_manager (SpeakerManager): Speaker manager to handle multi-speaker settings. Only use if the model is
            a multi-speaker model. Defaults to None.
    """

    def __init__(
        self,
        config: "TacotronConfig",
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)

        # pass all config fields to `self`
        # for fewer code change
        for key in config:
            setattr(self, key, config[key])

        # set speaker embedding channel size for determining `in_channels` for the connected layers.
        # `init_multispeaker` needs to be called once more in training to initialize the speaker embedding layer based
        # on the number of speakers infered from the dataset.
        if self.use_speaker_embedding or self.use_d_vector_file:
            self.init_multispeaker(config)
            self.decoder_in_features += self.embedded_speaker_dim  # add speaker embedding dim

        if self.use_gst:
            self.decoder_in_features += self.gst.gst_embedding_dim

        if self.use_capacitron_vae:
            self.decoder_in_features += self.capacitron_vae.capacitron_VAE_embedding_dim

        # embedding layer
        self.embedding = nn.Embedding(self.num_chars, 256, padding_idx=0)
        self.embedding.weight.data.normal_(0, 0.3)

        # base model layers
        self.encoder = Encoder(self.encoder_in_features)
        self.decoder = Decoder(
            self.decoder_in_features,
            self.decoder_output_dim,
            self.r,
            self.memory_size,
            self.attention_type,
            self.windowing,
            self.attention_norm,
            self.prenet_type,
            self.prenet_dropout,
            self.use_forward_attn,
            self.transition_agent,
            self.forward_attn_mask,
            self.location_attn,
            self.attention_heads,
            self.separate_stopnet,
            self.max_decoder_steps,
        )
        self.postnet = PostCBHG(self.decoder_output_dim)
        self.last_linear = nn.Linear(self.postnet.cbhg.gru_features * 2, self.out_channels)

        # setup prenet dropout
        self.decoder.prenet.dropout_at_inference = self.prenet_dropout_at_inference

        # global style token layers
        if self.gst and self.use_gst:
            self.gst_layer = GST(
                num_mel=self.decoder_output_dim,
                num_heads=self.gst.gst_num_heads,
                num_style_tokens=self.gst.gst_num_style_tokens,
                gst_embedding_dim=self.gst.gst_embedding_dim,
            )

        # Capacitron layers
        if self.capacitron_vae and self.use_capacitron_vae:
            self.capacitron_vae_layer = CapacitronVAE(
                num_mel=self.decoder_output_dim,
                encoder_output_dim=self.encoder_in_features,
                capacitron_VAE_embedding_dim=self.capacitron_vae.capacitron_VAE_embedding_dim,
                speaker_embedding_dim=self.embedded_speaker_dim
                if self.use_speaker_embedding and self.capacitron_vae.capacitron_use_speaker_embedding
                else None,
                text_summary_embedding_dim=self.capacitron_vae.capacitron_text_summary_embedding_dim
                if self.capacitron_vae.capacitron_use_text_summary_embeddings
                else None,
            )

        # backward pass decoder
        if self.bidirectional_decoder:
            self._init_backward_decoder()
        # setup DDC
        if self.double_decoder_consistency:
            self.coarse_decoder = Decoder(
                self.decoder_in_features,
                self.decoder_output_dim,
                self.ddc_r,
                self.memory_size,
                self.attention_type,
                self.windowing,
                self.attention_norm,
                self.prenet_type,
                self.prenet_dropout,
                self.use_forward_attn,
                self.transition_agent,
                self.forward_attn_mask,
                self.location_attn,
                self.attention_heads,
                self.separate_stopnet,
                self.max_decoder_steps,
            )

    def forward(  # pylint: disable=dangerous-default-value
        self, text, text_lengths, mel_specs=None, mel_lengths=None, aux_input={"speaker_ids": None, "d_vectors": None}
    ):
        """
        Shapes:
            text: [B, T_in]
            text_lengths: [B]
            mel_specs: [B, T_out, C]
            mel_lengths: [B]
            aux_input: 'speaker_ids': [B, 1] and  'd_vectors':[B, C]
        """
        aux_input = self._format_aux_input(aux_input)
        outputs = {"alignments_backward": None, "decoder_outputs_backward": None}
        inputs = self.embedding(text)
        input_mask, output_mask = self.compute_masks(text_lengths, mel_lengths)
        # B x T_in x encoder_in_features
        encoder_outputs = self.encoder(inputs)
        # sequence masking
        encoder_outputs = encoder_outputs * input_mask.unsqueeze(2).expand_as(encoder_outputs)
        # global style token
        if self.gst and self.use_gst:
            # B x gst_dim
            encoder_outputs = self.compute_gst(encoder_outputs, mel_specs)
        # speaker embedding
        if self.use_speaker_embedding or self.use_d_vector_file:
            if not self.use_d_vector_file:
                # B x 1 x speaker_embed_dim
                embedded_speakers = self.speaker_embedding(aux_input["speaker_ids"])[:, None]
            else:
                # B x 1 x speaker_embed_dim
                embedded_speakers = torch.unsqueeze(aux_input["d_vectors"], 1)
            encoder_outputs = self._concat_speaker_embedding(encoder_outputs, embedded_speakers)
        # Capacitron
        if self.capacitron_vae and self.use_capacitron_vae:
            # B x capacitron_VAE_embedding_dim
            encoder_outputs, *capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
                encoder_outputs,
                reference_mel_info=[mel_specs, mel_lengths],
                text_info=[inputs, text_lengths]
                if self.capacitron_vae.capacitron_use_text_summary_embeddings
                else None,
                speaker_embedding=embedded_speakers if self.capacitron_vae.capacitron_use_speaker_embedding else None,
            )
        else:
            capacitron_vae_outputs = None
        # decoder_outputs: B x decoder_in_features x T_out
        # alignments: B x T_in x encoder_in_features
        # stop_tokens: B x T_in
        decoder_outputs, alignments, stop_tokens = self.decoder(encoder_outputs, mel_specs, input_mask)
        # sequence masking
        if output_mask is not None:
            decoder_outputs = decoder_outputs * output_mask.unsqueeze(1).expand_as(decoder_outputs)
        # B x T_out x decoder_in_features
        postnet_outputs = self.postnet(decoder_outputs)
        # sequence masking
        if output_mask is not None:
            postnet_outputs = postnet_outputs * output_mask.unsqueeze(2).expand_as(postnet_outputs)
        # B x T_out x posnet_dim
        postnet_outputs = self.last_linear(postnet_outputs)
        # B x T_out x decoder_in_features
        decoder_outputs = decoder_outputs.transpose(1, 2).contiguous()
        if self.bidirectional_decoder:
            decoder_outputs_backward, alignments_backward = self._backward_pass(mel_specs, encoder_outputs, input_mask)
            outputs["alignments_backward"] = alignments_backward
            outputs["decoder_outputs_backward"] = decoder_outputs_backward
        if self.double_decoder_consistency:
            decoder_outputs_backward, alignments_backward = self._coarse_decoder_pass(
                mel_specs, encoder_outputs, alignments, input_mask
            )
            outputs["alignments_backward"] = alignments_backward
            outputs["decoder_outputs_backward"] = decoder_outputs_backward
        outputs.update(
            {
                "model_outputs": postnet_outputs,
                "decoder_outputs": decoder_outputs,
                "alignments": alignments,
                "stop_tokens": stop_tokens,
                "capacitron_vae_outputs": capacitron_vae_outputs,
            }
        )
        return outputs

    @torch.no_grad()
    def inference(self, text_input, aux_input=None):
        aux_input = self._format_aux_input(aux_input)
        inputs = self.embedding(text_input)
        encoder_outputs = self.encoder(inputs)
        if self.gst and self.use_gst:
            # B x gst_dim
            encoder_outputs = self.compute_gst(encoder_outputs, aux_input["style_mel"], aux_input["d_vectors"])
        if self.capacitron_vae and self.use_capacitron_vae:
            if aux_input["style_text"] is not None:
                style_text_embedding = self.embedding(aux_input["style_text"])
                style_text_length = torch.tensor([style_text_embedding.size(1)], dtype=torch.int64).to(
                    encoder_outputs.device
                )  # pylint: disable=not-callable
            reference_mel_length = (
                torch.tensor([aux_input["style_mel"].size(1)], dtype=torch.int64).to(encoder_outputs.device)
                if aux_input["style_mel"] is not None
                else None
            )  # pylint: disable=not-callable
            # B x capacitron_VAE_embedding_dim
            encoder_outputs, *_ = self.compute_capacitron_VAE_embedding(
                encoder_outputs,
                reference_mel_info=[aux_input["style_mel"], reference_mel_length]
                if aux_input["style_mel"] is not None
                else None,
                text_info=[style_text_embedding, style_text_length] if aux_input["style_text"] is not None else None,
                speaker_embedding=aux_input["d_vectors"]
                if self.capacitron_vae.capacitron_use_speaker_embedding
                else None,
            )
        if self.num_speakers > 1:
            if not self.use_d_vector_file:
                # B x 1 x speaker_embed_dim
                embedded_speakers = self.speaker_embedding(aux_input["speaker_ids"])
                # reshape embedded_speakers
                if embedded_speakers.ndim == 1:
                    embedded_speakers = embedded_speakers[None, None, :]
                elif embedded_speakers.ndim == 2:
                    embedded_speakers = embedded_speakers[None, :]
            else:
                # B x 1 x speaker_embed_dim
                embedded_speakers = torch.unsqueeze(aux_input["d_vectors"], 1)
            encoder_outputs = self._concat_speaker_embedding(encoder_outputs, embedded_speakers)
        decoder_outputs, alignments, stop_tokens = self.decoder.inference(encoder_outputs)
        postnet_outputs = self.postnet(decoder_outputs)
        postnet_outputs = self.last_linear(postnet_outputs)
        decoder_outputs = decoder_outputs.transpose(1, 2)
        outputs = {
            "model_outputs": postnet_outputs,
            "decoder_outputs": decoder_outputs,
            "alignments": alignments,
            "stop_tokens": stop_tokens,
        }
        return outputs

    def before_backward_pass(self, loss_dict, optimizer) -> None:
        # Extracting custom training specific operations for capacitron
        # from the trainer
        if self.use_capacitron_vae:
            loss_dict["capacitron_vae_beta_loss"].backward()
            optimizer.first_step()

    def train_step(self, batch: Dict, criterion: torch.nn.Module) -> Tuple[Dict, Dict]:
        """Perform a single training step by fetching the right set of samples from the batch.

        Args:
            batch ([Dict]): A dictionary of input tensors.
            criterion ([torch.nn.Module]): Callable criterion to compute model loss.
        """
        text_input = batch["text_input"]
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        mel_lengths = batch["mel_lengths"]
        linear_input = batch["linear_input"]
        stop_targets = batch["stop_targets"]
        stop_target_lengths = batch["stop_target_lengths"]
        speaker_ids = batch["speaker_ids"]
        d_vectors = batch["d_vectors"]

        aux_input = {"speaker_ids": speaker_ids, "d_vectors": d_vectors}
        outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)

        # set the [alignment] lengths wrt reduction factor for guided attention
        if mel_lengths.max() % self.decoder.r != 0:
            alignment_lengths = (
                mel_lengths + (self.decoder.r - (mel_lengths.max() % self.decoder.r))
            ) // self.decoder.r
        else:
            alignment_lengths = mel_lengths // self.decoder.r

        # compute loss
        with autocast(enabled=False):  # use float32 for the criterion
            loss_dict = criterion(
                outputs["model_outputs"].float(),
                outputs["decoder_outputs"].float(),
                mel_input.float(),
                linear_input.float(),
                outputs["stop_tokens"].float(),
                stop_targets.float(),
                stop_target_lengths,
                outputs["capacitron_vae_outputs"] if self.capacitron_vae else None,
                mel_lengths,
                None if outputs["decoder_outputs_backward"] is None else outputs["decoder_outputs_backward"].float(),
                outputs["alignments"].float(),
                alignment_lengths,
                None if outputs["alignments_backward"] is None else outputs["alignments_backward"].float(),
                text_lengths,
            )

        # compute alignment error (the lower the better )
        align_error = 1 - alignment_diagonal_score(outputs["alignments"])
        loss_dict["align_error"] = align_error
        return outputs, loss_dict

    def get_optimizer(self) -> List:
        if self.use_capacitron_vae:
            return CapacitronOptimizer(self.config, self.named_parameters())
        return get_optimizer(self.config.optimizer, self.config.optimizer_params, self.config.lr, self)

    def get_scheduler(self, optimizer: object):
        opt = optimizer.primary_optimizer if self.use_capacitron_vae else optimizer
        return get_scheduler(self.config.lr_scheduler, self.config.lr_scheduler_params, opt)

    def before_gradient_clipping(self):
        if self.use_capacitron_vae:
            # Capacitron model specific gradient clipping
            model_params_to_clip = []
            for name, param in self.named_parameters():
                if param.requires_grad:
                    if name != "capacitron_vae_layer.beta":
                        model_params_to_clip.append(param)
            torch.nn.utils.clip_grad_norm_(model_params_to_clip, self.capacitron_vae.capacitron_grad_clip)

    def _create_logs(self, batch, outputs, ap):
        postnet_outputs = outputs["model_outputs"]
        decoder_outputs = outputs["decoder_outputs"]
        alignments = outputs["alignments"]
        alignments_backward = outputs["alignments_backward"]
        mel_input = batch["mel_input"]
        linear_input = batch["linear_input"]

        pred_linear_spec = postnet_outputs[0].data.cpu().numpy()
        pred_mel_spec = decoder_outputs[0].data.cpu().numpy()
        gt_linear_spec = linear_input[0].data.cpu().numpy()
        gt_mel_spec = mel_input[0].data.cpu().numpy()
        align_img = alignments[0].data.cpu().numpy()

        figures = {
            "pred_linear_spec": plot_spectrogram(pred_linear_spec, ap, output_fig=False),
            "real_linear_spec": plot_spectrogram(gt_linear_spec, ap, output_fig=False),
            "pred_mel_spec": plot_spectrogram(pred_mel_spec, ap, output_fig=False),
            "real_mel_spec": plot_spectrogram(gt_mel_spec, ap, output_fig=False),
            "alignment": plot_alignment(align_img, output_fig=False),
        }

        if self.bidirectional_decoder or self.double_decoder_consistency:
            figures["alignment_backward"] = plot_alignment(alignments_backward[0].data.cpu().numpy(), output_fig=False)

        # Sample audio
        audio = ap.inv_spectrogram(pred_linear_spec.T)
        return figures, {"audio": audio}

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ) -> None:  # pylint: disable=no-self-use
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    def eval_step(self, batch: dict, criterion: nn.Module):
        return self.train_step(batch, criterion)

    def eval_log(self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int) -> None:
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    @staticmethod
    def init_from_config(config: "TacotronConfig", samples: Union[List[List], List[Dict]] = None):
        """Initiate model from config

        Args:
            config (TacotronConfig): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
        """
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config, samples)
        return Tacotron(new_config, ap, tokenizer, speaker_manager)


================================================
FILE: TTS/tts/models/tacotron2.py
================================================
# coding: utf-8

from typing import Dict, List, Union

import torch
from torch import nn
from torch.cuda.amp.autocast_mode import autocast
from trainer.trainer_utils import get_optimizer, get_scheduler

from TTS.tts.layers.tacotron.capacitron_layers import CapacitronVAE
from TTS.tts.layers.tacotron.gst_layers import GST
from TTS.tts.layers.tacotron.tacotron2 import Decoder, Encoder, Postnet
from TTS.tts.models.base_tacotron import BaseTacotron
from TTS.tts.utils.measures import alignment_diagonal_score
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram
from TTS.utils.capacitron_optimizer import CapacitronOptimizer


class Tacotron2(BaseTacotron):
    """Tacotron2 model implementation inherited from :class:`TTS.tts.models.base_tacotron.BaseTacotron`.

    Paper::
        https://arxiv.org/abs/1712.05884

    Paper abstract::
        This paper describes Tacotron 2, a neural network architecture for speech synthesis directly from text.
        The system is composed of a recurrent sequence-to-sequence feature prediction network that maps character
        embeddings to mel-scale spectrograms, followed by a modified WaveNet model acting as a vocoder to synthesize
        timedomain waveforms from those spectrograms. Our model achieves a mean opinion score (MOS) of 4.53 comparable
        to a MOS of 4.58 for professionally recorded speech. To validate our design choices, we present ablation
        studies of key components of our system and evaluate the impact of using mel spectrograms as the input to
        WaveNet instead of linguistic, duration, and F0 features. We further demonstrate that using a compact acoustic
        intermediate representation enables significant simplification of the WaveNet architecture.

    Check :class:`TTS.tts.configs.tacotron2_config.Tacotron2Config` for model arguments.

    Args:
        config (TacotronConfig):
            Configuration for the Tacotron2 model.
        speaker_manager (SpeakerManager):
            Speaker manager for multi-speaker training. Uuse only for multi-speaker training. Defaults to None.
    """

    def __init__(
        self,
        config: "Tacotron2Config",
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager)

        self.decoder_output_dim = config.out_channels

        # pass all config fields to `self`
        # for fewer code change
        for key in config:
            setattr(self, key, config[key])

        # init multi-speaker layers
        if self.use_speaker_embedding or self.use_d_vector_file:
            self.init_multispeaker(config)
            self.decoder_in_features += self.embedded_speaker_dim  # add speaker embedding dim

        if self.use_gst:
            self.decoder_in_features += self.gst.gst_embedding_dim

        if self.use_capacitron_vae:
            self.decoder_in_features += self.capacitron_vae.capacitron_VAE_embedding_dim

        # embedding layer
        self.embedding = nn.Embedding(self.num_chars, 512, padding_idx=0)

        # base model layers
        self.encoder = Encoder(self.encoder_in_features)

        self.decoder = Decoder(
            self.decoder_in_features,
            self.decoder_output_dim,
            self.r,
            self.attention_type,
            self.attention_win,
            self.attention_norm,
            self.prenet_type,
            self.prenet_dropout,
            self.use_forward_attn,
            self.transition_agent,
            self.forward_attn_mask,
            self.location_attn,
            self.attention_heads,
            self.separate_stopnet,
            self.max_decoder_steps,
        )
        self.postnet = Postnet(self.out_channels)

        # setup prenet dropout
        self.decoder.prenet.dropout_at_inference = self.prenet_dropout_at_inference

        # global style token layers
        if self.gst and self.use_gst:
            self.gst_layer = GST(
                num_mel=self.decoder_output_dim,
                num_heads=self.gst.gst_num_heads,
                num_style_tokens=self.gst.gst_num_style_tokens,
                gst_embedding_dim=self.gst.gst_embedding_dim,
            )

        # Capacitron VAE Layers
        if self.capacitron_vae and self.use_capacitron_vae:
            self.capacitron_vae_layer = CapacitronVAE(
                num_mel=self.decoder_output_dim,
                encoder_output_dim=self.encoder_in_features,
                capacitron_VAE_embedding_dim=self.capacitron_vae.capacitron_VAE_embedding_dim,
                speaker_embedding_dim=self.embedded_speaker_dim
                if self.capacitron_vae.capacitron_use_speaker_embedding
                else None,
                text_summary_embedding_dim=self.capacitron_vae.capacitron_text_summary_embedding_dim
                if self.capacitron_vae.capacitron_use_text_summary_embeddings
                else None,
            )

        # backward pass decoder
        if self.bidirectional_decoder:
            self._init_backward_decoder()
        # setup DDC
        if self.double_decoder_consistency:
            self.coarse_decoder = Decoder(
                self.decoder_in_features,
                self.decoder_output_dim,
                self.ddc_r,
                self.attention_type,
                self.attention_win,
                self.attention_norm,
                self.prenet_type,
                self.prenet_dropout,
                self.use_forward_attn,
                self.transition_agent,
                self.forward_attn_mask,
                self.location_attn,
                self.attention_heads,
                self.separate_stopnet,
                self.max_decoder_steps,
            )

    @staticmethod
    def shape_outputs(mel_outputs, mel_outputs_postnet, alignments):
        """Final reshape of the model output tensors."""
        mel_outputs = mel_outputs.transpose(1, 2)
        mel_outputs_postnet = mel_outputs_postnet.transpose(1, 2)
        return mel_outputs, mel_outputs_postnet, alignments

    def forward(  # pylint: disable=dangerous-default-value
        self, text, text_lengths, mel_specs=None, mel_lengths=None, aux_input={"speaker_ids": None, "d_vectors": None}
    ):
        """Forward pass for training with Teacher Forcing.

        Shapes:
            text: :math:`[B, T_in]`
            text_lengths: :math:`[B]`
            mel_specs: :math:`[B, T_out, C]`
            mel_lengths: :math:`[B]`
            aux_input: 'speaker_ids': :math:`[B, 1]` and  'd_vectors': :math:`[B, C]`
        """
        aux_input = self._format_aux_input(aux_input)
        outputs = {"alignments_backward": None, "decoder_outputs_backward": None}
        # compute mask for padding
        # B x T_in_max (boolean)
        input_mask, output_mask = self.compute_masks(text_lengths, mel_lengths)
        # B x D_embed x T_in_max
        embedded_inputs = self.embedding(text).transpose(1, 2)
        # B x T_in_max x D_en
        encoder_outputs = self.encoder(embedded_inputs, text_lengths)
        if self.gst and self.use_gst:
            # B x gst_dim
            encoder_outputs = self.compute_gst(encoder_outputs, mel_specs)

        if self.use_speaker_embedding or self.use_d_vector_file:
            if not self.use_d_vector_file:
                # B x 1 x speaker_embed_dim
                embedded_speakers = self.speaker_embedding(aux_input["speaker_ids"])[:, None]
            else:
                # B x 1 x speaker_embed_dim
                embedded_speakers = torch.unsqueeze(aux_input["d_vectors"], 1)
            encoder_outputs = self._concat_speaker_embedding(encoder_outputs, embedded_speakers)

        # capacitron
        if self.capacitron_vae and self.use_capacitron_vae:
            # B x capacitron_VAE_embedding_dim
            encoder_outputs, *capacitron_vae_outputs = self.compute_capacitron_VAE_embedding(
                encoder_outputs,
                reference_mel_info=[mel_specs, mel_lengths],
                text_info=[embedded_inputs.transpose(1, 2), text_lengths]
                if self.capacitron_vae.capacitron_use_text_summary_embeddings
                else None,
                speaker_embedding=embedded_speakers if self.capacitron_vae.capacitron_use_speaker_embedding else None,
            )
        else:
            capacitron_vae_outputs = None

        encoder_outputs = encoder_outputs * input_mask.unsqueeze(2).expand_as(encoder_outputs)

        # B x mel_dim x T_out -- B x T_out//r x T_in -- B x T_out//r
        decoder_outputs, alignments, stop_tokens = self.decoder(encoder_outputs, mel_specs, input_mask)
        # sequence masking
        if mel_lengths is not None:
            decoder_outputs = decoder_outputs * output_mask.unsqueeze(1).expand_as(decoder_outputs)
        # B x mel_dim x T_out
        postnet_outputs = self.postnet(decoder_outputs)
        postnet_outputs = decoder_outputs + postnet_outputs
        # sequence masking
        if output_mask is not None:
            postnet_outputs = postnet_outputs * output_mask.unsqueeze(1).expand_as(postnet_outputs)
        # B x T_out x mel_dim -- B x T_out x mel_dim -- B x T_out//r x T_in
        decoder_outputs, postnet_outputs, alignments = self.shape_outputs(decoder_outputs, postnet_outputs, alignments)
        if self.bidirectional_decoder:
            decoder_outputs_backward, alignments_backward = self._backward_pass(mel_specs, encoder_outputs, input_mask)
            outputs["alignments_backward"] = alignments_backward
            outputs["decoder_outputs_backward"] = decoder_outputs_backward
        if self.double_decoder_consistency:
            decoder_outputs_backward, alignments_backward = self._coarse_decoder_pass(
                mel_specs, encoder_outputs, alignments, input_mask
            )
            outputs["alignments_backward"] = alignments_backward
            outputs["decoder_outputs_backward"] = decoder_outputs_backward
        outputs.update(
            {
                "model_outputs": postnet_outputs,
                "decoder_outputs": decoder_outputs,
                "alignments": alignments,
                "stop_tokens": stop_tokens,
                "capacitron_vae_outputs": capacitron_vae_outputs,
            }
        )
        return outputs

    @torch.no_grad()
    def inference(self, text, aux_input=None):
        """Forward pass for inference with no Teacher-Forcing.

        Shapes:
           text: :math:`[B, T_in]`
           text_lengths: :math:`[B]`
        """
        aux_input = self._format_aux_input(aux_input)
        embedded_inputs = self.embedding(text).transpose(1, 2)
        encoder_outputs = self.encoder.inference(embedded_inputs)

        if self.gst and self.use_gst:
            # B x gst_dim
            encoder_outputs = self.compute_gst(encoder_outputs, aux_input["style_mel"], aux_input["d_vectors"])

        if self.capacitron_vae and self.use_capacitron_vae:
            if aux_input["style_text"] is not None:
                style_text_embedding = self.embedding(aux_input["style_text"])
                style_text_length = torch.tensor([style_text_embedding.size(1)], dtype=torch.int64).to(
                    encoder_outputs.device
                )  # pylint: disable=not-callable
            reference_mel_length = (
                torch.tensor([aux_input["style_mel"].size(1)], dtype=torch.int64).to(encoder_outputs.device)
                if aux_input["style_mel"] is not None
                else None
            )  # pylint: disable=not-callable
            # B x capacitron_VAE_embedding_dim
            encoder_outputs, *_ = self.compute_capacitron_VAE_embedding(
                encoder_outputs,
                reference_mel_info=[aux_input["style_mel"], reference_mel_length]
                if aux_input["style_mel"] is not None
                else None,
                text_info=[style_text_embedding, style_text_length] if aux_input["style_text"] is not None else None,
                speaker_embedding=aux_input["d_vectors"]
                if self.capacitron_vae.capacitron_use_speaker_embedding
                else None,
            )

        if self.num_speakers > 1:
            if not self.use_d_vector_file:
                embedded_speakers = self.speaker_embedding(aux_input["speaker_ids"])[None]
                # reshape embedded_speakers
                if embedded_speakers.ndim == 1:
                    embedded_speakers = embedded_speakers[None, None, :]
                elif embedded_speakers.ndim == 2:
                    embedded_speakers = embedded_speakers[None, :]
            else:
                embedded_speakers = aux_input["d_vectors"]

            encoder_outputs = self._concat_speaker_embedding(encoder_outputs, embedded_speakers)

        decoder_outputs, alignments, stop_tokens = self.decoder.inference(encoder_outputs)
        postnet_outputs = self.postnet(decoder_outputs)
        postnet_outputs = decoder_outputs + postnet_outputs
        decoder_outputs, postnet_outputs, alignments = self.shape_outputs(decoder_outputs, postnet_outputs, alignments)
        outputs = {
            "model_outputs": postnet_outputs,
            "decoder_outputs": decoder_outputs,
            "alignments": alignments,
            "stop_tokens": stop_tokens,
        }
        return outputs

    def before_backward_pass(self, loss_dict, optimizer) -> None:
        # Extracting custom training specific operations for capacitron
        # from the trainer
        if self.use_capacitron_vae:
            loss_dict["capacitron_vae_beta_loss"].backward()
            optimizer.first_step()

    def train_step(self, batch: Dict, criterion: torch.nn.Module):
        """A single training step. Forward pass and loss computation.

        Args:
            batch ([Dict]): A dictionary of input tensors.
            criterion ([type]): Callable criterion to compute model loss.
        """
        text_input = batch["text_input"]
        text_lengths = batch["text_lengths"]
        mel_input = batch["mel_input"]
        mel_lengths = batch["mel_lengths"]
        stop_targets = batch["stop_targets"]
        stop_target_lengths = batch["stop_target_lengths"]
        speaker_ids = batch["speaker_ids"]
        d_vectors = batch["d_vectors"]

        aux_input = {"speaker_ids": speaker_ids, "d_vectors": d_vectors}
        outputs = self.forward(text_input, text_lengths, mel_input, mel_lengths, aux_input)

        # set the [alignment] lengths wrt reduction factor for guided attention
        if mel_lengths.max() % self.decoder.r != 0:
            alignment_lengths = (
                mel_lengths + (self.decoder.r - (mel_lengths.max() % self.decoder.r))
            ) // self.decoder.r
        else:
            alignment_lengths = mel_lengths // self.decoder.r

        # compute loss
        with autocast(enabled=False):  # use float32 for the criterion
            loss_dict = criterion(
                outputs["model_outputs"].float(),
                outputs["decoder_outputs"].float(),
                mel_input.float(),
                None,
                outputs["stop_tokens"].float(),
                stop_targets.float(),
                stop_target_lengths,
                outputs["capacitron_vae_outputs"] if self.capacitron_vae else None,
                mel_lengths,
                None if outputs["decoder_outputs_backward"] is None else outputs["decoder_outputs_backward"].float(),
                outputs["alignments"].float(),
                alignment_lengths,
                None if outputs["alignments_backward"] is None else outputs["alignments_backward"].float(),
                text_lengths,
            )

        # compute alignment error (the lower the better )
        align_error = 1 - alignment_diagonal_score(outputs["alignments"])
        loss_dict["align_error"] = align_error
        return outputs, loss_dict

    def get_optimizer(self) -> List:
        if self.use_capacitron_vae:
            return CapacitronOptimizer(self.config, self.named_parameters())
        return get_optimizer(self.config.optimizer, self.config.optimizer_params, self.config.lr, self)

    def get_scheduler(self, optimizer: object):
        opt = optimizer.primary_optimizer if self.use_capacitron_vae else optimizer
        return get_scheduler(self.config.lr_scheduler, self.config.lr_scheduler_params, opt)

    def before_gradient_clipping(self):
        if self.use_capacitron_vae:
            # Capacitron model specific gradient clipping
            model_params_to_clip = []
            for name, param in self.named_parameters():
                if param.requires_grad:
                    if name != "capacitron_vae_layer.beta":
                        model_params_to_clip.append(param)
            torch.nn.utils.clip_grad_norm_(model_params_to_clip, self.capacitron_vae.capacitron_grad_clip)

    def _create_logs(self, batch, outputs, ap):
        """Create dashboard log information."""
        postnet_outputs = outputs["model_outputs"]
        alignments = outputs["alignments"]
        alignments_backward = outputs["alignments_backward"]
        mel_input = batch["mel_input"]

        pred_spec = postnet_outputs[0].data.cpu().numpy()
        gt_spec = mel_input[0].data.cpu().numpy()
        align_img = alignments[0].data.cpu().numpy()

        figures = {
            "prediction": plot_spectrogram(pred_spec, ap, output_fig=False),
            "ground_truth": plot_spectrogram(gt_spec, ap, output_fig=False),
            "alignment": plot_alignment(align_img, output_fig=False),
        }

        if self.bidirectional_decoder or self.double_decoder_consistency:
            figures["alignment_backward"] = plot_alignment(alignments_backward[0].data.cpu().numpy(), output_fig=False)

        # Sample audio
        audio = ap.inv_melspectrogram(pred_spec.T)
        return figures, {"audio": audio}

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ) -> None:  # pylint: disable=no-self-use
        """Log training progress."""
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    def eval_step(self, batch: dict, criterion: nn.Module):
        return self.train_step(batch, criterion)

    def eval_log(self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int) -> None:
        figures, audios = self._create_logs(batch, outputs, self.ap)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    @staticmethod
    def init_from_config(config: "Tacotron2Config", samples: Union[List[List], List[Dict]] = None):
        """Initiate model from config

        Args:
            config (Tacotron2Config): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
        """
        from TTS.utils.audio import AudioProcessor

        ap = AudioProcessor.init_from_config(config)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(new_config, samples)
        return Tacotron2(new_config, ap, tokenizer, speaker_manager)


================================================
FILE: TTS/tts/models/vits.py
================================================
import math
import os
from dataclasses import dataclass, field, replace
from itertools import chain
from typing import Dict, List, Tuple, Union

import numpy as np
import torch
import torch.distributed as dist
import torchaudio
from coqpit import Coqpit
from librosa.filters import mel as librosa_mel_fn
from torch import nn
from torch.cuda.amp.autocast_mode import autocast
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torch.utils.data.sampler import WeightedRandomSampler
from trainer.torch import DistributedSampler, DistributedSamplerWrapper
from trainer.trainer_utils import get_optimizer, get_scheduler

from TTS.tts.configs.shared_configs import CharactersConfig
from TTS.tts.datasets.dataset import TTSDataset, _parse_sample
from TTS.tts.layers.glow_tts.duration_predictor import DurationPredictor
from TTS.tts.layers.vits.discriminator import VitsDiscriminator
from TTS.tts.layers.vits.networks import PosteriorEncoder, ResidualCouplingBlocks, TextEncoder
from TTS.tts.layers.vits.stochastic_duration_predictor import StochasticDurationPredictor
from TTS.tts.models.base_tts import BaseTTS
from TTS.tts.utils.helpers import generate_path, maximum_path, rand_segments, segment, sequence_mask
from TTS.tts.utils.languages import LanguageManager
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.synthesis import synthesis
from TTS.tts.utils.text.characters import BaseCharacters, _characters, _pad, _phonemes, _punctuations
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.tts.utils.visual import plot_alignment
from TTS.utils.io import load_fsspec
from TTS.utils.samplers import BucketBatchSampler
from TTS.vocoder.models.hifigan_generator import HifiganGenerator
from TTS.vocoder.utils.generic_utils import plot_results

##############################
# IO / Feature extraction
##############################

# pylint: disable=global-statement
hann_window = {}
mel_basis = {}


@torch.no_grad()
def weights_reset(m: nn.Module):
    # check if the current module has reset_parameters and if it is reset the weight
    reset_parameters = getattr(m, "reset_parameters", None)
    if callable(reset_parameters):
        m.reset_parameters()


def get_module_weights_sum(mdl: nn.Module):
    dict_sums = {}
    for name, w in mdl.named_parameters():
        if "weight" in name:
            value = w.data.sum().item()
            dict_sums[name] = value
    return dict_sums


def load_audio(file_path):
    """Load the audio file normalized in [-1, 1]

    Return Shapes:
        - x: :math:`[1, T]`
    """
    x, sr = torchaudio.load(file_path)
    assert (x > 1).sum() + (x < -1).sum() == 0
    return x, sr


def _amp_to_db(x, C=1, clip_val=1e-5):
    return torch.log(torch.clamp(x, min=clip_val) * C)


def _db_to_amp(x, C=1):
    return torch.exp(x) / C


def amp_to_db(magnitudes):
    output = _amp_to_db(magnitudes)
    return output


def db_to_amp(magnitudes):
    output = _db_to_amp(magnitudes)
    return output


def wav_to_spec(y, n_fft, hop_length, win_length, center=False):
    """
    Args Shapes:
        - y : :math:`[B, 1, T]`

    Return Shapes:
        - spec : :math:`[B,C,T]`
    """
    y = y.squeeze(1)

    if torch.min(y) < -1.0:
        print("min value is ", torch.min(y))
    if torch.max(y) > 1.0:
        print("max value is ", torch.max(y))

    global hann_window
    dtype_device = str(y.dtype) + "_" + str(y.device)
    wnsize_dtype_device = str(win_length) + "_" + dtype_device
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_length).to(dtype=y.dtype, device=y.device)

    y = torch.nn.functional.pad(
        y.unsqueeze(1),
        (int((n_fft - hop_length) / 2), int((n_fft - hop_length) / 2)),
        mode="reflect",
    )
    y = y.squeeze(1)

    spec = torch.stft(
        y,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=hann_window[wnsize_dtype_device],
        center=center,
        pad_mode="reflect",
        normalized=False,
        onesided=True,
        return_complex=False,
    )

    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
    return spec


def spec_to_mel(spec, n_fft, num_mels, sample_rate, fmin, fmax):
    """
    Args Shapes:
        - spec : :math:`[B,C,T]`

    Return Shapes:
        - mel : :math:`[B,C,T]`
    """
    global mel_basis
    dtype_device = str(spec.dtype) + "_" + str(spec.device)
    fmax_dtype_device = str(fmax) + "_" + dtype_device
    if fmax_dtype_device not in mel_basis:
        mel = librosa_mel_fn(sample_rate, n_fft, num_mels, fmin, fmax)
        mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(dtype=spec.dtype, device=spec.device)
    mel = torch.matmul(mel_basis[fmax_dtype_device], spec)
    mel = amp_to_db(mel)
    return mel


def wav_to_mel(y, n_fft, num_mels, sample_rate, hop_length, win_length, fmin, fmax, center=False):
    """
    Args Shapes:
        - y : :math:`[B, 1, T]`

    Return Shapes:
        - spec : :math:`[B,C,T]`
    """
    y = y.squeeze(1)

    if torch.min(y) < -1.0:
        print("min value is ", torch.min(y))
    if torch.max(y) > 1.0:
        print("max value is ", torch.max(y))

    global mel_basis, hann_window
    dtype_device = str(y.dtype) + "_" + str(y.device)
    fmax_dtype_device = str(fmax) + "_" + dtype_device
    wnsize_dtype_device = str(win_length) + "_" + dtype_device
    if fmax_dtype_device not in mel_basis:
        mel = librosa_mel_fn(sample_rate, n_fft, num_mels, fmin, fmax)
        mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(dtype=y.dtype, device=y.device)
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_length).to(dtype=y.dtype, device=y.device)

    y = torch.nn.functional.pad(
        y.unsqueeze(1),
        (int((n_fft - hop_length) / 2), int((n_fft - hop_length) / 2)),
        mode="reflect",
    )
    y = y.squeeze(1)

    spec = torch.stft(
        y,
        n_fft,
        hop_length=hop_length,
        win_length=win_length,
        window=hann_window[wnsize_dtype_device],
        center=center,
        pad_mode="reflect",
        normalized=False,
        onesided=True,
        return_complex=False,
    )

    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
    spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
    spec = amp_to_db(spec)
    return spec


#############################
# CONFIGS
#############################


@dataclass
class VitsAudioConfig(Coqpit):
    fft_size: int = 1024
    sample_rate: int = 22050
    win_length: int = 1024
    hop_length: int = 256
    num_mels: int = 80
    mel_fmin: int = 0
    mel_fmax: int = None


##############################
# DATASET
##############################


def get_attribute_balancer_weights(items: list, attr_name: str, multi_dict: dict = None):
    """Create inverse frequency weights for balancing the dataset.
    Use `multi_dict` to scale relative weights."""
    attr_names_samples = np.array([item[attr_name] for item in items])
    unique_attr_names = np.unique(attr_names_samples).tolist()
    attr_idx = [unique_attr_names.index(l) for l in attr_names_samples]
    attr_count = np.array([len(np.where(attr_names_samples == l)[0]) for l in unique_attr_names])
    weight_attr = 1.0 / attr_count
    dataset_samples_weight = np.array([weight_attr[l] for l in attr_idx])
    dataset_samples_weight = dataset_samples_weight / np.linalg.norm(dataset_samples_weight)
    if multi_dict is not None:
        # check if all keys are in the multi_dict
        for k in multi_dict:
            assert k in unique_attr_names, f"{k} not in {unique_attr_names}"
        # scale weights
        multiplier_samples = np.array([multi_dict.get(item[attr_name], 1.0) for item in items])
        dataset_samples_weight *= multiplier_samples
    return (
        torch.from_numpy(dataset_samples_weight).float(),
        unique_attr_names,
        np.unique(dataset_samples_weight).tolist(),
    )


class VitsDataset(TTSDataset):
    def __init__(self, model_args, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.pad_id = self.tokenizer.characters.pad_id
        self.model_args = model_args

    def __getitem__(self, idx):
        item = self.samples[idx]
        raw_text = item["text"]

        wav, _ = load_audio(item["audio_file"])
        if self.model_args.encoder_sample_rate is not None:
            if wav.size(1) % self.model_args.encoder_sample_rate != 0:
                wav = wav[:, : -int(wav.size(1) % self.model_args.encoder_sample_rate)]

        wav_filename = os.path.basename(item["audio_file"])

        token_ids = self.get_token_ids(idx, item["text"])

        # after phonemization the text length may change
        # this is a shameful 🤭 hack to prevent longer phonemes
        # TODO: find a better fix
        if len(token_ids) > self.max_text_len or wav.shape[1] < self.min_audio_len:
            self.rescue_item_idx += 1
            return self.__getitem__(self.rescue_item_idx)

        return {
            "raw_text": raw_text,
            "token_ids": token_ids,
            "token_len": len(token_ids),
            "wav": wav,
            "wav_file": wav_filename,
            "speaker_name": item["speaker_name"],
            "language_name": item["language"],
            "audio_unique_name": item["audio_unique_name"],
        }

    @property
    def lengths(self):
        lens = []
        for item in self.samples:
            _, wav_file, *_ = _parse_sample(item)
            audio_len = os.path.getsize(wav_file) / 16 * 8  # assuming 16bit audio
            lens.append(audio_len)
        return lens

    def collate_fn(self, batch):
        """
        Return Shapes:
            - tokens: :math:`[B, T]`
            - token_lens :math:`[B]`
            - token_rel_lens :math:`[B]`
            - waveform: :math:`[B, 1, T]`
            - waveform_lens: :math:`[B]`
            - waveform_rel_lens: :math:`[B]`
            - speaker_names: :math:`[B]`
            - language_names: :math:`[B]`
            - audiofile_paths: :math:`[B]`
            - raw_texts: :math:`[B]`
            - audio_unique_names: :math:`[B]`
        """
        # convert list of dicts to dict of lists
        B = len(batch)
        batch = {k: [dic[k] for dic in batch] for k in batch[0]}

        _, ids_sorted_decreasing = torch.sort(
            torch.LongTensor([x.size(1) for x in batch["wav"]]), dim=0, descending=True
        )

        max_text_len = max([len(x) for x in batch["token_ids"]])
        token_lens = torch.LongTensor(batch["token_len"])
        token_rel_lens = token_lens / token_lens.max()

        wav_lens = [w.shape[1] for w in batch["wav"]]
        wav_lens = torch.LongTensor(wav_lens)
        wav_lens_max = torch.max(wav_lens)
        wav_rel_lens = wav_lens / wav_lens_max

        token_padded = torch.LongTensor(B, max_text_len)
        wav_padded = torch.FloatTensor(B, 1, wav_lens_max)
        token_padded = token_padded.zero_() + self.pad_id
        wav_padded = wav_padded.zero_() + self.pad_id
        for i in range(len(ids_sorted_decreasing)):
            token_ids = batch["token_ids"][i]
            token_padded[i, : batch["token_len"][i]] = torch.LongTensor(token_ids)

            wav = batch["wav"][i]
            wav_padded[i, :, : wav.size(1)] = torch.FloatTensor(wav)

        return {
            "tokens": token_padded,
            "token_lens": token_lens,
            "token_rel_lens": token_rel_lens,
            "waveform": wav_padded,  # (B x T)
            "waveform_lens": wav_lens,  # (B)
            "waveform_rel_lens": wav_rel_lens,
            "speaker_names": batch["speaker_name"],
            "language_names": batch["language_name"],
            "audio_files": batch["wav_file"],
            "raw_text": batch["raw_text"],
            "audio_unique_names": batch["audio_unique_name"],
        }


##############################
# MODEL DEFINITION
##############################


@dataclass
class VitsArgs(Coqpit):
    """VITS model arguments.

    Args:

        num_chars (int):
            Number of characters in the vocabulary. Defaults to 100.

        out_channels (int):
            Number of output channels of the decoder. Defaults to 513.

        spec_segment_size (int):
            Decoder input segment size. Defaults to 32 `(32 * hoplength = waveform length)`.

        hidden_channels (int):
            Number of hidden channels of the model. Defaults to 192.

        hidden_channels_ffn_text_encoder (int):
            Number of hidden channels of the feed-forward layers of the text encoder transformer. Defaults to 256.

        num_heads_text_encoder (int):
            Number of attention heads of the text encoder transformer. Defaults to 2.

        num_layers_text_encoder (int):
            Number of transformer layers in the text encoder. Defaults to 6.

        kernel_size_text_encoder (int):
            Kernel size of the text encoder transformer FFN layers. Defaults to 3.

        dropout_p_text_encoder (float):
            Dropout rate of the text encoder. Defaults to 0.1.

        dropout_p_duration_predictor (float):
            Dropout rate of the duration predictor. Defaults to 0.1.

        kernel_size_posterior_encoder (int):
            Kernel size of the posterior encoder's WaveNet layers. Defaults to 5.

        dilatation_posterior_encoder (int):
            Dilation rate of the posterior encoder's WaveNet layers. Defaults to 1.

        num_layers_posterior_encoder (int):
            Number of posterior encoder's WaveNet layers. Defaults to 16.

        kernel_size_flow (int):
            Kernel size of the Residual Coupling layers of the flow network. Defaults to 5.

        dilatation_flow (int):
            Dilation rate of the Residual Coupling WaveNet layers of the flow network. Defaults to 1.

        num_layers_flow (int):
            Number of Residual Coupling WaveNet layers of the flow network. Defaults to 6.

        resblock_type_decoder (str):
            Type of the residual block in the decoder network. Defaults to "1".

        resblock_kernel_sizes_decoder (List[int]):
            Kernel sizes of the residual blocks in the decoder network. Defaults to `[3, 7, 11]`.

        resblock_dilation_sizes_decoder (List[List[int]]):
            Dilation sizes of the residual blocks in the decoder network. Defaults to `[[1, 3, 5], [1, 3, 5], [1, 3, 5]]`.

        upsample_rates_decoder (List[int]):
            Upsampling rates for each concecutive upsampling layer in the decoder network. The multiply of these
            values must be equal to the kop length used for computing spectrograms. Defaults to `[8, 8, 2, 2]`.

        upsample_initial_channel_decoder (int):
            Number of hidden channels of the first upsampling convolution layer of the decoder network. Defaults to 512.

        upsample_kernel_sizes_decoder (List[int]):
            Kernel sizes for each upsampling layer of the decoder network. Defaults to `[16, 16, 4, 4]`.

        periods_multi_period_discriminator (List[int]):
            Periods values for Vits Multi-Period Discriminator. Defaults to `[2, 3, 5, 7, 11]`.

        use_sdp (bool):
            Use Stochastic Duration Predictor. Defaults to True.

        noise_scale (float):
            Noise scale used for the sample noise tensor in training. Defaults to 1.0.

        inference_noise_scale (float):
            Noise scale used for the sample noise tensor in inference. Defaults to 0.667.

        length_scale (float):
            Scale factor for the predicted duration values. Smaller values result faster speech. Defaults to 1.

        noise_scale_dp (float):
            Noise scale used by the Stochastic Duration Predictor sample noise in training. Defaults to 1.0.

        inference_noise_scale_dp (float):
            Noise scale for the Stochastic Duration Predictor in inference. Defaults to 0.8.

        max_inference_len (int):
            Maximum inference length to limit the memory use. Defaults to None.

        init_discriminator (bool):
            Initialize the disciminator network if set True. Set False for inference. Defaults to True.

        use_spectral_norm_disriminator (bool):
            Use spectral normalization over weight norm in the discriminator. Defaults to False.

        use_speaker_embedding (bool):
            Enable/Disable speaker embedding for multi-speaker models. Defaults to False.

        num_speakers (int):
            Number of speakers for the speaker embedding layer. Defaults to 0.

        speakers_file (str):
            Path to the speaker mapping file for the Speaker Manager. Defaults to None.

        speaker_embedding_channels (int):
            Number of speaker embedding channels. Defaults to 256.

        use_d_vector_file (bool):
            Enable/Disable the use of d-vectors for multi-speaker training. Defaults to False.

        d_vector_file (List[str]):
            List of paths to the files including pre-computed speaker embeddings. Defaults to None.

        d_vector_dim (int):
            Number of d-vector channels. Defaults to 0.

        detach_dp_input (bool):
            Detach duration predictor's input from the network for stopping the gradients. Defaults to True.

        use_language_embedding (bool):
            Enable/Disable language embedding for multilingual models. Defaults to False.

        embedded_language_dim (int):
            Number of language embedding channels. Defaults to 4.

        num_languages (int):
            Number of languages for the language embedding layer. Defaults to 0.

        language_ids_file (str):
            Path to the language mapping file for the Language Manager. Defaults to None.

        use_speaker_encoder_as_loss (bool):
            Enable/Disable Speaker Consistency Loss (SCL). Defaults to False.

        speaker_encoder_config_path (str):
            Path to the file speaker encoder config file, to use for SCL. Defaults to "".

        speaker_encoder_model_path (str):
            Path to the file speaker encoder checkpoint file, to use for SCL. Defaults to "".

        condition_dp_on_speaker (bool):
            Condition the duration predictor on the speaker embedding. Defaults to True.

        freeze_encoder (bool):
            Freeze the encoder weigths during training. Defaults to False.

        freeze_DP (bool):
            Freeze the duration predictor weigths during training. Defaults to False.

        freeze_PE (bool):
            Freeze the posterior encoder weigths during training. Defaults to False.

        freeze_flow_encoder (bool):
            Freeze the flow encoder weigths during training. Defaults to False.

        freeze_waveform_decoder (bool):
            Freeze the waveform decoder weigths during training. Defaults to False.

        encoder_sample_rate (int):
            If not None this sample rate will be used for training the Posterior Encoder,
            flow, text_encoder and duration predictor. The decoder part (vocoder) will be
            trained with the `config.audio.sample_rate`. Defaults to None.

        interpolate_z (bool):
            If `encoder_sample_rate` not None and  this parameter True the nearest interpolation
            will be used to upsampling the latent variable z with the sampling rate `encoder_sample_rate`
            to the `config.audio.sample_rate`. If it is False you will need to add extra
            `upsample_rates_decoder` to match the shape. Defaults to True.

    """

    num_chars: int = 100
    out_channels: int = 513
    spec_segment_size: int = 32
    hidden_channels: int = 192
    hidden_channels_ffn_text_encoder: int = 768
    num_heads_text_encoder: int = 2
    num_layers_text_encoder: int = 6
    kernel_size_text_encoder: int = 3
    dropout_p_text_encoder: float = 0.1
    dropout_p_duration_predictor: float = 0.5
    kernel_size_posterior_encoder: int = 5
    dilation_rate_posterior_encoder: int = 1
    num_layers_posterior_encoder: int = 16
    kernel_size_flow: int = 5
    dilation_rate_flow: int = 1
    num_layers_flow: int = 4
    resblock_type_decoder: str = "1"
    resblock_kernel_sizes_decoder: List[int] = field(default_factory=lambda: [3, 7, 11])
    resblock_dilation_sizes_decoder: List[List[int]] = field(default_factory=lambda: [[1, 3, 5], [1, 3, 5], [1, 3, 5]])
    upsample_rates_decoder: List[int] = field(default_factory=lambda: [8, 8, 2, 2])
    upsample_initial_channel_decoder: int = 512
    upsample_kernel_sizes_decoder: List[int] = field(default_factory=lambda: [16, 16, 4, 4])
    periods_multi_period_discriminator: List[int] = field(default_factory=lambda: [2, 3, 5, 7, 11])
    use_sdp: bool = True
    noise_scale: float = 1.0
    inference_noise_scale: float = 0.667
    length_scale: float = 1
    noise_scale_dp: float = 1.0
    inference_noise_scale_dp: float = 1.0
    max_inference_len: int = None
    init_discriminator: bool = True
    use_spectral_norm_disriminator: bool = False
    use_speaker_embedding: bool = False
    num_speakers: int = 0
    speakers_file: str = None
    d_vector_file: List[str] = None
    speaker_embedding_channels: int = 256
    use_d_vector_file: bool = False
    d_vector_dim: int = 0
    detach_dp_input: bool = True
    use_language_embedding: bool = False
    embedded_language_dim: int = 4
    num_languages: int = 0
    language_ids_file: str = None
    use_speaker_encoder_as_loss: bool = False
    speaker_encoder_config_path: str = ""
    speaker_encoder_model_path: str = ""
    condition_dp_on_speaker: bool = True
    freeze_encoder: bool = False
    freeze_DP: bool = False
    freeze_PE: bool = False
    freeze_flow_decoder: bool = False
    freeze_waveform_decoder: bool = False
    encoder_sample_rate: int = None
    interpolate_z: bool = True
    reinit_DP: bool = False
    reinit_text_encoder: bool = False


class Vits(BaseTTS):
    """VITS TTS model

    Paper::
        https://arxiv.org/pdf/2106.06103.pdf

    Paper Abstract::
        Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel
        sampling have been proposed, but their sample quality does not match that of two-stage TTS systems.
        In this work, we present a parallel endto-end TTS method that generates more natural sounding audio than
        current two-stage models. Our method adopts variational inference augmented with normalizing flows and
        an adversarial training process, which improves the expressive power of generative modeling. We also propose a
        stochastic duration predictor to synthesize speech with diverse rhythms from input text. With the
        uncertainty modeling over latent variables and the stochastic duration predictor, our method expresses the
        natural one-to-many relationship in which a text input can be spoken in multiple ways
        with different pitches and rhythms. A subjective human evaluation (mean opinion score, or MOS)
        on the LJ Speech, a single speaker dataset, shows that our method outperforms the best publicly
        available TTS systems and achieves a MOS comparable to ground truth.

    Check :class:`TTS.tts.configs.vits_config.VitsConfig` for class arguments.

    Examples:
        >>> from TTS.tts.configs.vits_config import VitsConfig
        >>> from TTS.tts.models.vits import Vits
        >>> config = VitsConfig()
        >>> model = Vits(config)
    """

    def __init__(
        self,
        config: Coqpit,
        ap: "AudioProcessor" = None,
        tokenizer: "TTSTokenizer" = None,
        speaker_manager: SpeakerManager = None,
        language_manager: LanguageManager = None,
    ):
        super().__init__(config, ap, tokenizer, speaker_manager, language_manager)

        self.init_multispeaker(config)
        self.init_multilingual(config)
        self.init_upsampling()

        self.length_scale = self.args.length_scale
        self.noise_scale = self.args.noise_scale
        self.inference_noise_scale = self.args.inference_noise_scale
        self.inference_noise_scale_dp = self.args.inference_noise_scale_dp
        self.noise_scale_dp = self.args.noise_scale_dp
        self.max_inference_len = self.args.max_inference_len
        self.spec_segment_size = self.args.spec_segment_size

        self.text_encoder = TextEncoder(
            self.args.num_chars,
            self.args.hidden_channels,
            self.args.hidden_channels,
            self.args.hidden_channels_ffn_text_encoder,
            self.args.num_heads_text_encoder,
            self.args.num_layers_text_encoder,
            self.args.kernel_size_text_encoder,
            self.args.dropout_p_text_encoder,
            language_emb_dim=self.embedded_language_dim,
        )

        self.posterior_encoder = PosteriorEncoder(
            self.args.out_channels,
            self.args.hidden_channels,
            self.args.hidden_channels,
            kernel_size=self.args.kernel_size_posterior_encoder,
            dilation_rate=self.args.dilation_rate_posterior_encoder,
            num_layers=self.args.num_layers_posterior_encoder,
            cond_channels=self.embedded_speaker_dim,
        )

        self.flow = ResidualCouplingBlocks(
            self.args.hidden_channels,
            self.args.hidden_channels,
            kernel_size=self.args.kernel_size_flow,
            dilation_rate=self.args.dilation_rate_flow,
            num_layers=self.args.num_layers_flow,
            cond_channels=self.embedded_speaker_dim,
        )

        if self.args.use_sdp:
            self.duration_predictor = StochasticDurationPredictor(
                self.args.hidden_channels,
                192,
                3,
                self.args.dropout_p_duration_predictor,
                4,
                cond_channels=self.embedded_speaker_dim if self.args.condition_dp_on_speaker else 0,
                language_emb_dim=self.embedded_language_dim,
            )
        else:
            self.duration_predictor = DurationPredictor(
                self.args.hidden_channels,
                256,
                3,
                self.args.dropout_p_duration_predictor,
                cond_channels=self.embedded_speaker_dim,
                language_emb_dim=self.embedded_language_dim,
            )

        self.waveform_decoder = HifiganGenerator(
            self.args.hidden_channels,
            1,
            self.args.resblock_type_decoder,
            self.args.resblock_dilation_sizes_decoder,
            self.args.resblock_kernel_sizes_decoder,
            self.args.upsample_kernel_sizes_decoder,
            self.args.upsample_initial_channel_decoder,
            self.args.upsample_rates_decoder,
            inference_padding=0,
            cond_channels=self.embedded_speaker_dim,
            conv_pre_weight_norm=False,
            conv_post_weight_norm=False,
            conv_post_bias=False,
        )

        if self.args.init_discriminator:
            self.disc = VitsDiscriminator(
                periods=self.args.periods_multi_period_discriminator,
                use_spectral_norm=self.args.use_spectral_norm_disriminator,
            )

    @property
    def device(self):
        return next(self.parameters()).device

    def init_multispeaker(self, config: Coqpit):
        """Initialize multi-speaker modules of a model. A model can be trained either with a speaker embedding layer
        or with external `d_vectors` computed from a speaker encoder model.

        You must provide a `speaker_manager` at initialization to set up the multi-speaker modules.

        Args:
            config (Coqpit): Model configuration.
            data (List, optional): Dataset items to infer number of speakers. Defaults to None.
        """
        self.embedded_speaker_dim = 0
        self.num_speakers = self.args.num_speakers
        self.audio_transform = None

        if self.speaker_manager:
            self.num_speakers = self.speaker_manager.num_speakers

        if self.args.use_speaker_embedding:
            self._init_speaker_embedding()

        if self.args.use_d_vector_file:
            self._init_d_vector()

        # TODO: make this a function
        if self.args.use_speaker_encoder_as_loss:
            if self.speaker_manager.encoder is None and (
                not self.args.speaker_encoder_model_path or not self.args.speaker_encoder_config_path
            ):
                raise RuntimeError(
                    " [!] To use the speaker consistency loss (SCL) you need to specify speaker_encoder_model_path and speaker_encoder_config_path !!"
                )

            self.speaker_manager.encoder.eval()
            print(" > External Speaker Encoder Loaded !!")

            if (
                hasattr(self.speaker_manager.encoder, "audio_config")
                and self.config.audio.sample_rate != self.speaker_manager.encoder.audio_config["sample_rate"]
            ):
                self.audio_transform = torchaudio.transforms.Resample(
                    orig_freq=self.config.audio.sample_rate,
                    new_freq=self.speaker_manager.encoder.audio_config["sample_rate"],
                )

    def _init_speaker_embedding(self):
        # pylint: disable=attribute-defined-outside-init
        if self.num_speakers > 0:
            print(" > initialization of speaker-embedding layers.")
            self.embedded_speaker_dim = self.args.speaker_embedding_channels
            self.emb_g = nn.Embedding(self.num_speakers, self.embedded_speaker_dim)

    def _init_d_vector(self):
        # pylint: disable=attribute-defined-outside-init
        if hasattr(self, "emb_g"):
            raise ValueError("[!] Speaker embedding layer already initialized before d_vector settings.")
        self.embedded_speaker_dim = self.args.d_vector_dim

    def init_multilingual(self, config: Coqpit):
        """Initialize multilingual modules of a model.

        Args:
            config (Coqpit): Model configuration.
        """
        if self.args.language_ids_file is not None:
            self.language_manager = LanguageManager(language_ids_file_path=config.language_ids_file)

        if self.args.use_language_embedding and self.language_manager:
            print(" > initialization of language-embedding layers.")
            self.num_languages = self.language_manager.num_languages
            self.embedded_language_dim = self.args.embedded_language_dim
            self.emb_l = nn.Embedding(self.num_languages, self.embedded_language_dim)
            torch.nn.init.xavier_uniform_(self.emb_l.weight)
        else:
            self.embedded_language_dim = 0

    def init_upsampling(self):
        """
        Initialize upsampling modules of a model.
        """
        if self.args.encoder_sample_rate:
            self.interpolate_factor = self.config.audio["sample_rate"] / self.args.encoder_sample_rate
            self.audio_resampler = torchaudio.transforms.Resample(
                orig_freq=self.config.audio["sample_rate"], new_freq=self.args.encoder_sample_rate
            )  # pylint: disable=W0201

    def on_epoch_start(self, trainer):  # pylint: disable=W0613
        """Freeze layers at the beginning of an epoch"""
        self._freeze_layers()
        # set the device of speaker encoder
        if self.args.use_speaker_encoder_as_loss:
            self.speaker_manager.encoder = self.speaker_manager.encoder.to(self.device)

    def on_init_end(self, trainer):  # pylint: disable=W0613
        """Reinit layes if needed"""
        if self.args.reinit_DP:
            before_dict = get_module_weights_sum(self.duration_predictor)
            # Applies weights_reset recursively to every submodule of the duration predictor
            self.duration_predictor.apply(fn=weights_reset)
            after_dict = get_module_weights_sum(self.duration_predictor)
            for key, value in after_dict.items():
                if value == before_dict[key]:
                    raise RuntimeError(" [!] The weights of Duration Predictor was not reinit check it !")
            print(" > Duration Predictor was reinit.")

        if self.args.reinit_text_encoder:
            before_dict = get_module_weights_sum(self.text_encoder)
            # Applies weights_reset recursively to every submodule of the duration predictor
            self.text_encoder.apply(fn=weights_reset)
            after_dict = get_module_weights_sum(self.text_encoder)
            for key, value in after_dict.items():
                if value == before_dict[key]:
                    raise RuntimeError(" [!] The weights of Text Encoder was not reinit check it !")
            print(" > Text Encoder was reinit.")

    def get_aux_input(self, aux_input: Dict):
        sid, g, lid, _ = self._set_cond_input(aux_input)
        return {"speaker_ids": sid, "style_wav": None, "d_vectors": g, "language_ids": lid}

    def _freeze_layers(self):
        if self.args.freeze_encoder:
            for param in self.text_encoder.parameters():
                param.requires_grad = False

            if hasattr(self, "emb_l"):
                for param in self.emb_l.parameters():
                    param.requires_grad = False

        if self.args.freeze_PE:
            for param in self.posterior_encoder.parameters():
                param.requires_grad = False

        if self.args.freeze_DP:
            for param in self.duration_predictor.parameters():
                param.requires_grad = False

        if self.args.freeze_flow_decoder:
            for param in self.flow.parameters():
                param.requires_grad = False

        if self.args.freeze_waveform_decoder:
            for param in self.waveform_decoder.parameters():
                param.requires_grad = False

    @staticmethod
    def _set_cond_input(aux_input: Dict):
        """Set the speaker conditioning input based on the multi-speaker mode."""
        sid, g, lid, durations = None, None, None, None
        if "speaker_ids" in aux_input and aux_input["speaker_ids"] is not None:
            sid = aux_input["speaker_ids"]
            if sid.ndim == 0:
                sid = sid.unsqueeze_(0)
        if "d_vectors" in aux_input and aux_input["d_vectors"] is not None:
            g = F.normalize(aux_input["d_vectors"]).unsqueeze(-1)
            if g.ndim == 2:
                g = g.unsqueeze_(0)

        if "language_ids" in aux_input and aux_input["language_ids"] is not None:
            lid = aux_input["language_ids"]
            if lid.ndim == 0:
                lid = lid.unsqueeze_(0)

        if "durations" in aux_input and aux_input["durations"] is not None:
            durations = aux_input["durations"]

        return sid, g, lid, durations

    def _set_speaker_input(self, aux_input: Dict):
        d_vectors = aux_input.get("d_vectors", None)
        speaker_ids = aux_input.get("speaker_ids", None)

        if d_vectors is not None and speaker_ids is not None:
            raise ValueError("[!] Cannot use d-vectors and speaker-ids together.")

        if speaker_ids is not None and not hasattr(self, "emb_g"):
            raise ValueError("[!] Cannot use speaker-ids without enabling speaker embedding.")

        g = speaker_ids if speaker_ids is not None else d_vectors
        return g

    def forward_mas(self, outputs, z_p, m_p, logs_p, x, x_mask, y_mask, g, lang_emb):
        # find the alignment path
        attn_mask = torch.unsqueeze(x_mask, -1) * torch.unsqueeze(y_mask, 2)
        with torch.no_grad():
            o_scale = torch.exp(-2 * logs_p)
            logp1 = torch.sum(-0.5 * math.log(2 * math.pi) - logs_p, [1]).unsqueeze(-1)  # [b, t, 1]
            logp2 = torch.einsum("klm, kln -> kmn", [o_scale, -0.5 * (z_p**2)])
            logp3 = torch.einsum("klm, kln -> kmn", [m_p * o_scale, z_p])
            logp4 = torch.sum(-0.5 * (m_p**2) * o_scale, [1]).unsqueeze(-1)  # [b, t, 1]
            logp = logp2 + logp3 + logp1 + logp4
            attn = maximum_path(logp, attn_mask.squeeze(1)).unsqueeze(1).detach()  # [b, 1, t, t']

        # duration predictor
        attn_durations = attn.sum(3)
        if self.args.use_sdp:
            loss_duration = self.duration_predictor(
                x.detach() if self.args.detach_dp_input else x,
                x_mask,
                attn_durations,
                g=g.detach() if self.args.detach_dp_input and g is not None else g,
                lang_emb=lang_emb.detach() if self.args.detach_dp_input and lang_emb is not None else lang_emb,
            )
            loss_duration = loss_duration / torch.sum(x_mask)
        else:
            attn_log_durations = torch.log(attn_durations + 1e-6) * x_mask
            log_durations = self.duration_predictor(
                x.detach() if self.args.detach_dp_input else x,
                x_mask,
                g=g.detach() if self.args.detach_dp_input and g is not None else g,
                lang_emb=lang_emb.detach() if self.args.detach_dp_input and lang_emb is not None else lang_emb,
            )
            loss_duration = torch.sum((log_durations - attn_log_durations) ** 2, [1, 2]) / torch.sum(x_mask)
        outputs["loss_duration"] = loss_duration
        return outputs, attn

    def upsampling_z(self, z, slice_ids=None, y_lengths=None, y_mask=None):
        spec_segment_size = self.spec_segment_size
        if self.args.encoder_sample_rate:
            # recompute the slices and spec_segment_size if needed
            slice_ids = slice_ids * int(self.interpolate_factor) if slice_ids is not None else slice_ids
            spec_segment_size = spec_segment_size * int(self.interpolate_factor)
            # interpolate z if needed
            if self.args.interpolate_z:
                z = torch.nn.functional.interpolate(z, scale_factor=[self.interpolate_factor], mode="linear").squeeze(0)
                # recompute the mask if needed
                if y_lengths is not None and y_mask is not None:
                    y_mask = (
                        sequence_mask(y_lengths * self.interpolate_factor, None).to(y_mask.dtype).unsqueeze(1)
                    )  # [B, 1, T_dec_resampled]

        return z, spec_segment_size, slice_ids, y_mask

    def forward(  # pylint: disable=dangerous-default-value
        self,
        x: torch.tensor,
        x_lengths: torch.tensor,
        y: torch.tensor,
        y_lengths: torch.tensor,
        waveform: torch.tensor,
        aux_input={"d_vectors": None, "speaker_ids": None, "language_ids": None},
    ) -> Dict:
        """Forward pass of the model.

        Args:
            x (torch.tensor): Batch of input character sequence IDs.
            x_lengths (torch.tensor): Batch of input character sequence lengths.
            y (torch.tensor): Batch of input spectrograms.
            y_lengths (torch.tensor): Batch of input spectrogram lengths.
            waveform (torch.tensor): Batch of ground truth waveforms per sample.
            aux_input (dict, optional): Auxiliary inputs for multi-speaker and multi-lingual training.
                Defaults to {"d_vectors": None, "speaker_ids": None, "language_ids": None}.

        Returns:
            Dict: model outputs keyed by the output name.

        Shapes:
            - x: :math:`[B, T_seq]`
            - x_lengths: :math:`[B]`
            - y: :math:`[B, C, T_spec]`
            - y_lengths: :math:`[B]`
            - waveform: :math:`[B, 1, T_wav]`
            - d_vectors: :math:`[B, C, 1]`
            - speaker_ids: :math:`[B]`
            - language_ids: :math:`[B]`

        Return Shapes:
            - model_outputs: :math:`[B, 1, T_wav]`
            - alignments: :math:`[B, T_seq, T_dec]`
            - z: :math:`[B, C, T_dec]`
            - z_p: :math:`[B, C, T_dec]`
            - m_p: :math:`[B, C, T_dec]`
            - logs_p: :math:`[B, C, T_dec]`
            - m_q: :math:`[B, C, T_dec]`
            - logs_q: :math:`[B, C, T_dec]`
            - waveform_seg: :math:`[B, 1, spec_seg_size * hop_length]`
            - gt_spk_emb: :math:`[B, 1, speaker_encoder.proj_dim]`
            - syn_spk_emb: :math:`[B, 1, speaker_encoder.proj_dim]`
        """
        outputs = {}
        sid, g, lid, _ = self._set_cond_input(aux_input)
        # speaker embedding
        if self.args.use_speaker_embedding and sid is not None:
            g = self.emb_g(sid).unsqueeze(-1)  # [b, h, 1]

        # language embedding
        lang_emb = None
        if self.args.use_language_embedding and lid is not None:
            lang_emb = self.emb_l(lid).unsqueeze(-1)

        x, m_p, logs_p, x_mask = self.text_encoder(x, x_lengths, lang_emb=lang_emb)

        # posterior encoder
        z, m_q, logs_q, y_mask = self.posterior_encoder(y, y_lengths, g=g)

        # flow layers
        z_p = self.flow(z, y_mask, g=g)

        # duration predictor
        outputs, attn = self.forward_mas(outputs, z_p, m_p, logs_p, x, x_mask, y_mask, g=g, lang_emb=lang_emb)

        # expand prior
        m_p = torch.einsum("klmn, kjm -> kjn", [attn, m_p])
        logs_p = torch.einsum("klmn, kjm -> kjn", [attn, logs_p])

        # select a random feature segment for the waveform decoder
        z_slice, slice_ids = rand_segments(z, y_lengths, self.spec_segment_size, let_short_samples=True, pad_short=True)

        # interpolate z if needed
        z_slice, spec_segment_size, slice_ids, _ = self.upsampling_z(z_slice, slice_ids=slice_ids)

        o = self.waveform_decoder(z_slice, g=g)

        wav_seg = segment(
            waveform,
            slice_ids * self.config.audio.hop_length,
            spec_segment_size * self.config.audio.hop_length,
            pad_short=True,
        )

        if self.args.use_speaker_encoder_as_loss and self.speaker_manager.encoder is not None:
            # concate generated and GT waveforms
            wavs_batch = torch.cat((wav_seg, o), dim=0)

            # resample audio to speaker encoder sample_rate
            # pylint: disable=W0105
            if self.audio_transform is not None:
                wavs_batch = self.audio_transform(wavs_batch)

            pred_embs = self.speaker_manager.encoder.forward(wavs_batch, l2_norm=True)

            # split generated and GT speaker embeddings
            gt_spk_emb, syn_spk_emb = torch.chunk(pred_embs, 2, dim=0)
        else:
            gt_spk_emb, syn_spk_emb = None, None

        outputs.update(
            {
                "model_outputs": o,
                "alignments": attn.squeeze(1),
                "m_p": m_p,
                "logs_p": logs_p,
                "z": z,
                "z_p": z_p,
                "m_q": m_q,
                "logs_q": logs_q,
                "waveform_seg": wav_seg,
                "gt_spk_emb": gt_spk_emb,
                "syn_spk_emb": syn_spk_emb,
                "slice_ids": slice_ids,
            }
        )
        return outputs

    @staticmethod
    def _set_x_lengths(x, aux_input):
        if "x_lengths" in aux_input and aux_input["x_lengths"] is not None:
            return aux_input["x_lengths"]
        return torch.tensor(x.shape[1:2]).to(x.device)

    @torch.no_grad()
    def inference(
        self,
        x,
        aux_input={"x_lengths": None, "d_vectors": None, "speaker_ids": None, "language_ids": None, "durations": None},
    ):  # pylint: disable=dangerous-default-value
        """
        Note:
            To run in batch mode, provide `x_lengths` else model assumes that the batch size is 1.

        Shapes:
            - x: :math:`[B, T_seq]`
            - x_lengths: :math:`[B]`
            - d_vectors: :math:`[B, C]`
            - speaker_ids: :math:`[B]`

        Return Shapes:
            - model_outputs: :math:`[B, 1, T_wav]`
            - alignments: :math:`[B, T_seq, T_dec]`
            - z: :math:`[B, C, T_dec]`
            - z_p: :math:`[B, C, T_dec]`
            - m_p: :math:`[B, C, T_dec]`
            - logs_p: :math:`[B, C, T_dec]`
        """
        sid, g, lid, durations = self._set_cond_input(aux_input)
        x_lengths = self._set_x_lengths(x, aux_input)

        # speaker embedding
        if self.args.use_speaker_embedding and sid is not None:
            g = self.emb_g(sid).unsqueeze(-1)

        # language embedding
        lang_emb = None
        if self.args.use_language_embedding and lid is not None:
            lang_emb = self.emb_l(lid).unsqueeze(-1)

        x, m_p, logs_p, x_mask = self.text_encoder(x, x_lengths, lang_emb=lang_emb)

        if durations is None:
            if self.args.use_sdp:
                logw = self.duration_predictor(
                    x,
                    x_mask,
                    g=g if self.args.condition_dp_on_speaker else None,
                    reverse=True,
                    noise_scale=self.inference_noise_scale_dp,
                    lang_emb=lang_emb,
                )
            else:
                logw = self.duration_predictor(
                    x, x_mask, g=g if self.args.condition_dp_on_speaker else None, lang_emb=lang_emb
                )
            w = torch.exp(logw) * x_mask * self.length_scale
        else:
            assert durations.shape[-1] == x.shape[-1]
            w = durations.unsqueeze(0)

        w_ceil = torch.ceil(w)
        y_lengths = torch.clamp_min(torch.sum(w_ceil, [1, 2]), 1).long()
        y_mask = sequence_mask(y_lengths, None).to(x_mask.dtype).unsqueeze(1)  # [B, 1, T_dec]

        attn_mask = x_mask * y_mask.transpose(1, 2)  # [B, 1, T_enc] * [B, T_dec, 1]
        attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1).transpose(1, 2))

        m_p = torch.matmul(attn.transpose(1, 2), m_p.transpose(1, 2)).transpose(1, 2)
        logs_p = torch.matmul(attn.transpose(1, 2), logs_p.transpose(1, 2)).transpose(1, 2)

        z_p = m_p + torch.randn_like(m_p) * torch.exp(logs_p) * self.inference_noise_scale
        z = self.flow(z_p, y_mask, g=g, reverse=True)

        # upsampling if needed
        z, _, _, y_mask = self.upsampling_z(z, y_lengths=y_lengths, y_mask=y_mask)

        o = self.waveform_decoder((z * y_mask)[:, :, : self.max_inference_len], g=g)

        outputs = {
            "model_outputs": o,
            "alignments": attn.squeeze(1),
            "durations": w_ceil,
            "z": z,
            "z_p": z_p,
            "m_p": m_p,
            "logs_p": logs_p,
            "y_mask": y_mask,
        }
        return outputs

    @torch.no_grad()
    def inference_voice_conversion(
        self, reference_wav, speaker_id=None, d_vector=None, reference_speaker_id=None, reference_d_vector=None
    ):
        """Inference for voice conversion

        Args:
            reference_wav (Tensor): Reference wavform. Tensor of shape [B, T]
            speaker_id (Tensor): speaker_id of the target speaker. Tensor of shape [B]
            d_vector (Tensor): d_vector embedding of target speaker. Tensor of shape `[B, C]`
            reference_speaker_id (Tensor): speaker_id of the reference_wav speaker. Tensor of shape [B]
            reference_d_vector (Tensor): d_vector embedding of the reference_wav speaker. Tensor of shape `[B, C]`
        """
        # compute spectrograms
        y = wav_to_spec(
            reference_wav,
            self.config.audio.fft_size,
            self.config.audio.hop_length,
            self.config.audio.win_length,
            center=False,
        )
        y_lengths = torch.tensor([y.size(-1)]).to(y.device)
        speaker_cond_src = reference_speaker_id if reference_speaker_id is not None else reference_d_vector
        speaker_cond_tgt = speaker_id if speaker_id is not None else d_vector
        wav, _, _ = self.voice_conversion(y, y_lengths, speaker_cond_src, speaker_cond_tgt)
        return wav

    def voice_conversion(self, y, y_lengths, speaker_cond_src, speaker_cond_tgt):
        """Forward pass for voice conversion

        TODO: create an end-point for voice conversion

        Args:
            y (Tensor): Reference spectrograms. Tensor of shape [B, T, C]
            y_lengths (Tensor): Length of each reference spectrogram. Tensor of shape [B]
            speaker_cond_src (Tensor): Reference speaker ID. Tensor of shape [B,]
            speaker_cond_tgt (Tensor): Target speaker ID. Tensor of shape [B,]
        """
        assert self.num_speakers > 0, "num_speakers have to be larger than 0."
        # speaker embedding
        if self.args.use_speaker_embedding and not self.args.use_d_vector_file:
            g_src = self.emb_g(torch.from_numpy((np.array(speaker_cond_src))).unsqueeze(0)).unsqueeze(-1)
            g_tgt = self.emb_g(torch.from_numpy((np.array(speaker_cond_tgt))).unsqueeze(0)).unsqueeze(-1)
        elif not self.args.use_speaker_embedding and self.args.use_d_vector_file:
            g_src = F.normalize(speaker_cond_src).unsqueeze(-1)
            g_tgt = F.normalize(speaker_cond_tgt).unsqueeze(-1)
        else:
            raise RuntimeError(" [!] Voice conversion is only supported on multi-speaker models.")

        z, _, _, y_mask = self.posterior_encoder(y, y_lengths, g=g_src)
        z_p = self.flow(z, y_mask, g=g_src)
        z_hat = self.flow(z_p, y_mask, g=g_tgt, reverse=True)
        o_hat = self.waveform_decoder(z_hat * y_mask, g=g_tgt)
        return o_hat, y_mask, (z, z_p, z_hat)

    def train_step(self, batch: dict, criterion: nn.Module, optimizer_idx: int) -> Tuple[Dict, Dict]:
        """Perform a single training step. Run the model forward pass and compute losses.

        Args:
            batch (Dict): Input tensors.
            criterion (nn.Module): Loss layer designed for the model.
            optimizer_idx (int): Index of optimizer to use. 0 for the generator and 1 for the discriminator networks.

        Returns:
            Tuple[Dict, Dict]: Model ouputs and computed losses.
        """

        spec_lens = batch["spec_lens"]

        if optimizer_idx == 0:
            tokens = batch["tokens"]
            token_lenghts = batch["token_lens"]
            spec = batch["spec"]

            d_vectors = batch["d_vectors"]
            speaker_ids = batch["speaker_ids"]
            language_ids = batch["language_ids"]
            waveform = batch["waveform"]

            # generator pass
            outputs = self.forward(
                tokens,
                token_lenghts,
                spec,
                spec_lens,
                waveform,
                aux_input={"d_vectors": d_vectors, "speaker_ids": speaker_ids, "language_ids": language_ids},
            )

            # cache tensors for the generator pass
            self.model_outputs_cache = outputs  # pylint: disable=attribute-defined-outside-init

            # compute scores and features
            scores_disc_fake, _, scores_disc_real, _ = self.disc(
                outputs["model_outputs"].detach(), outputs["waveform_seg"]
            )

            # compute loss
            with autocast(enabled=False):  # use float32 for the criterion
                loss_dict = criterion[optimizer_idx](
                    scores_disc_real,
                    scores_disc_fake,
                )
            return outputs, loss_dict

        if optimizer_idx == 1:
            mel = batch["mel"]

            # compute melspec segment
            with autocast(enabled=False):
                if self.args.encoder_sample_rate:
                    spec_segment_size = self.spec_segment_size * int(self.interpolate_factor)
                else:
                    spec_segment_size = self.spec_segment_size

                mel_slice = segment(
                    mel.float(), self.model_outputs_cache["slice_ids"], spec_segment_size, pad_short=True
                )
                mel_slice_hat = wav_to_mel(
                    y=self.model_outputs_cache["model_outputs"].float(),
                    n_fft=self.config.audio.fft_size,
                    sample_rate=self.config.audio.sample_rate,
                    num_mels=self.config.audio.num_mels,
                    hop_length=self.config.audio.hop_length,
                    win_length=self.config.audio.win_length,
                    fmin=self.config.audio.mel_fmin,
                    fmax=self.config.audio.mel_fmax,
                    center=False,
                )

            # compute discriminator scores and features
            scores_disc_fake, feats_disc_fake, _, feats_disc_real = self.disc(
                self.model_outputs_cache["model_outputs"], self.model_outputs_cache["waveform_seg"]
            )

            # compute losses
            with autocast(enabled=False):  # use float32 for the criterion
                loss_dict = criterion[optimizer_idx](
                    mel_slice_hat=mel_slice.float(),
                    mel_slice=mel_slice_hat.float(),
                    z_p=self.model_outputs_cache["z_p"].float(),
                    logs_q=self.model_outputs_cache["logs_q"].float(),
                    m_p=self.model_outputs_cache["m_p"].float(),
                    logs_p=self.model_outputs_cache["logs_p"].float(),
                    z_len=spec_lens,
                    scores_disc_fake=scores_disc_fake,
                    feats_disc_fake=feats_disc_fake,
                    feats_disc_real=feats_disc_real,
                    loss_duration=self.model_outputs_cache["loss_duration"],
                    use_speaker_encoder_as_loss=self.args.use_speaker_encoder_as_loss,
                    gt_spk_emb=self.model_outputs_cache["gt_spk_emb"],
                    syn_spk_emb=self.model_outputs_cache["syn_spk_emb"],
                )

            return self.model_outputs_cache, loss_dict

        raise ValueError(" [!] Unexpected `optimizer_idx`.")

    def _log(self, ap, batch, outputs, name_prefix="train"):  # pylint: disable=unused-argument,no-self-use
        y_hat = outputs[1]["model_outputs"]
        y = outputs[1]["waveform_seg"]
        figures = plot_results(y_hat, y, ap, name_prefix)
        sample_voice = y_hat[0].squeeze(0).detach().cpu().numpy()
        audios = {f"{name_prefix}/audio": sample_voice}

        alignments = outputs[1]["alignments"]
        align_img = alignments[0].data.cpu().numpy().T

        figures.update(
            {
                "alignment": plot_alignment(align_img, output_fig=False),
            }
        )
        return figures, audios

    def train_log(
        self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int
    ):  # pylint: disable=no-self-use
        """Create visualizations and waveform examples.

        For example, here you can plot spectrograms and generate sample sample waveforms from these spectrograms to
        be projected onto Tensorboard.

        Args:
            ap (AudioProcessor): audio processor used at training.
            batch (Dict): Model inputs used at the previous training step.
            outputs (Dict): Model outputs generated at the previoud training step.

        Returns:
            Tuple[Dict, np.ndarray]: training plots and output waveform.
        """
        figures, audios = self._log(self.ap, batch, outputs, "train")
        logger.train_figures(steps, figures)
        logger.train_audios(steps, audios, self.ap.sample_rate)

    @torch.no_grad()
    def eval_step(self, batch: dict, criterion: nn.Module, optimizer_idx: int):
        return self.train_step(batch, criterion, optimizer_idx)

    def eval_log(self, batch: dict, outputs: dict, logger: "Logger", assets: dict, steps: int) -> None:
        figures, audios = self._log(self.ap, batch, outputs, "eval")
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    def get_aux_input_from_test_sentences(self, sentence_info):
        if hasattr(self.config, "model_args"):
            config = self.config.model_args
        else:
            config = self.config

        # extract speaker and language info
        text, speaker_name, style_wav, language_name = None, None, None, None

        if isinstance(sentence_info, list):
            if len(sentence_info) == 1:
                text = sentence_info[0]
            elif len(sentence_info) == 2:
                text, speaker_name = sentence_info
            elif len(sentence_info) == 3:
                text, speaker_name, style_wav = sentence_info
            elif len(sentence_info) == 4:
                text, speaker_name, style_wav, language_name = sentence_info
        else:
            text = sentence_info

        # get speaker  id/d_vector
        speaker_id, d_vector, language_id = None, None, None
        if hasattr(self, "speaker_manager"):
            if config.use_d_vector_file:
                if speaker_name is None:
                    d_vector = self.speaker_manager.get_random_embedding()
                else:
                    d_vector = self.speaker_manager.get_mean_embedding(speaker_name, num_samples=None, randomize=False)
            elif config.use_speaker_embedding:
                if speaker_name is None:
                    speaker_id = self.speaker_manager.get_random_id()
                else:
                    speaker_id = self.speaker_manager.name_to_id[speaker_name]

        # get language id
        if hasattr(self, "language_manager") and config.use_language_embedding and language_name is not None:
            language_id = self.language_manager.name_to_id[language_name]

        return {
            "text": text,
            "speaker_id": speaker_id,
            "style_wav": style_wav,
            "d_vector": d_vector,
            "language_id": language_id,
            "language_name": language_name,
        }

    @torch.no_grad()
    def test_run(self, assets) -> Tuple[Dict, Dict]:
        """Generic test run for `tts` models used by `Trainer`.

        You can override this for a different behaviour.

        Returns:
            Tuple[Dict, Dict]: Test figures and audios to be projected to Tensorboard.
        """
        print(" | > Synthesizing test sentences.")
        test_audios = {}
        test_figures = {}
        test_sentences = self.config.test_sentences
        for idx, s_info in enumerate(test_sentences):
            aux_inputs = self.get_aux_input_from_test_sentences(s_info)
            wav, alignment, _, _ = synthesis(
                self,
                aux_inputs["text"],
                self.config,
                "cuda" in str(next(self.parameters()).device),
                speaker_id=aux_inputs["speaker_id"],
                d_vector=aux_inputs["d_vector"],
                style_wav=aux_inputs["style_wav"],
                language_id=aux_inputs["language_id"],
                use_griffin_lim=True,
                do_trim_silence=False,
            ).values()
            test_audios["{}-audio".format(idx)] = wav
            test_figures["{}-alignment".format(idx)] = plot_alignment(alignment.T, output_fig=False)
        return {"figures": test_figures, "audios": test_audios}

    def test_log(
        self, outputs: dict, logger: "Logger", assets: dict, steps: int  # pylint: disable=unused-argument
    ) -> None:
        logger.test_audios(steps, outputs["audios"], self.ap.sample_rate)
        logger.test_figures(steps, outputs["figures"])

    def format_batch(self, batch: Dict) -> Dict:
        """Compute speaker, langugage IDs and d_vector for the batch if necessary."""
        speaker_ids = None
        language_ids = None
        d_vectors = None

        # get numerical speaker ids from speaker names
        if self.speaker_manager is not None and self.speaker_manager.name_to_id and self.args.use_speaker_embedding:
            speaker_ids = [self.speaker_manager.name_to_id[sn] for sn in batch["speaker_names"]]

        if speaker_ids is not None:
            speaker_ids = torch.LongTensor(speaker_ids)

        # get d_vectors from audio file names
        if self.speaker_manager is not None and self.speaker_manager.embeddings and self.args.use_d_vector_file:
            d_vector_mapping = self.speaker_manager.embeddings
            d_vectors = [d_vector_mapping[w]["embedding"] for w in batch["audio_unique_names"]]
            d_vectors = torch.FloatTensor(d_vectors)

        # get language ids from language names
        if self.language_manager is not None and self.language_manager.name_to_id and self.args.use_language_embedding:
            language_ids = [self.language_manager.name_to_id[ln] for ln in batch["language_names"]]

        if language_ids is not None:
            language_ids = torch.LongTensor(language_ids)

        batch["language_ids"] = language_ids
        batch["d_vectors"] = d_vectors
        batch["speaker_ids"] = speaker_ids
        return batch

    def format_batch_on_device(self, batch):
        """Compute spectrograms on the device."""
        ac = self.config.audio

        if self.args.encoder_sample_rate:
            wav = self.audio_resampler(batch["waveform"])
        else:
            wav = batch["waveform"]

        # compute spectrograms
        batch["spec"] = wav_to_spec(wav, ac.fft_size, ac.hop_length, ac.win_length, center=False)

        if self.args.encoder_sample_rate:
            # recompute spec with high sampling rate to the loss
            spec_mel = wav_to_spec(batch["waveform"], ac.fft_size, ac.hop_length, ac.win_length, center=False)
            # remove extra stft frames if needed
            if spec_mel.size(2) > int(batch["spec"].size(2) * self.interpolate_factor):
                spec_mel = spec_mel[:, :, : int(batch["spec"].size(2) * self.interpolate_factor)]
            else:
                batch["spec"] = batch["spec"][:, :, : int(spec_mel.size(2) / self.interpolate_factor)]
        else:
            spec_mel = batch["spec"]

        batch["mel"] = spec_to_mel(
            spec=spec_mel,
            n_fft=ac.fft_size,
            num_mels=ac.num_mels,
            sample_rate=ac.sample_rate,
            fmin=ac.mel_fmin,
            fmax=ac.mel_fmax,
        )

        if self.args.encoder_sample_rate:
            assert batch["spec"].shape[2] == int(
                batch["mel"].shape[2] / self.interpolate_factor
            ), f"{batch['spec'].shape[2]}, {batch['mel'].shape[2]}"
        else:
            assert batch["spec"].shape[2] == batch["mel"].shape[2], f"{batch['spec'].shape[2]}, {batch['mel'].shape[2]}"

        # compute spectrogram frame lengths
        batch["spec_lens"] = (batch["spec"].shape[2] * batch["waveform_rel_lens"]).int()
        batch["mel_lens"] = (batch["mel"].shape[2] * batch["waveform_rel_lens"]).int()

        if self.args.encoder_sample_rate:
            assert (batch["spec_lens"] - (batch["mel_lens"] / self.interpolate_factor).int()).sum() == 0
        else:
            assert (batch["spec_lens"] - batch["mel_lens"]).sum() == 0

        # zero the padding frames
        batch["spec"] = batch["spec"] * sequence_mask(batch["spec_lens"]).unsqueeze(1)
        batch["mel"] = batch["mel"] * sequence_mask(batch["mel_lens"]).unsqueeze(1)
        return batch

    def get_sampler(self, config: Coqpit, dataset: TTSDataset, num_gpus=1, is_eval=False):
        weights = None
        data_items = dataset.samples
        if getattr(config, "use_weighted_sampler", False):
            for attr_name, alpha in config.weighted_sampler_attrs.items():
                print(f" > Using weighted sampler for attribute '{attr_name}' with alpha '{alpha}'")
                multi_dict = config.weighted_sampler_multipliers.get(attr_name, None)
                print(multi_dict)
                weights, attr_names, attr_weights = get_attribute_balancer_weights(
                    attr_name=attr_name, items=data_items, multi_dict=multi_dict
                )
                weights = weights * alpha
                print(f" > Attribute weights for '{attr_names}' \n | > {attr_weights}")

        # input_audio_lenghts = [os.path.getsize(x["audio_file"]) for x in data_items]

        if weights is not None:
            w_sampler = WeightedRandomSampler(weights, len(weights))
            batch_sampler = BucketBatchSampler(
                w_sampler,
                data=data_items,
                batch_size=config.eval_batch_size if is_eval else config.batch_size,
                sort_key=lambda x: os.path.getsize(x["audio_file"]),
                drop_last=True,
            )
        else:
            batch_sampler = None
        # sampler for DDP
        if batch_sampler is None:
            batch_sampler = DistributedSampler(dataset) if num_gpus > 1 else None
        else:  # If a sampler is already defined use this sampler and DDP sampler together
            batch_sampler = (
                DistributedSamplerWrapper(batch_sampler) if num_gpus > 1 else batch_sampler
            )  # TODO: check batch_sampler with multi-gpu
        return batch_sampler

    def get_data_loader(
        self,
        config: Coqpit,
        assets: Dict,
        is_eval: bool,
        samples: Union[List[Dict], List[List]],
        verbose: bool,
        num_gpus: int,
        rank: int = None,
    ) -> "DataLoader":
        if is_eval and not config.run_eval:
            loader = None
        else:
            # init dataloader
            dataset = VitsDataset(
                model_args=self.args,
                samples=samples,
                batch_group_size=0 if is_eval else config.batch_group_size * config.batch_size,
                min_text_len=config.min_text_len,
                max_text_len=config.max_text_len,
                min_audio_len=config.min_audio_len,
                max_audio_len=config.max_audio_len,
                phoneme_cache_path=config.phoneme_cache_path,
                precompute_num_workers=config.precompute_num_workers,
                verbose=verbose,
                tokenizer=self.tokenizer,
                start_by_longest=config.start_by_longest,
            )

            # wait all the DDP process to be ready
            if num_gpus > 1:
                dist.barrier()

            # sort input sequences from short to long
            dataset.preprocess_samples()

            # get samplers
            sampler = self.get_sampler(config, dataset, num_gpus)
            if sampler is None:
                loader = DataLoader(
                    dataset,
                    batch_size=config.eval_batch_size if is_eval else config.batch_size,
                    shuffle=False,  # shuffle is done in the dataset.
                    collate_fn=dataset.collate_fn,
                    drop_last=False,  # setting this False might cause issues in AMP training.
                    num_workers=config.num_eval_loader_workers if is_eval else config.num_loader_workers,
                    pin_memory=False,
                )
            else:
                if num_gpus > 1:
                    loader = DataLoader(
                        dataset,
                        sampler=sampler,
                        batch_size=config.eval_batch_size if is_eval else config.batch_size,
                        collate_fn=dataset.collate_fn,
                        num_workers=config.num_eval_loader_workers if is_eval else config.num_loader_workers,
                        pin_memory=False,
                    )
                else:
                    loader = DataLoader(
                        dataset,
                        batch_sampler=sampler,
                        collate_fn=dataset.collate_fn,
                        num_workers=config.num_eval_loader_workers if is_eval else config.num_loader_workers,
                        pin_memory=False,
                    )
        return loader

    def get_optimizer(self) -> List:
        """Initiate and return the GAN optimizers based on the config parameters.
        It returnes 2 optimizers in a list. First one is for the generator and the second one is for the discriminator.
        Returns:
            List: optimizers.
        """
        # select generator parameters
        optimizer0 = get_optimizer(self.config.optimizer, self.config.optimizer_params, self.config.lr_disc, self.disc)

        gen_parameters = chain(params for k, params in self.named_parameters() if not k.startswith("disc."))
        optimizer1 = get_optimizer(
            self.config.optimizer, self.config.optimizer_params, self.config.lr_gen, parameters=gen_parameters
        )
        return [optimizer0, optimizer1]

    def get_lr(self) -> List:
        """Set the initial learning rates for each optimizer.

        Returns:
            List: learning rates for each optimizer.
        """
        return [self.config.lr_disc, self.config.lr_gen]

    def get_scheduler(self, optimizer) -> List:
        """Set the schedulers for each optimizer.

        Args:
            optimizer (List[`torch.optim.Optimizer`]): List of optimizers.

        Returns:
            List: Schedulers, one for each optimizer.
        """
        scheduler_D = get_scheduler(self.config.lr_scheduler_disc, self.config.lr_scheduler_disc_params, optimizer[0])
        scheduler_G = get_scheduler(self.config.lr_scheduler_gen, self.config.lr_scheduler_gen_params, optimizer[1])
        return [scheduler_D, scheduler_G]

    def get_criterion(self):
        """Get criterions for each optimizer. The index in the output list matches the optimizer idx used in
        `train_step()`"""
        from TTS.tts.layers.losses import (  # pylint: disable=import-outside-toplevel
            VitsDiscriminatorLoss,
            VitsGeneratorLoss,
        )

        return [VitsDiscriminatorLoss(self.config), VitsGeneratorLoss(self.config)]

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, strict=True, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        """Load the model checkpoint and setup for training or inference"""
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        # compat band-aid for the pre-trained models to not use the encoder baked into the model
        # TODO: consider baking the speaker encoder into the model and call it from there.
        # as it is probably easier for model distribution.
        state["model"] = {k: v for k, v in state["model"].items() if "speaker_encoder" not in k}

        if self.args.encoder_sample_rate is not None and eval:
            # audio resampler is not used in inference time
            self.audio_resampler = None

        # handle fine-tuning from a checkpoint with additional speakers
        if hasattr(self, "emb_g") and state["model"]["emb_g.weight"].shape != self.emb_g.weight.shape:
            num_new_speakers = self.emb_g.weight.shape[0] - state["model"]["emb_g.weight"].shape[0]
            print(f" > Loading checkpoint with {num_new_speakers} additional speakers.")
            emb_g = state["model"]["emb_g.weight"]
            new_row = torch.randn(num_new_speakers, emb_g.shape[1])
            emb_g = torch.cat([emb_g, new_row], axis=0)
            state["model"]["emb_g.weight"] = emb_g
        # load the model weights
        self.load_state_dict(state["model"], strict=strict)

        if eval:
            self.eval()
            assert not self.training

    @staticmethod
    def init_from_config(config: "VitsConfig", samples: Union[List[List], List[Dict]] = None, verbose=True):
        """Initiate model from config

        Args:
            config (VitsConfig): Model config.
            samples (Union[List[List], List[Dict]]): Training samples to parse speaker ids for training.
                Defaults to None.
        """
        from TTS.utils.audio import AudioProcessor

        upsample_rate = torch.prod(torch.as_tensor(config.model_args.upsample_rates_decoder)).item()

        if not config.model_args.encoder_sample_rate:
            assert (
                upsample_rate == config.audio.hop_length
            ), f" [!] Product of upsample rates must be equal to the hop length - {upsample_rate} vs {config.audio.hop_length}"
        else:
            encoder_to_vocoder_upsampling_factor = config.audio.sample_rate / config.model_args.encoder_sample_rate
            effective_hop_length = config.audio.hop_length * encoder_to_vocoder_upsampling_factor
            assert (
                upsample_rate == effective_hop_length
            ), f" [!] Product of upsample rates must be equal to the hop length - {upsample_rate} vs {effective_hop_length}"

        ap = AudioProcessor.init_from_config(config, verbose=verbose)
        tokenizer, new_config = TTSTokenizer.init_from_config(config)
        speaker_manager = SpeakerManager.init_from_config(config, samples)
        language_manager = LanguageManager.init_from_config(config)

        if config.model_args.speaker_encoder_model_path:
            speaker_manager.init_encoder(
                config.model_args.speaker_encoder_model_path, config.model_args.speaker_encoder_config_path
            )
        return Vits(new_config, ap, tokenizer, speaker_manager, language_manager)


##################################
# VITS CHARACTERS
##################################


class VitsCharacters(BaseCharacters):
    """Characters class for VITs model for compatibility with pre-trained models"""

    def __init__(
        self,
        graphemes: str = _characters,
        punctuations: str = _punctuations,
        pad: str = _pad,
        ipa_characters: str = _phonemes,
    ) -> None:
        if ipa_characters is not None:
            graphemes += ipa_characters
        super().__init__(graphemes, punctuations, pad, None, None, "<BLNK>", is_unique=False, is_sorted=True)

    def _create_vocab(self):
        self._vocab = [self._pad] + list(self._punctuations) + list(self._characters) + [self._blank]
        self._char_to_id = {char: idx for idx, char in enumerate(self.vocab)}
        # pylint: disable=unnecessary-comprehension
        self._id_to_char = {idx: char for idx, char in enumerate(self.vocab)}

    @staticmethod
    def init_from_config(config: Coqpit):
        if config.characters is not None:
            _pad = config.characters["pad"]
            _punctuations = config.characters["punctuations"]
            _letters = config.characters["characters"]
            _letters_ipa = config.characters["phonemes"]
            return (
                VitsCharacters(graphemes=_letters, ipa_characters=_letters_ipa, punctuations=_punctuations, pad=_pad),
                config,
            )
        characters = VitsCharacters()
        new_config = replace(config, characters=characters.to_config())
        return characters, new_config

    def to_config(self) -> "CharactersConfig":
        return CharactersConfig(
            characters=self._characters,
            punctuations=self._punctuations,
            pad=self._pad,
            eos=None,
            bos=None,
            blank=self._blank,
            is_unique=False,
            is_sorted=True,
        )


================================================
FILE: TTS/tts/utils/__init__.py
================================================


================================================
FILE: TTS/tts/utils/data.py
================================================
import bisect

import numpy as np
import torch


def _pad_data(x, length):
    _pad = 0
    assert x.ndim == 1
    return np.pad(x, (0, length - x.shape[0]), mode="constant", constant_values=_pad)


def prepare_data(inputs):
    max_len = max((len(x) for x in inputs))
    return np.stack([_pad_data(x, max_len) for x in inputs])


def _pad_tensor(x, length):
    _pad = 0.0
    assert x.ndim == 2
    x = np.pad(x, [[0, 0], [0, length - x.shape[1]]], mode="constant", constant_values=_pad)
    return x


def prepare_tensor(inputs, out_steps):
    max_len = max((x.shape[1] for x in inputs))
    remainder = max_len % out_steps
    pad_len = max_len + (out_steps - remainder) if remainder > 0 else max_len
    return np.stack([_pad_tensor(x, pad_len) for x in inputs])


def _pad_stop_target(x: np.ndarray, length: int, pad_val=1) -> np.ndarray:
    """Pad stop target array.

    Args:
        x (np.ndarray): Stop target array.
        length (int): Length after padding.
        pad_val (int, optional): Padding value. Defaults to 1.

    Returns:
        np.ndarray: Padded stop target array.
    """
    assert x.ndim == 1
    return np.pad(x, (0, length - x.shape[0]), mode="constant", constant_values=pad_val)


def prepare_stop_target(inputs, out_steps):
    """Pad row vectors with 1."""
    max_len = max((x.shape[0] for x in inputs))
    remainder = max_len % out_steps
    pad_len = max_len + (out_steps - remainder) if remainder > 0 else max_len
    return np.stack([_pad_stop_target(x, pad_len) for x in inputs])


def pad_per_step(inputs, pad_len):
    return np.pad(inputs, [[0, 0], [0, 0], [0, pad_len]], mode="constant", constant_values=0.0)


def get_length_balancer_weights(items: list, num_buckets=10):
    # get all durations
    audio_lengths = np.array([item["audio_length"] for item in items])
    # create the $num_buckets buckets classes based in the dataset max and min length
    max_length = int(max(audio_lengths))
    min_length = int(min(audio_lengths))
    step = int((max_length - min_length) / num_buckets) + 1
    buckets_classes = [i + step for i in range(min_length, (max_length - step) + num_buckets + 1, step)]
    # add each sample in their respective length bucket
    buckets_names = np.array(
        [buckets_classes[bisect.bisect_left(buckets_classes, item["audio_length"])] for item in items]
    )
    # count and compute the weights_bucket for each sample
    unique_buckets_names = np.unique(buckets_names).tolist()
    bucket_ids = [unique_buckets_names.index(l) for l in buckets_names]
    bucket_count = np.array([len(np.where(buckets_names == l)[0]) for l in unique_buckets_names])
    weight_bucket = 1.0 / bucket_count
    dataset_samples_weight = np.array([weight_bucket[l] for l in bucket_ids])
    # normalize
    dataset_samples_weight = dataset_samples_weight / np.linalg.norm(dataset_samples_weight)
    return torch.from_numpy(dataset_samples_weight).float()


================================================
FILE: TTS/tts/utils/helpers.py
================================================
import numpy as np
import torch
from torch.nn import functional as F

try:
    from TTS.tts.utils.monotonic_align.core import maximum_path_c

    CYTHON = True
except ModuleNotFoundError:
    CYTHON = False


class StandardScaler:
    """StandardScaler for mean-scale normalization with the given mean and scale values."""

    def __init__(self, mean: np.ndarray = None, scale: np.ndarray = None) -> None:
        self.mean_ = mean
        self.scale_ = scale

    def set_stats(self, mean, scale):
        self.mean_ = mean
        self.scale_ = scale

    def reset_stats(self):
        delattr(self, "mean_")
        delattr(self, "scale_")

    def transform(self, X):
        X = np.asarray(X)
        X -= self.mean_
        X /= self.scale_
        return X

    def inverse_transform(self, X):
        X = np.asarray(X)
        X *= self.scale_
        X += self.mean_
        return X


# from https://gist.github.com/jihunchoi/f1434a77df9db1bb337417854b398df1
def sequence_mask(sequence_length, max_len=None):
    """Create a sequence mask for filtering padding in a sequence tensor.

    Args:
        sequence_length (torch.tensor): Sequence lengths.
        max_len (int, Optional): Maximum sequence length. Defaults to None.

    Shapes:
        - mask: :math:`[B, T_max]`
    """
    if max_len is None:
        max_len = sequence_length.data.max()
    seq_range = torch.arange(max_len, dtype=sequence_length.dtype, device=sequence_length.device)
    # B x T_max
    mask = seq_range.unsqueeze(0) < sequence_length.unsqueeze(1)
    return mask


def segment(x: torch.tensor, segment_indices: torch.tensor, segment_size=4, pad_short=False):
    """Segment each sample in a batch based on the provided segment indices

    Args:
        x (torch.tensor): Input tensor.
        segment_indices (torch.tensor): Segment indices.
        segment_size (int): Expected output segment size.
        pad_short (bool): Pad the end of input tensor with zeros if shorter than the segment size.
    """
    # pad the input tensor if it is shorter than the segment size
    if pad_short and x.shape[-1] < segment_size:
        x = torch.nn.functional.pad(x, (0, segment_size - x.size(2)))

    segments = torch.zeros_like(x[:, :, :segment_size])

    for i in range(x.size(0)):
        index_start = segment_indices[i]
        index_end = index_start + segment_size
        x_i = x[i]
        if pad_short and index_end >= x.size(2):
            # pad the sample if it is shorter than the segment size
            x_i = torch.nn.functional.pad(x_i, (0, (index_end + 1) - x.size(2)))
        segments[i] = x_i[:, index_start:index_end]
    return segments


def rand_segments(
    x: torch.tensor, x_lengths: torch.tensor = None, segment_size=4, let_short_samples=False, pad_short=False
):
    """Create random segments based on the input lengths.

    Args:
        x (torch.tensor): Input tensor.
        x_lengths (torch.tensor): Input lengths.
        segment_size (int): Expected output segment size.
        let_short_samples (bool): Allow shorter samples than the segment size.
        pad_short (bool): Pad the end of input tensor with zeros if shorter than the segment size.

    Shapes:
        - x: :math:`[B, C, T]`
        - x_lengths: :math:`[B]`
    """
    _x_lenghts = x_lengths.clone()
    B, _, T = x.size()
    if pad_short:
        if T < segment_size:
            x = torch.nn.functional.pad(x, (0, segment_size - T))
            T = segment_size
    if _x_lenghts is None:
        _x_lenghts = T
    len_diff = _x_lenghts - segment_size
    if let_short_samples:
        _x_lenghts[len_diff < 0] = segment_size
        len_diff = _x_lenghts - segment_size
    else:
        assert all(
            len_diff > 0
        ), f" [!] At least one sample is shorter than the segment size ({segment_size}). \n {_x_lenghts}"
    segment_indices = (torch.rand([B]).type_as(x) * (len_diff + 1)).long()
    ret = segment(x, segment_indices, segment_size, pad_short=pad_short)
    return ret, segment_indices


def average_over_durations(values, durs):
    """Average values over durations.

    Shapes:
        - values: :math:`[B, 1, T_de]`
        - durs: :math:`[B, T_en]`
        - avg: :math:`[B, 1, T_en]`
    """
    durs_cums_ends = torch.cumsum(durs, dim=1).long()
    durs_cums_starts = torch.nn.functional.pad(durs_cums_ends[:, :-1], (1, 0))
    values_nonzero_cums = torch.nn.functional.pad(torch.cumsum(values != 0.0, dim=2), (1, 0))
    values_cums = torch.nn.functional.pad(torch.cumsum(values, dim=2), (1, 0))

    bs, l = durs_cums_ends.size()
    n_formants = values.size(1)
    dcs = durs_cums_starts[:, None, :].expand(bs, n_formants, l)
    dce = durs_cums_ends[:, None, :].expand(bs, n_formants, l)

    values_sums = (torch.gather(values_cums, 2, dce) - torch.gather(values_cums, 2, dcs)).float()
    values_nelems = (torch.gather(values_nonzero_cums, 2, dce) - torch.gather(values_nonzero_cums, 2, dcs)).float()

    avg = torch.where(values_nelems == 0.0, values_nelems, values_sums / values_nelems)
    return avg


def convert_pad_shape(pad_shape):
    l = pad_shape[::-1]
    pad_shape = [item for sublist in l for item in sublist]
    return pad_shape


def generate_path(duration, mask):
    """
    Shapes:
        - duration: :math:`[B, T_en]`
        - mask: :math:'[B, T_en, T_de]`
        - path: :math:`[B, T_en, T_de]`
    """
    device = duration.device
    b, t_x, t_y = mask.shape
    cum_duration = torch.cumsum(duration, 1)
    path = torch.zeros(b, t_x, t_y, dtype=mask.dtype).to(device=device)

    cum_duration_flat = cum_duration.view(b * t_x)
    path = sequence_mask(cum_duration_flat, t_y).to(mask.dtype)
    path = path.view(b, t_x, t_y)
    path = path - F.pad(path, convert_pad_shape([[0, 0], [1, 0], [0, 0]]))[:, :-1]
    path = path * mask
    return path


def maximum_path(value, mask):
    if CYTHON:
        return maximum_path_cython(value, mask)
    return maximum_path_numpy(value, mask)


def maximum_path_cython(value, mask):
    """Cython optimised version.
    Shapes:
        - value: :math:`[B, T_en, T_de]`
        - mask: :math:`[B, T_en, T_de]`
    """
    value = value * mask
    device = value.device
    dtype = value.dtype
    value = value.data.cpu().numpy().astype(np.float32)
    path = np.zeros_like(value).astype(np.int32)
    mask = mask.data.cpu().numpy()

    t_x_max = mask.sum(1)[:, 0].astype(np.int32)
    t_y_max = mask.sum(2)[:, 0].astype(np.int32)
    maximum_path_c(path, value, t_x_max, t_y_max)
    return torch.from_numpy(path).to(device=device, dtype=dtype)


def maximum_path_numpy(value, mask, max_neg_val=None):
    """
    Monotonic alignment search algorithm
    Numpy-friendly version. It's about 4 times faster than torch version.
    value: [b, t_x, t_y]
    mask: [b, t_x, t_y]
    """
    if max_neg_val is None:
        max_neg_val = -np.inf  # Patch for Sphinx complaint
    value = value * mask

    device = value.device
    dtype = value.dtype
    value = value.cpu().detach().numpy()
    mask = mask.cpu().detach().numpy().astype(np.bool)

    b, t_x, t_y = value.shape
    direction = np.zeros(value.shape, dtype=np.int64)
    v = np.zeros((b, t_x), dtype=np.float32)
    x_range = np.arange(t_x, dtype=np.float32).reshape(1, -1)
    for j in range(t_y):
        v0 = np.pad(v, [[0, 0], [1, 0]], mode="constant", constant_values=max_neg_val)[:, :-1]
        v1 = v
        max_mask = v1 >= v0
        v_max = np.where(max_mask, v1, v0)
        direction[:, :, j] = max_mask

        index_mask = x_range <= j
        v = np.where(index_mask, v_max + value[:, :, j], max_neg_val)
    direction = np.where(mask, direction, 1)

    path = np.zeros(value.shape, dtype=np.float32)
    index = mask[:, :, 0].sum(1).astype(np.int64) - 1
    index_range = np.arange(b)
    for j in reversed(range(t_y)):
        path[index_range, index, j] = 1
        index = index + direction[index_range, index, j] - 1
    path = path * mask.astype(np.float32)
    path = torch.from_numpy(path).to(device=device, dtype=dtype)
    return path


================================================
FILE: TTS/tts/utils/languages.py
================================================
import os
from typing import Any, Dict, List

import fsspec
import numpy as np
import torch
from coqpit import Coqpit

from TTS.config import check_config_and_model_args
from TTS.tts.utils.managers import BaseIDManager


class LanguageManager(BaseIDManager):
    """Manage the languages for multi-lingual 🐸TTS models. Load a datafile and parse the information
    in a way that can be queried by language.

    Args:
        language_ids_file_path (str, optional): Path to the metafile that maps language names to ids used by
        TTS models. Defaults to "".
        config (Coqpit, optional): Coqpit config that contains the language information in the datasets filed.
        Defaults to None.

    Examples:
        >>> manager = LanguageManager(language_ids_file_path=language_ids_file_path)
        >>> language_id_mapper = manager.language_ids
    """

    def __init__(
        self,
        language_ids_file_path: str = "",
        config: Coqpit = None,
    ):
        super().__init__(id_file_path=language_ids_file_path)

        if config:
            self.set_language_ids_from_config(config)

    @property
    def num_languages(self) -> int:
        return len(list(self.name_to_id.keys()))

    @property
    def language_names(self) -> List:
        return list(self.name_to_id.keys())

    @staticmethod
    def parse_language_ids_from_config(c: Coqpit) -> Dict:
        """Set language id from config.

        Args:
            c (Coqpit): Config

        Returns:
            Tuple[Dict, int]: Language ID mapping and the number of languages.
        """
        languages = set({})
        for dataset in c.datasets:
            if "language" in dataset:
                languages.add(dataset["language"])
            else:
                raise ValueError(f"Dataset {dataset['name']} has no language specified.")
        return {name: i for i, name in enumerate(sorted(list(languages)))}

    def set_language_ids_from_config(self, c: Coqpit) -> None:
        """Set language IDs from config samples.

        Args:
            c (Coqpit): Config.
        """
        self.name_to_id = self.parse_language_ids_from_config(c)

    @staticmethod
    def parse_ids_from_data(items: List, parse_key: str) -> Any:
        raise NotImplementedError

    def set_ids_from_data(self, items: List, parse_key: str) -> Any:
        raise NotImplementedError

    def save_ids_to_file(self, file_path: str) -> None:
        """Save language IDs to a json file.

        Args:
            file_path (str): Path to the output file.
        """
        self._save_json(file_path, self.name_to_id)

    @staticmethod
    def init_from_config(config: Coqpit) -> "LanguageManager":
        """Initialize the language manager from a Coqpit config.

        Args:
            config (Coqpit): Coqpit config.
        """
        language_manager = None
        if check_config_and_model_args(config, "use_language_embedding", True):
            if config.get("language_ids_file", None):
                language_manager = LanguageManager(language_ids_file_path=config.language_ids_file)
            language_manager = LanguageManager(config=config)
        return language_manager


def _set_file_path(path):
    """Find the language_ids.json under the given path or the above it.
    Intended to band aid the different paths returned in restored and continued training."""
    path_restore = os.path.join(os.path.dirname(path), "language_ids.json")
    path_continue = os.path.join(path, "language_ids.json")
    fs = fsspec.get_mapper(path).fs
    if fs.exists(path_restore):
        return path_restore
    if fs.exists(path_continue):
        return path_continue
    return None


def get_language_balancer_weights(items: list):
    language_names = np.array([item["language"] for item in items])
    unique_language_names = np.unique(language_names).tolist()
    language_ids = [unique_language_names.index(l) for l in language_names]
    language_count = np.array([len(np.where(language_names == l)[0]) for l in unique_language_names])
    weight_language = 1.0 / language_count
    # get weight for each sample
    dataset_samples_weight = np.array([weight_language[l] for l in language_ids])
    # normalize
    dataset_samples_weight = dataset_samples_weight / np.linalg.norm(dataset_samples_weight)
    return torch.from_numpy(dataset_samples_weight).float()


================================================
FILE: TTS/tts/utils/managers.py
================================================
import json
import random
from typing import Any, Dict, List, Tuple, Union

import fsspec
import numpy as np
import torch

from TTS.config import load_config
from TTS.encoder.utils.generic_utils import setup_encoder_model
from TTS.utils.audio import AudioProcessor


def load_file(path: str):
    if path.endswith(".json"):
        with fsspec.open(path, "r") as f:
            return json.load(f)
    elif path.endswith(".pth"):
        with fsspec.open(path, "rb") as f:
            return torch.load(f, map_location="cpu")
    else:
        raise ValueError("Unsupported file type")


def save_file(obj: Any, path: str):
    if path.endswith(".json"):
        with fsspec.open(path, "w") as f:
            json.dump(obj, f, indent=4)
    elif path.endswith(".pth"):
        with fsspec.open(path, "wb") as f:
            torch.save(obj, f)
    else:
        raise ValueError("Unsupported file type")


class BaseIDManager:
    """Base `ID` Manager class. Every new `ID` manager must inherit this.
    It defines common `ID` manager specific functions.
    """

    def __init__(self, id_file_path: str = ""):
        self.name_to_id = {}

        if id_file_path:
            self.load_ids_from_file(id_file_path)

    @staticmethod
    def _load_json(json_file_path: str) -> Dict:
        with fsspec.open(json_file_path, "r") as f:
            return json.load(f)

    @staticmethod
    def _save_json(json_file_path: str, data: dict) -> None:
        with fsspec.open(json_file_path, "w") as f:
            json.dump(data, f, indent=4)

    def set_ids_from_data(self, items: List, parse_key: str) -> None:
        """Set IDs from data samples.

        Args:
            items (List): Data sampled returned by `load_tts_samples()`.
        """
        self.name_to_id = self.parse_ids_from_data(items, parse_key=parse_key)

    def load_ids_from_file(self, file_path: str) -> None:
        """Set IDs from a file.

        Args:
            file_path (str): Path to the file.
        """
        self.name_to_id = load_file(file_path)

    def save_ids_to_file(self, file_path: str) -> None:
        """Save IDs to a json file.

        Args:
            file_path (str): Path to the output file.
        """
        save_file(self.name_to_id, file_path)

    def get_random_id(self) -> Any:
        """Get a random embedding.

        Args:

        Returns:
            np.ndarray: embedding.
        """
        if self.name_to_id:
            return self.name_to_id[random.choices(list(self.name_to_id.keys()))[0]]

        return None

    @staticmethod
    def parse_ids_from_data(items: List, parse_key: str) -> Tuple[Dict]:
        """Parse IDs from data samples retured by `load_tts_samples()`.

        Args:
            items (list): Data sampled returned by `load_tts_samples()`.
            parse_key (str): The key to being used to parse the data.
        Returns:
            Tuple[Dict]: speaker IDs.
        """
        classes = sorted({item[parse_key] for item in items})
        ids = {name: i for i, name in enumerate(classes)}
        return ids


class EmbeddingManager(BaseIDManager):
    """Base `Embedding` Manager class. Every new `Embedding` manager must inherit this.
    It defines common `Embedding` manager specific functions.

    It expects embeddings files in the following format:

    ::

        {
            'audio_file_key':{
                'name': 'category_name',
                'embedding'[<embedding_values>]
            },
            ...
        }

    `audio_file_key` is a unique key to the audio file in the dataset. It can be the path to the file or any other unique key.
    `embedding` is the embedding vector of the audio file.
    `name` can be name of the speaker of the audio file.
    """

    def __init__(
        self,
        embedding_file_path: Union[str, List[str]] = "",
        id_file_path: str = "",
        encoder_model_path: str = "",
        encoder_config_path: str = "",
        use_cuda: bool = False,
    ):
        super().__init__(id_file_path=id_file_path)

        self.embeddings = {}
        self.embeddings_by_names = {}
        self.clip_ids = []
        self.encoder = None
        self.encoder_ap = None
        self.use_cuda = use_cuda

        if embedding_file_path:
            if isinstance(embedding_file_path, list):
                self.load_embeddings_from_list_of_files(embedding_file_path)
            else:
                self.load_embeddings_from_file(embedding_file_path)

        if encoder_model_path and encoder_config_path:
            self.init_encoder(encoder_model_path, encoder_config_path, use_cuda)

    @property
    def num_embeddings(self):
        """Get number of embeddings."""
        return len(self.embeddings)

    @property
    def num_names(self):
        """Get number of embeddings."""
        return len(self.embeddings_by_names)

    @property
    def embedding_dim(self):
        """Dimensionality of embeddings. If embeddings are not loaded, returns zero."""
        if self.embeddings:
            return len(self.embeddings[list(self.embeddings.keys())[0]]["embedding"])
        return 0

    @property
    def embedding_names(self):
        """Get embedding names."""
        return list(self.embeddings_by_names.keys())

    def save_embeddings_to_file(self, file_path: str) -> None:
        """Save embeddings to a json file.

        Args:
            file_path (str): Path to the output file.
        """
        save_file(self.embeddings, file_path)

    @staticmethod
    def read_embeddings_from_file(file_path: str):
        """Load embeddings from a json file.

        Args:
            file_path (str): Path to the file.
        """
        embeddings = load_file(file_path)
        speakers = sorted({x["name"] for x in embeddings.values()})
        name_to_id = {name: i for i, name in enumerate(speakers)}
        clip_ids = list(set(sorted(clip_name for clip_name in embeddings.keys())))
        # cache embeddings_by_names for fast inference using a bigger speakers.json
        embeddings_by_names = {}
        for x in embeddings.values():
            if x["name"] not in embeddings_by_names.keys():
                embeddings_by_names[x["name"]] = [x["embedding"]]
            else:
                embeddings_by_names[x["name"]].append(x["embedding"])
        return name_to_id, clip_ids, embeddings, embeddings_by_names

    def load_embeddings_from_file(self, file_path: str) -> None:
        """Load embeddings from a json file.

        Args:
            file_path (str): Path to the target json file.
        """
        self.name_to_id, self.clip_ids, self.embeddings, self.embeddings_by_names = self.read_embeddings_from_file(
            file_path
        )

    def load_embeddings_from_list_of_files(self, file_paths: List[str]) -> None:
        """Load embeddings from a list of json files and don't allow duplicate keys.

        Args:
            file_paths (List[str]): List of paths to the target json files.
        """
        self.name_to_id = {}
        self.clip_ids = []
        self.embeddings_by_names = {}
        self.embeddings = {}
        for file_path in file_paths:
            ids, clip_ids, embeddings, embeddings_by_names = self.read_embeddings_from_file(file_path)
            # check colliding keys
            duplicates = set(self.embeddings.keys()) & set(embeddings.keys())
            if duplicates:
                raise ValueError(f" [!] Duplicate embedding names <{duplicates}> in {file_path}")
            # store values
            self.name_to_id.update(ids)
            self.clip_ids.extend(clip_ids)
            self.embeddings_by_names.update(embeddings_by_names)
            self.embeddings.update(embeddings)

        # reset name_to_id to get the right speaker ids
        self.name_to_id = {name: i for i, name in enumerate(self.name_to_id)}

    def get_embedding_by_clip(self, clip_idx: str) -> List:
        """Get embedding by clip ID.

        Args:
            clip_idx (str): Target clip ID.

        Returns:
            List: embedding as a list.
        """
        return self.embeddings[clip_idx]["embedding"]

    def get_embeddings_by_name(self, idx: str) -> List[List]:
        """Get all embeddings of a speaker.

        Args:
            idx (str): Target name.

        Returns:
            List[List]: all the embeddings of the given speaker.
        """
        return self.embeddings_by_names[idx]

    def get_embeddings_by_names(self) -> Dict:
        """Get all embeddings by names.

        Returns:
            Dict: all the embeddings of each speaker.
        """
        embeddings_by_names = {}
        for x in self.embeddings.values():
            if x["name"] not in embeddings_by_names.keys():
                embeddings_by_names[x["name"]] = [x["embedding"]]
            else:
                embeddings_by_names[x["name"]].append(x["embedding"])
        return embeddings_by_names

    def get_mean_embedding(self, idx: str, num_samples: int = None, randomize: bool = False) -> np.ndarray:
        """Get mean embedding of a idx.

        Args:
            idx (str): Target name.
            num_samples (int, optional): Number of samples to be averaged. Defaults to None.
            randomize (bool, optional): Pick random `num_samples` of embeddings. Defaults to False.

        Returns:
            np.ndarray: Mean embedding.
        """
        embeddings = self.get_embeddings_by_name(idx)
        if num_samples is None:
            embeddings = np.stack(embeddings).mean(0)
        else:
            assert len(embeddings) >= num_samples, f" [!] {idx} has number of samples < {num_samples}"
            if randomize:
                embeddings = np.stack(random.choices(embeddings, k=num_samples)).mean(0)
            else:
                embeddings = np.stack(embeddings[:num_samples]).mean(0)
        return embeddings

    def get_random_embedding(self) -> Any:
        """Get a random embedding.

        Args:

        Returns:
            np.ndarray: embedding.
        """
        if self.embeddings:
            return self.embeddings[random.choices(list(self.embeddings.keys()))[0]]["embedding"]

        return None

    def get_clips(self) -> List:
        return sorted(self.embeddings.keys())

    def init_encoder(self, model_path: str, config_path: str, use_cuda=False) -> None:
        """Initialize a speaker encoder model.

        Args:
            model_path (str): Model file path.
            config_path (str): Model config file path.
            use_cuda (bool, optional): Use CUDA. Defaults to False.
        """
        self.use_cuda = use_cuda
        self.encoder_config = load_config(config_path)
        self.encoder = setup_encoder_model(self.encoder_config)
        self.encoder_criterion = self.encoder.load_checkpoint(
            self.encoder_config, model_path, eval=True, use_cuda=use_cuda, cache=True
        )
        self.encoder_ap = AudioProcessor(**self.encoder_config.audio)

    def compute_embedding_from_clip(self, wav_file: Union[str, List[str]]) -> list:
        """Compute a embedding from a given audio file.

        Args:
            wav_file (Union[str, List[str]]): Target file path.

        Returns:
            list: Computed embedding.
        """

        def _compute(wav_file: str):
            waveform = self.encoder_ap.load_wav(wav_file, sr=self.encoder_ap.sample_rate)
            if not self.encoder_config.model_params.get("use_torch_spec", False):
                m_input = self.encoder_ap.melspectrogram(waveform)
                m_input = torch.from_numpy(m_input)
            else:
                m_input = torch.from_numpy(waveform)

            if self.use_cuda:
                m_input = m_input.cuda()
            m_input = m_input.unsqueeze(0)
            embedding = self.encoder.compute_embedding(m_input)
            return embedding

        if isinstance(wav_file, list):
            # compute the mean embedding
            embeddings = None
            for wf in wav_file:
                embedding = _compute(wf)
                if embeddings is None:
                    embeddings = embedding
                else:
                    embeddings += embedding
            return (embeddings / len(wav_file))[0].tolist()
        embedding = _compute(wav_file)
        return embedding[0].tolist()

    def compute_embeddings(self, feats: Union[torch.Tensor, np.ndarray]) -> List:
        """Compute embedding from features.

        Args:
            feats (Union[torch.Tensor, np.ndarray]): Input features.

        Returns:
            List: computed embedding.
        """
        if isinstance(feats, np.ndarray):
            feats = torch.from_numpy(feats)
        if feats.ndim == 2:
            feats = feats.unsqueeze(0)
        if self.use_cuda:
            feats = feats.cuda()
        return self.encoder.compute_embedding(feats)


================================================
FILE: TTS/tts/utils/measures.py
================================================
def alignment_diagonal_score(alignments, binary=False):
    """
    Compute how diagonal alignment predictions are. It is useful
    to measure the alignment consistency of a model
    Args:
        alignments (torch.Tensor): batch of alignments.
        binary (bool): if True, ignore scores and consider attention
        as a binary mask.
    Shape:
        - alignments : :math:`[B, T_de, T_en]`
    """
    maxs = alignments.max(dim=1)[0]
    if binary:
        maxs[maxs > 0] = 1
    return maxs.mean(dim=1).mean(dim=0).item()


================================================
FILE: TTS/tts/utils/monotonic_align/__init__.py
================================================


================================================
FILE: TTS/tts/utils/monotonic_align/core.c
================================================
/* Generated by Cython 0.29.28 */

/* BEGIN: Cython Metadata
{
    "distutils": {
        "depends": [],
        "name": "TTS.tts.utils.monotonic_align.core",
        "sources": [
            "TTS/tts/utils/monotonic_align/core.pyx"
        ]
    },
    "module_name": "TTS.tts.utils.monotonic_align.core"
}
END: Cython Metadata */

#ifndef PY_SSIZE_T_CLEAN
#define PY_SSIZE_T_CLEAN
#endif /* PY_SSIZE_T_CLEAN */
#include "Python.h"
#ifndef Py_PYTHON_H
    #error Python headers needed to compile C extensions, please install development version of Python.
#elif PY_VERSION_HEX < 0x02060000 || (0x03000000 <= PY_VERSION_HEX && PY_VERSION_HEX < 0x03030000)
    #error Cython requires Python 2.6+ or Python 3.3+.
#else
#define CYTHON_ABI "0_29_28"
#define CYTHON_HEX_VERSION 0x001D1CF0
#define CYTHON_FUTURE_DIVISION 1
#include <stddef.h>
#ifndef offsetof
  #define offsetof(type, member) ( (size_t) & ((type*)0) -> member )
#endif
#if !defined(WIN32) && !defined(MS_WINDOWS)
  #ifndef __stdcall
    #define __stdcall
  #endif
  #ifndef __cdecl
    #define __cdecl
  #endif
  #ifndef __fastcall
    #define __fastcall
  #endif
#endif
#ifndef DL_IMPORT
  #define DL_IMPORT(t) t
#endif
#ifndef DL_EXPORT
  #define DL_EXPORT(t) t
#endif
#define __PYX_COMMA ,
#ifndef HAVE_LONG_LONG
  #if PY_VERSION_HEX >= 0x02070000
    #define HAVE_LONG_LONG
  #endif
#endif
#ifndef PY_LONG_LONG
  #define PY_LONG_LONG LONG_LONG
#endif
#ifndef Py_HUGE_VAL
  #define Py_HUGE_VAL HUGE_VAL
#endif
#ifdef PYPY_VERSION
  #define CYTHON_COMPILING_IN_PYPY 1
  #define CYTHON_COMPILING_IN_PYSTON 0
  #define CYTHON_COMPILING_IN_CPYTHON 0
  #undef CYTHON_USE_TYPE_SLOTS
  #define CYTHON_USE_TYPE_SLOTS 0
  #undef CYTHON_USE_PYTYPE_LOOKUP
  #define CYTHON_USE_PYTYPE_LOOKUP 0
  #if PY_VERSION_HEX < 0x03050000
    #undef CYTHON_USE_ASYNC_SLOTS
    #define CYTHON_USE_ASYNC_SLOTS 0
  #elif !defined(CYTHON_USE_ASYNC_SLOTS)
    #define CYTHON_USE_ASYNC_SLOTS 1
  #endif
  #undef CYTHON_USE_PYLIST_INTERNALS
  #define CYTHON_USE_PYLIST_INTERNALS 0
  #undef CYTHON_USE_UNICODE_INTERNALS
  #define CYTHON_USE_UNICODE_INTERNALS 0
  #undef CYTHON_USE_UNICODE_WRITER
  #define CYTHON_USE_UNICODE_WRITER 0
  #undef CYTHON_USE_PYLONG_INTERNALS
  #define CYTHON_USE_PYLONG_INTERNALS 0
  #undef CYTHON_AVOID_BORROWED_REFS
  #define CYTHON_AVOID_BORROWED_REFS 1
  #undef CYTHON_ASSUME_SAFE_MACROS
  #define CYTHON_ASSUME_SAFE_MACROS 0
  #undef CYTHON_UNPACK_METHODS
  #define CYTHON_UNPACK_METHODS 0
  #undef CYTHON_FAST_THREAD_STATE
  #define CYTHON_FAST_THREAD_STATE 0
  #undef CYTHON_FAST_PYCALL
  #define CYTHON_FAST_PYCALL 0
  #undef CYTHON_PEP489_MULTI_PHASE_INIT
  #define CYTHON_PEP489_MULTI_PHASE_INIT 0
  #undef CYTHON_USE_TP_FINALIZE
  #define CYTHON_USE_TP_FINALIZE 0
  #undef CYTHON_USE_DICT_VERSIONS
  #define CYTHON_USE_DICT_VERSIONS 0
  #undef CYTHON_USE_EXC_INFO_STACK
  #define CYTHON_USE_EXC_INFO_STACK 0
#elif defined(PYSTON_VERSION)
  #define CYTHON_COMPILING_IN_PYPY 0
  #define CYTHON_COMPILING_IN_PYSTON 1
  #define CYTHON_COMPILING_IN_CPYTHON 0
  #ifndef CYTHON_USE_TYPE_SLOTS
    #define CYTHON_USE_TYPE_SLOTS 1
  #endif
  #undef CYTHON_USE_PYTYPE_LOOKUP
  #define CYTHON_USE_PYTYPE_LOOKUP 0
  #undef CYTHON_USE_ASYNC_SLOTS
  #define CYTHON_USE_ASYNC_SLOTS 0
  #undef CYTHON_USE_PYLIST_INTERNALS
  #define CYTHON_USE_PYLIST_INTERNALS 0
  #ifndef CYTHON_USE_UNICODE_INTERNALS
    #define CYTHON_USE_UNICODE_INTERNALS 1
  #endif
  #undef CYTHON_USE_UNICODE_WRITER
  #define CYTHON_USE_UNICODE_WRITER 0
  #undef CYTHON_USE_PYLONG_INTERNALS
  #define CYTHON_USE_PYLONG_INTERNALS 0
  #ifndef CYTHON_AVOID_BORROWED_REFS
    #define CYTHON_AVOID_BORROWED_REFS 0
  #endif
  #ifndef CYTHON_ASSUME_SAFE_MACROS
    #define CYTHON_ASSUME_SAFE_MACROS 1
  #endif
  #ifndef CYTHON_UNPACK_METHODS
    #define CYTHON_UNPACK_METHODS 1
  #endif
  #undef CYTHON_FAST_THREAD_STATE
  #define CYTHON_FAST_THREAD_STATE 0
  #undef CYTHON_FAST_PYCALL
  #define CYTHON_FAST_PYCALL 0
  #undef CYTHON_PEP489_MULTI_PHASE_INIT
  #define CYTHON_PEP489_MULTI_PHASE_INIT 0
  #undef CYTHON_USE_TP_FINALIZE
  #define CYTHON_USE_TP_FINALIZE 0
  #undef CYTHON_USE_DICT_VERSIONS
  #define CYTHON_USE_DICT_VERSIONS 0
  #undef CYTHON_USE_EXC_INFO_STACK
  #define CYTHON_USE_EXC_INFO_STACK 0
#else
  #define CYTHON_COMPILING_IN_PYPY 0
  #define CYTHON_COMPILING_IN_PYSTON 0
  #define CYTHON_COMPILING_IN_CPYTHON 1
  #ifndef CYTHON_USE_TYPE_SLOTS
    #define CYTHON_USE_TYPE_SLOTS 1
  #endif
  #if PY_VERSION_HEX < 0x02070000
    #undef CYTHON_USE_PYTYPE_LOOKUP
    #define CYTHON_USE_PYTYPE_LOOKUP 0
  #elif !defined(CYTHON_USE_PYTYPE_LOOKUP)
    #define CYTHON_USE_PYTYPE_LOOKUP 1
  #endif
  #if PY_MAJOR_VERSION < 3
    #undef CYTHON_USE_ASYNC_SLOTS
    #define CYTHON_USE_ASYNC_SLOTS 0
  #elif !defined(CYTHON_USE_ASYNC_SLOTS)
    #define CYTHON_USE_ASYNC_SLOTS 1
  #endif
  #if PY_VERSION_HEX < 0x02070000
    #undef CYTHON_USE_PYLONG_INTERNALS
    #define CYTHON_USE_PYLONG_INTERNALS 0
  #elif !defined(CYTHON_USE_PYLONG_INTERNALS)
    #define CYTHON_USE_PYLONG_INTERNALS 1
  #endif
  #ifndef CYTHON_USE_PYLIST_INTERNALS
    #define CYTHON_USE_PYLIST_INTERNALS 1
  #endif
  #ifndef CYTHON_USE_UNICODE_INTERNALS
    #define CYTHON_USE_UNICODE_INTERNALS 1
  #endif
  #if PY_VERSION_HEX < 0x030300F0 || PY_VERSION_HEX >= 0x030B00A2
    #undef CYTHON_USE_UNICODE_WRITER
    #define CYTHON_USE_UNICODE_WRITER 0
  #elif !defined(CYTHON_USE_UNICODE_WRITER)
    #define CYTHON_USE_UNICODE_WRITER 1
  #endif
  #ifndef CYTHON_AVOID_BORROWED_REFS
    #define CYTHON_AVOID_BORROWED_REFS 0
  #endif
  #ifndef CYTHON_ASSUME_SAFE_MACROS
    #define CYTHON_ASSUME_SAFE_MACROS 1
  #endif
  #ifndef CYTHON_UNPACK_METHODS
    #define CYTHON_UNPACK_METHODS 1
  #endif
  #if PY_VERSION_HEX >= 0x030B00A4
    #undef CYTHON_FAST_THREAD_STATE
    #define CYTHON_FAST_THREAD_STATE 0
  #elif !defined(CYTHON_FAST_THREAD_STATE)
    #define CYTHON_FAST_THREAD_STATE 1
  #endif
  #ifndef CYTHON_FAST_PYCALL
    #define CYTHON_FAST_PYCALL (PY_VERSION_HEX < 0x030B00A1)
  #endif
  #ifndef CYTHON_PEP489_MULTI_PHASE_INIT
    #define CYTHON_PEP489_MULTI_PHASE_INIT (PY_VERSION_HEX >= 0x03050000)
  #endif
  #ifndef CYTHON_USE_TP_FINALIZE
    #define CYTHON_USE_TP_FINALIZE (PY_VERSION_HEX >= 0x030400a1)
  #endif
  #ifndef CYTHON_USE_DICT_VERSIONS
    #define CYTHON_USE_DICT_VERSIONS (PY_VERSION_HEX >= 0x030600B1)
  #endif
  #if PY_VERSION_HEX >= 0x030B00A4
    #undef CYTHON_USE_EXC_INFO_STACK
    #define CYTHON_USE_EXC_INFO_STACK 0
  #elif !defined(CYTHON_USE_EXC_INFO_STACK)
    #define CYTHON_USE_EXC_INFO_STACK (PY_VERSION_HEX >= 0x030700A3)
  #endif
#endif
#if !defined(CYTHON_FAST_PYCCALL)
#define CYTHON_FAST_PYCCALL  (CYTHON_FAST_PYCALL && PY_VERSION_HEX >= 0x030600B1)
#endif
#if CYTHON_USE_PYLONG_INTERNALS
  #if PY_MAJOR_VERSION < 3
    #include "longintrepr.h"
  #endif
  #undef SHIFT
  #undef BASE
  #undef MASK
  #ifdef SIZEOF_VOID_P
    enum { __pyx_check_sizeof_voidp = 1 / (int)(SIZEOF_VOID_P == sizeof(void*)) };
  #endif
#endif
#ifndef __has_attribute
  #define __has_attribute(x) 0
#endif
#ifndef __has_cpp_attribute
  #define __has_cpp_attribute(x) 0
#endif
#ifndef CYTHON_RESTRICT
  #if defined(__GNUC__)
    #define CYTHON_RESTRICT __restrict__
  #elif defined(_MSC_VER) && _MSC_VER >= 1400
    #define CYTHON_RESTRICT __restrict
  #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #define CYTHON_RESTRICT restrict
  #else
    #define CYTHON_RESTRICT
  #endif
#endif
#ifndef CYTHON_UNUSED
# if defined(__GNUC__)
#   if !(defined(__cplusplus)) || (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 4))
#     define CYTHON_UNUSED __attribute__ ((__unused__))
#   else
#     define CYTHON_UNUSED
#   endif
# elif defined(__ICC) || (defined(__INTEL_COMPILER) && !defined(_MSC_VER))
#   define CYTHON_UNUSED __attribute__ ((__unused__))
# else
#   define CYTHON_UNUSED
# endif
#endif
#ifndef CYTHON_MAYBE_UNUSED_VAR
#  if defined(__cplusplus)
     template<class T> void CYTHON_MAYBE_UNUSED_VAR( const T& ) { }
#  else
#    define CYTHON_MAYBE_UNUSED_VAR(x) (void)(x)
#  endif
#endif
#ifndef CYTHON_NCP_UNUSED
# if CYTHON_COMPILING_IN_CPYTHON
#  define CYTHON_NCP_UNUSED
# else
#  define CYTHON_NCP_UNUSED CYTHON_UNUSED
# endif
#endif
#define __Pyx_void_to_None(void_result) ((void)(void_result), Py_INCREF(Py_None), Py_None)
#ifdef _MSC_VER
    #ifndef _MSC_STDINT_H_
        #if _MSC_VER < 1300
           typedef unsigned char     uint8_t;
           typedef unsigned int      uint32_t;
        #else
           typedef unsigned __int8   uint8_t;
           typedef unsigned __int32  uint32_t;
        #endif
    #endif
#else
   #include <stdint.h>
#endif
#ifndef CYTHON_FALLTHROUGH
  #if defined(__cplusplus) && __cplusplus >= 201103L
    #if __has_cpp_attribute(fallthrough)
      #define CYTHON_FALLTHROUGH [[fallthrough]]
    #elif __has_cpp_attribute(clang::fallthrough)
      #define CYTHON_FALLTHROUGH [[clang::fallthrough]]
    #elif __has_cpp_attribute(gnu::fallthrough)
      #define CYTHON_FALLTHROUGH [[gnu::fallthrough]]
    #endif
  #endif
  #ifndef CYTHON_FALLTHROUGH
    #if __has_attribute(fallthrough)
      #define CYTHON_FALLTHROUGH __attribute__((fallthrough))
    #else
      #define CYTHON_FALLTHROUGH
    #endif
  #endif
  #if defined(__clang__ ) && defined(__apple_build_version__)
    #if __apple_build_version__ < 7000000
      #undef  CYTHON_FALLTHROUGH
      #define CYTHON_FALLTHROUGH
    #endif
  #endif
#endif

#ifndef CYTHON_INLINE
  #if defined(__clang__)
    #define CYTHON_INLINE __inline__ __attribute__ ((__unused__))
  #elif defined(__GNUC__)
    #define CYTHON_INLINE __inline__
  #elif defined(_MSC_VER)
    #define CYTHON_INLINE __inline
  #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #define CYTHON_INLINE inline
  #else
    #define CYTHON_INLINE
  #endif
#endif

#if CYTHON_COMPILING_IN_PYPY && PY_VERSION_HEX < 0x02070600 && !defined(Py_OptimizeFlag)
  #define Py_OptimizeFlag 0
#endif
#define __PYX_BUILD_PY_SSIZE_T "n"
#define CYTHON_FORMAT_SSIZE_T "z"
#if PY_MAJOR_VERSION < 3
  #define __Pyx_BUILTIN_MODULE_NAME "__builtin__"
  #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)\
          PyCode_New(a+k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
  #define __Pyx_DefaultClassType PyClass_Type
#else
  #define __Pyx_BUILTIN_MODULE_NAME "builtins"
  #define __Pyx_DefaultClassType PyType_Type
#if PY_VERSION_HEX >= 0x030B00A1
    static CYTHON_INLINE PyCodeObject* __Pyx_PyCode_New(int a, int k, int l, int s, int f,
                                                    PyObject *code, PyObject *c, PyObject* n, PyObject *v,
                                                    PyObject *fv, PyObject *cell, PyObject* fn,
                                                    PyObject *name, int fline, PyObject *lnos) {
        PyObject *kwds=NULL, *argcount=NULL, *posonlyargcount=NULL, *kwonlyargcount=NULL;
        PyObject *nlocals=NULL, *stacksize=NULL, *flags=NULL, *replace=NULL, *call_result=NULL, *empty=NULL;
        const char *fn_cstr=NULL;
        const char *name_cstr=NULL;
        PyCodeObject* co=NULL;
        PyObject *type, *value, *traceback;
        PyErr_Fetch(&type, &value, &traceback);
        if (!(kwds=PyDict_New())) goto end;
        if (!(argcount=PyLong_FromLong(a))) goto end;
        if (PyDict_SetItemString(kwds, "co_argcount", argcount) != 0) goto end;
        if (!(posonlyargcount=PyLong_FromLong(0))) goto end;
        if (PyDict_SetItemString(kwds, "co_posonlyargcount", posonlyargcount) != 0) goto end;
        if (!(kwonlyargcount=PyLong_FromLong(k))) goto end;
        if (PyDict_SetItemString(kwds, "co_kwonlyargcount", kwonlyargcount) != 0) goto end;
        if (!(nlocals=PyLong_FromLong(l))) goto end;
        if (PyDict_SetItemString(kwds, "co_nlocals", nlocals) != 0) goto end;
        if (!(stacksize=PyLong_FromLong(s))) goto end;
        if (PyDict_SetItemString(kwds, "co_stacksize", stacksize) != 0) goto end;
        if (!(flags=PyLong_FromLong(f))) goto end;
        if (PyDict_SetItemString(kwds, "co_flags", flags) != 0) goto end;
        if (PyDict_SetItemString(kwds, "co_code", code) != 0) goto end;
        if (PyDict_SetItemString(kwds, "co_consts", c) != 0) goto end;
        if (PyDict_SetItemString(kwds, "co_names", n) != 0) goto end;
        if (PyDict_SetItemString(kwds, "co_varnames", v) != 0) goto end;
        if (PyDict_SetItemString(kwds, "co_freevars", fv) != 0) goto end;
        if (PyDict_SetItemString(kwds, "co_cellvars", cell) != 0) goto end;
        if (PyDict_SetItemString(kwds, "co_linetable", lnos) != 0) goto end;
        if (!(fn_cstr=PyUnicode_AsUTF8AndSize(fn, NULL))) goto end;
        if (!(name_cstr=PyUnicode_AsUTF8AndSize(name, NULL))) goto end;
        if (!(co = PyCode_NewEmpty(fn_cstr, name_cstr, fline))) goto end;
        if (!(replace = PyObject_GetAttrString((PyObject*)co, "replace"))) goto cleanup_code_too;
        if (!(empty = PyTuple_New(0))) goto cleanup_code_too; // unfortunately __pyx_empty_tuple isn't available here
        if (!(call_result = PyObject_Call(replace, empty, kwds))) goto cleanup_code_too;
        Py_XDECREF((PyObject*)co);
        co = (PyCodeObject*)call_result;
        call_result = NULL;
        if (0) {
            cleanup_code_too:
            Py_XDECREF((PyObject*)co);
            co = NULL;
        }
        end:
        Py_XDECREF(kwds);
        Py_XDECREF(argcount);
        Py_XDECREF(posonlyargcount);
        Py_XDECREF(kwonlyargcount);
        Py_XDECREF(nlocals);
        Py_XDECREF(stacksize);
        Py_XDECREF(replace);
        Py_XDECREF(call_result);
        Py_XDECREF(empty);
        if (type) {
            PyErr_Restore(type, value, traceback);
        }
        return co;
    }
#else
  #define __Pyx_PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)\
          PyCode_New(a, k, l, s, f, code, c, n, v, fv, cell, fn, name, fline, lnos)
#endif
  #define __Pyx_DefaultClassType PyType_Type
#endif
#ifndef Py_TPFLAGS_CHECKTYPES
  #define Py_TPFLAGS_CHECKTYPES 0
#endif
#ifndef Py_TPFLAGS_HAVE_INDEX
  #define Py_TPFLAGS_HAVE_INDEX 0
#endif
#ifndef Py_TPFLAGS_HAVE_NEWBUFFER
  #define Py_TPFLAGS_HAVE_NEWBUFFER 0
#endif
#ifndef Py_TPFLAGS_HAVE_FINALIZE
  #define Py_TPFLAGS_HAVE_FINALIZE 0
#endif
#ifndef METH_STACKLESS
  #define METH_STACKLESS 0
#endif
#if PY_VERSION_HEX <= 0x030700A3 || !defined(METH_FASTCALL)
  #ifndef METH_FASTCALL
     #define METH_FASTCALL 0x80
  #endif
  typedef PyObject *(*__Pyx_PyCFunctionFast) (PyObject *self, PyObject *const *args, Py_ssize_t nargs);
  typedef PyObject *(*__Pyx_PyCFunctionFastWithKeywords) (PyObject *self, PyObject *const *args,
                                                          Py_ssize_t nargs, PyObject *kwnames);
#else
  #define __Pyx_PyCFunctionFast _PyCFunctionFast
  #define __Pyx_PyCFunctionFastWithKeywords _PyCFunctionFastWithKeywords
#endif
#if CYTHON_FAST_PYCCALL
#define __Pyx_PyFastCFunction_Check(func)\
    ((PyCFunction_Check(func) && (METH_FASTCALL == (PyCFunction_GET_FLAGS(func) & ~(METH_CLASS | METH_STATIC | METH_COEXIST | METH_KEYWORDS | METH_STACKLESS)))))
#else
#define __Pyx_PyFastCFunction_Check(func) 0
#endif
#if CYTHON_COMPILING_IN_PYPY && !defined(PyObject_Malloc)
  #define PyObject_Malloc(s)   PyMem_Malloc(s)
  #define PyObject_Free(p)     PyMem_Free(p)
  #define PyObject_Realloc(p)  PyMem_Realloc(p)
#endif
#if CYTHON_COMPILING_IN_CPYTHON && PY_VERSION_HEX < 0x030400A1
  #define PyMem_RawMalloc(n)           PyMem_Malloc(n)
  #define PyMem_RawRealloc(p, n)       PyMem_Realloc(p, n)
  #define PyMem_RawFree(p)             PyMem_Free(p)
#endif
#if CYTHON_COMPILING_IN_PYSTON
  #define __Pyx_PyCode_HasFreeVars(co)  PyCode_HasFreeVars(co)
  #define __Pyx_PyFrame_SetLineNumber(frame, lineno) PyFrame_SetLineNumber(frame, lineno)
#else
  #define __Pyx_PyCode_HasFreeVars(co)  (PyCode_GetNumFree(co) > 0)
  #define __Pyx_PyFrame_SetLineNumber(frame, lineno)  (frame)->f_lineno = (lineno)
#endif
#if !CYTHON_FAST_THREAD_STATE || PY_VERSION_HEX < 0x02070000
  #define __Pyx_PyThreadState_Current PyThreadState_GET()
#elif PY_VERSION_HEX >= 0x03060000
  #define __Pyx_PyThreadState_Current _PyThreadState_UncheckedGet()
#elif PY_VERSION_HEX >= 0x03000000
  #define __Pyx_PyThreadState_Current PyThreadState_GET()
#else
  #define __Pyx_PyThreadState_Current _PyThreadState_Current
#endif
#if PY_VERSION_HEX < 0x030700A2 && !defined(PyThread_tss_create) && !defined(Py_tss_NEEDS_INIT)
#include "pythread.h"
#define Py_tss_NEEDS_INIT 0
typedef int Py_tss_t;
static CYTHON_INLINE int PyThread_tss_create(Py_tss_t *key) {
  *key = PyThread_create_key();
  return 0;
}
static CYTHON_INLINE Py_tss_t * PyThread_tss_alloc(void) {
  Py_tss_t *key = (Py_tss_t *)PyObject_Malloc(sizeof(Py_tss_t));
  *key = Py_tss_NEEDS_INIT;
  return key;
}
static CYTHON_INLINE void PyThread_tss_free(Py_tss_t *key) {
  PyObject_Free(key);
}
static CYTHON_INLINE int PyThread_tss_is_created(Py_tss_t *key) {
  return *key != Py_tss_NEEDS_INIT;
}
static CYTHON_INLINE void PyThread_tss_delete(Py_tss_t *key) {
  PyThread_delete_key(*key);
  *key = Py_tss_NEEDS_INIT;
}
static CYTHON_INLINE int PyThread_tss_set(Py_tss_t *key, void *value) {
  return PyThread_set_key_value(*key, value);
}
static CYTHON_INLINE void * PyThread_tss_get(Py_tss_t *key) {
  return PyThread_get_key_value(*key);
}
#endif
#if CYTHON_COMPILING_IN_CPYTHON || defined(_PyDict_NewPresized)
#define __Pyx_PyDict_NewPresized(n)  ((n <= 8) ? PyDict_New() : _PyDict_NewPresized(n))
#else
#define __Pyx_PyDict_NewPresized(n)  PyDict_New()
#endif
#if PY_MAJOR_VERSION >= 3 || CYTHON_FUTURE_DIVISION
  #define __Pyx_PyNumber_Divide(x,y)         PyNumber_TrueDivide(x,y)
  #define __Pyx_PyNumber_InPlaceDivide(x,y)  PyNumber_InPlaceTrueDivide(x,y)
#else
  #define __Pyx_PyNumber_Divide(x,y)         PyNumber_Divide(x,y)
  #define __Pyx_PyNumber_InPlaceDivide(x,y)  PyNumber_InPlaceDivide(x,y)
#endif
#if CYTHON_COMPILING_IN_CPYTHON && PY_VERSION_HEX >= 0x030500A1 && CYTHON_USE_UNICODE_INTERNALS
#define __Pyx_PyDict_GetItemStr(dict, name)  _PyDict_GetItem_KnownHash(dict, name, ((PyASCIIObject *) name)->hash)
#else
#define __Pyx_PyDict_GetItemStr(dict, name)  PyDict_GetItem(dict, name)
#endif
#if PY_VERSION_HEX > 0x03030000 && defined(PyUnicode_KIND)
  #define CYTHON_PEP393_ENABLED 1
  #if defined(PyUnicode_IS_READY)
  #define __Pyx_PyUnicode_READY(op)       (likely(PyUnicode_IS_READY(op)) ?\
                                              0 : _PyUnicode_Ready((PyObject *)(op)))
  #else
  #define __Pyx_PyUnicode_READY(op)       (0)
  #endif
  #define __Pyx_PyUnicode_GET_LENGTH(u)   PyUnicode_GET_LENGTH(u)
  #define __Pyx_PyUnicode_READ_CHAR(u, i) PyUnicode_READ_CHAR(u, i)
  #define __Pyx_PyUnicode_MAX_CHAR_VALUE(u)   PyUnicode_MAX_CHAR_VALUE(u)
  #define __Pyx_PyUnicode_KIND(u)         PyUnicode_KIND(u)
  #define __Pyx_PyUnicode_DATA(u)         PyUnicode_DATA(u)
  #define __Pyx_PyUnicode_READ(k, d, i)   PyUnicode_READ(k, d, i)
  #define __Pyx_PyUnicode_WRITE(k, d, i, ch)  PyUnicode_WRITE(k, d, i, ch)
  #if defined(PyUnicode_IS_READY) && defined(PyUnicode_GET_SIZE)
  #if CYTHON_COMPILING_IN_CPYTHON && PY_VERSION_HEX >= 0x03090000
  #define __Pyx_PyUnicode_IS_TRUE(u)      (0 != (likely(PyUnicode_IS_READY(u)) ? PyUnicode_GET_LENGTH(u) : ((PyCompactUnicodeObject *)(u))->wstr_length))
  #else
  #define __Pyx_PyUnicode_IS_TRUE(u)      (0 != (likely(PyUnicode_IS_READY(u)) ? PyUnicode_GET_LENGTH(u) : PyUnicode_GET_SIZE(u)))
  #endif
  #else
  #define __Pyx_PyUnicode_IS_TRUE(u)      (0 != PyUnicode_GET_LENGTH(u))
  #endif
#else
  #define CYTHON_PEP393_ENABLED 0
  #define PyUnicode_1BYTE_KIND  1
  #define PyUnicode_2BYTE_KIND  2
  #define PyUnicode_4BYTE_KIND  4
  #define __Pyx_PyUnicode_READY(op)       (0)
  #define __Pyx_PyUnicode_GET_LENGTH(u)   PyUnicode_GET_SIZE(u)
  #define __Pyx_PyUnicode_READ_CHAR(u, i) ((Py_UCS4)(PyUnicode_AS_UNICODE(u)[i]))
  #define __Pyx_PyUnicode_MAX_CHAR_VALUE(u)   ((sizeof(Py_UNICODE) == 2) ? 65535 : 1114111)
  #define __Pyx_PyUnicode_KIND(u)         (sizeof(Py_UNICODE))
  #define __Pyx_PyUnicode_DATA(u)         ((void*)PyUnicode_AS_UNICODE(u))
  #define __Pyx_PyUnicode_READ(k, d, i)   ((void)(k), (Py_UCS4)(((Py_UNICODE*)d)[i]))
  #define __Pyx_PyUnicode_WRITE(k, d, i, ch)  (((void)(k)), ((Py_UNICODE*)d)[i] = ch)
  #define __Pyx_PyUnicode_IS_TRUE(u)      (0 != PyUnicode_GET_SIZE(u))
#endif
#if CYTHON_COMPILING_IN_PYPY
  #define __Pyx_PyUnicode_Concat(a, b)      PyNumber_Add(a, b)
  #define __Pyx_PyUnicode_ConcatSafe(a, b)  PyNumber_Add(a, b)
#else
  #define __Pyx_PyUnicode_Concat(a, b)      PyUnicode_Concat(a, b)
  #define __Pyx_PyUnicode_ConcatSafe(a, b)  ((unlikely((a) == Py_None) || unlikely((b) == Py_None)) ?\
      PyNumber_Add(a, b) : __Pyx_PyUnicode_Concat(a, b))
#endif
#if CYTHON_COMPILING_IN_PYPY && !defined(PyUnicode_Contains)
  #define PyUnicode_Contains(u, s)  PySequence_Contains(u, s)
#endif
#if CYTHON_COMPILING_IN_PYPY && !defined(PyByteArray_Check)
  #define PyByteArray_Check(obj)  PyObject_TypeCheck(obj, &PyByteArray_Type)
#endif
#if CYTHON_COMPILING_IN_PYPY && !defined(PyObject_Format)
  #define PyObject_Format(obj, fmt)  PyObject_CallMethod(obj, "__format__", "O", fmt)
#endif
#define __Pyx_PyString_FormatSafe(a, b)   ((unlikely((a) == Py_None || (PyString_Check(b) && !PyString_CheckExact(b)))) ? PyNumber_Remainder(a, b) : __Pyx_PyString_Format(a, b))
#define __Pyx_PyUnicode_FormatSafe(a, b)  ((unlikely((a) == Py_None || (PyUnicode_Check(b) && !PyUnicode_CheckExact(b)))) ? PyNumber_Remainder(a, b) : PyUnicode_Format(a, b))
#if PY_MAJOR_VERSION >= 3
  #define __Pyx_PyString_Format(a, b)  PyUnicode_Format(a, b)
#else
  #define __Pyx_PyString_Format(a, b)  PyString_Format(a, b)
#endif
#if PY_MAJOR_VERSION < 3 && !defined(PyObject_ASCII)
  #define PyObject_ASCII(o)            PyObject_Repr(o)
#endif
#if PY_MAJOR_VERSION >= 3
  #define PyBaseString_Type            PyUnicode_Type
  #define PyStringObject               PyUnicodeObject
  #define PyString_Type                PyUnicode_Type
  #define PyString_Check               PyUnicode_Check
  #define PyString_CheckExact          PyUnicode_CheckExact
#ifndef PyObject_Unicode
  #define PyObject_Unicode             PyObject_Str
#endif
#endif
#if PY_MAJOR_VERSION >= 3
  #define __Pyx_PyBaseString_Check(obj) PyUnicode_Check(obj)
  #define __Pyx_PyBaseString_CheckExact(obj) PyUnicode_CheckExact(obj)
#else
  #define __Pyx_PyBaseString_Check(obj) (PyString_Check(obj) || PyUnicode_Check(obj))
  #define __Pyx_PyBaseString_CheckExact(obj) (PyString_CheckExact(obj) || PyUnicode_CheckExact(obj))
#endif
#ifndef PySet_CheckExact
  #define PySet_CheckExact(obj)        (Py_TYPE(obj) == &PySet_Type)
#endif
#if PY_VERSION_HEX >= 0x030900A4
  #define __Pyx_SET_REFCNT(obj, refcnt) Py_SET_REFCNT(obj, refcnt)
  #define __Pyx_SET_SIZE(obj, size) Py_SET_SIZE(obj, size)
#else
  #define __Pyx_SET_REFCNT(obj, refcnt) Py_REFCNT(obj) = (refcnt)
  #define __Pyx_SET_SIZE(obj, size) Py_SIZE(obj) = (size)
#endif
#if CYTHON_ASSUME_SAFE_MACROS
  #define __Pyx_PySequence_SIZE(seq)  Py_SIZE(seq)
#else
  #define __Pyx_PySequence_SIZE(seq)  PySequence_Size(seq)
#endif
#if PY_MAJOR_VERSION >= 3
  #define PyIntObject                  PyLongObject
  #define PyInt_Type                   PyLong_Type
  #define PyInt_Check(op)              PyLong_Check(op)
  #define PyInt_CheckExact(op)         PyLong_CheckExact(op)
  #define PyInt_FromString             PyLong_FromString
  #define PyInt_FromUnicode            PyLong_FromUnicode
  #define PyInt_FromLong               PyLong_FromLong
  #define PyInt_FromSize_t             PyLong_FromSize_t
  #define PyInt_FromSsize_t            PyLong_FromSsize_t
  #define PyInt_AsLong                 PyLong_AsLong
  #define PyInt_AS_LONG                PyLong_AS_LONG
  #define PyInt_AsSsize_t              PyLong_AsSsize_t
  #define PyInt_AsUnsignedLongMask     PyLong_AsUnsignedLongMask
  #define PyInt_AsUnsignedLongLongMask PyLong_AsUnsignedLongLongMask
  #define PyNumber_Int                 PyNumber_Long
#endif
#if PY_MAJOR_VERSION >= 3
  #define PyBoolObject                 PyLongObject
#endif
#if PY_MAJOR_VERSION >= 3 && CYTHON_COMPILING_IN_PYPY
  #ifndef PyUnicode_InternFromString
    #define PyUnicode_InternFromString(s) PyUnicode_FromString(s)
  #endif
#endif
#if PY_VERSION_HEX < 0x030200A4
  typedef long Py_hash_t;
  #define __Pyx_PyInt_FromHash_t PyInt_FromLong
  #define __Pyx_PyInt_AsHash_t   __Pyx_PyIndex_AsHash_t
#else
  #define __Pyx_PyInt_FromHash_t PyInt_FromSsize_t
  #define __Pyx_PyInt_AsHash_t   __Pyx_PyIndex_AsSsize_t
#endif
#if PY_MAJOR_VERSION >= 3
  #define __Pyx_PyMethod_New(func, self, klass) ((self) ? ((void)(klass), PyMethod_New(func, self)) : __Pyx_NewRef(func))
#else
  #define __Pyx_PyMethod_New(func, self, klass) PyMethod_New(func, self, klass)
#endif
#if CYTHON_USE_ASYNC_SLOTS
  #if PY_VERSION_HEX >= 0x030500B1
    #define __Pyx_PyAsyncMethodsStruct PyAsyncMethods
    #define __Pyx_PyType_AsAsync(obj) (Py_TYPE(obj)->tp_as_async)
  #else
    #define __Pyx_PyType_AsAsync(obj) ((__Pyx_PyAsyncMethodsStruct*) (Py_TYPE(obj)->tp_reserved))
  #endif
#else
  #define __Pyx_PyType_AsAsync(obj) NULL
#endif
#ifndef __Pyx_PyAsyncMethodsStruct
    typedef struct {
        unaryfunc am_await;
        unaryfunc am_aiter;
        unaryfunc am_anext;
    } __Pyx_PyAsyncMethodsStruct;
#endif

#if defined(WIN32) || defined(MS_WINDOWS)
  #define _USE_MATH_DEFINES
#endif
#include <math.h>
#ifdef NAN
#define __PYX_NAN() ((float) NAN)
#else
static CYTHON_INLINE float __PYX_NAN() {
  float value;
  memset(&value, 0xFF, sizeof(value));
  return value;
}
#endif
#if defined(__CYGWIN__) && defined(_LDBL_EQ_DBL)
#define __Pyx_truncl trunc
#else
#define __Pyx_truncl truncl
#endif

#define __PYX_MARK_ERR_POS(f_index, lineno) \
    { __pyx_filename = __pyx_f[f_index]; (void)__pyx_filename; __pyx_lineno = lineno; (void)__pyx_lineno; __pyx_clineno = __LINE__; (void)__pyx_clineno; }
#define __PYX_ERR(f_index, lineno, Ln_error) \
    { __PYX_MARK_ERR_POS(f_index, lineno) goto Ln_error; }

#ifndef __PYX_EXTERN_C
  #ifdef __cplusplus
    #define __PYX_EXTERN_C extern "C"
  #else
    #define __PYX_EXTERN_C extern
  #endif
#endif

#define __PYX_HAVE__TTS__tts__utils__monotonic_align__core
#define __PYX_HAVE_API__TTS__tts__utils__monotonic_align__core
/* Early includes */
#include <string.h>
#include <stdio.h>
#include "numpy/arrayobject.h"
#include "numpy/ndarrayobject.h"
#include "numpy/ndarraytypes.h"
#include "numpy/arrayscalars.h"
#include "numpy/ufuncobject.h"

    /* NumPy API declarations from "numpy/__init__.pxd" */
    
#include "pythread.h"
#include <stdlib.h>
#include "pystate.h"
#ifdef _OPENMP
#include <omp.h>
#endif /* _OPENMP */

#if defined(PYREX_WITHOUT_ASSERTIONS) && !defined(CYTHON_WITHOUT_ASSERTIONS)
#define CYTHON_WITHOUT_ASSERTIONS
#endif

typedef struct {PyObject **p; const char *s; const Py_ssize_t n; const char* encoding;
                const char is_unicode; const char is_str; const char intern; } __Pyx_StringTabEntry;

#define __PYX_DEFAULT_STRING_ENCODING_IS_ASCII 0
#define __PYX_DEFAULT_STRING_ENCODING_IS_UTF8 0
#define __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT (PY_MAJOR_VERSION >= 3 && __PYX_DEFAULT_STRING_ENCODING_IS_UTF8)
#define __PYX_DEFAULT_STRING_ENCODING ""
#define __Pyx_PyObject_FromString __Pyx_PyBytes_FromString
#define __Pyx_PyObject_FromStringAndSize __Pyx_PyBytes_FromStringAndSize
#define __Pyx_uchar_cast(c) ((unsigned char)c)
#define __Pyx_long_cast(x) ((long)x)
#define __Pyx_fits_Py_ssize_t(v, type, is_signed)  (\
    (sizeof(type) < sizeof(Py_ssize_t))  ||\
    (sizeof(type) > sizeof(Py_ssize_t) &&\
          likely(v < (type)PY_SSIZE_T_MAX ||\
                 v == (type)PY_SSIZE_T_MAX)  &&\
          (!is_signed || likely(v > (type)PY_SSIZE_T_MIN ||\
                                v == (type)PY_SSIZE_T_MIN)))  ||\
    (sizeof(type) == sizeof(Py_ssize_t) &&\
          (is_signed || likely(v < (type)PY_SSIZE_T_MAX ||\
                               v == (type)PY_SSIZE_T_MAX)))  )
static CYTHON_INLINE int __Pyx_is_valid_index(Py_ssize_t i, Py_ssize_t limit) {
    return (size_t) i < (size_t) limit;
}
#if defined (__cplusplus) && __cplusplus >= 201103L
    #include <cstdlib>
    #define __Pyx_sst_abs(value) std::abs(value)
#elif SIZEOF_INT >= SIZEOF_SIZE_T
    #define __Pyx_sst_abs(value) abs(value)
#elif SIZEOF_LONG >= SIZEOF_SIZE_T
    #define __Pyx_sst_abs(value) labs(value)
#elif defined (_MSC_VER)
    #define __Pyx_sst_abs(value) ((Py_ssize_t)_abs64(value))
#elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #define __Pyx_sst_abs(value) llabs(value)
#elif defined (__GNUC__)
    #define __Pyx_sst_abs(value) __builtin_llabs(value)
#else
    #define __Pyx_sst_abs(value) ((value<0) ? -value : value)
#endif
static CYTHON_INLINE const char* __Pyx_PyObject_AsString(PyObject*);
static CYTHON_INLINE const char* __Pyx_PyObject_AsStringAndSize(PyObject*, Py_ssize_t* length);
#define __Pyx_PyByteArray_FromString(s) PyByteArray_FromStringAndSize((const char*)s, strlen((const char*)s))
#define __Pyx_PyByteArray_FromStringAndSize(s, l) PyByteArray_FromStringAndSize((const char*)s, l)
#define __Pyx_PyBytes_FromString        PyBytes_FromString
#define __Pyx_PyBytes_FromStringAndSize PyBytes_FromStringAndSize
static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(const char*);
#if PY_MAJOR_VERSION < 3
    #define __Pyx_PyStr_FromString        __Pyx_PyBytes_FromString
    #define __Pyx_PyStr_FromStringAndSize __Pyx_PyBytes_FromStringAndSize
#else
    #define __Pyx_PyStr_FromString        __Pyx_PyUnicode_FromString
    #define __Pyx_PyStr_FromStringAndSize __Pyx_PyUnicode_FromStringAndSize
#endif
#define __Pyx_PyBytes_AsWritableString(s)     ((char*) PyBytes_AS_STRING(s))
#define __Pyx_PyBytes_AsWritableSString(s)    ((signed char*) PyBytes_AS_STRING(s))
#define __Pyx_PyBytes_AsWritableUString(s)    ((unsigned char*) PyBytes_AS_STRING(s))
#define __Pyx_PyBytes_AsString(s)     ((const char*) PyBytes_AS_STRING(s))
#define __Pyx_PyBytes_AsSString(s)    ((const signed char*) PyBytes_AS_STRING(s))
#define __Pyx_PyBytes_AsUString(s)    ((const unsigned char*) PyBytes_AS_STRING(s))
#define __Pyx_PyObject_AsWritableString(s)    ((char*) __Pyx_PyObject_AsString(s))
#define __Pyx_PyObject_AsWritableSString(s)    ((signed char*) __Pyx_PyObject_AsString(s))
#define __Pyx_PyObject_AsWritableUString(s)    ((unsigned char*) __Pyx_PyObject_AsString(s))
#define __Pyx_PyObject_AsSString(s)    ((const signed char*) __Pyx_PyObject_AsString(s))
#define __Pyx_PyObject_AsUString(s)    ((const unsigned char*) __Pyx_PyObject_AsString(s))
#define __Pyx_PyObject_FromCString(s)  __Pyx_PyObject_FromString((const char*)s)
#define __Pyx_PyBytes_FromCString(s)   __Pyx_PyBytes_FromString((const char*)s)
#define __Pyx_PyByteArray_FromCString(s)   __Pyx_PyByteArray_FromString((const char*)s)
#define __Pyx_PyStr_FromCString(s)     __Pyx_PyStr_FromString((const char*)s)
#define __Pyx_PyUnicode_FromCString(s) __Pyx_PyUnicode_FromString((const char*)s)
static CYTHON_INLINE size_t __Pyx_Py_UNICODE_strlen(const Py_UNICODE *u) {
    const Py_UNICODE *u_end = u;
    while (*u_end++) ;
    return (size_t)(u_end - u - 1);
}
#define __Pyx_PyUnicode_FromUnicode(u)       PyUnicode_FromUnicode(u, __Pyx_Py_UNICODE_strlen(u))
#define __Pyx_PyUnicode_FromUnicodeAndLength PyUnicode_FromUnicode
#define __Pyx_PyUnicode_AsUnicode            PyUnicode_AsUnicode
#define __Pyx_NewRef(obj) (Py_INCREF(obj), obj)
#define __Pyx_Owned_Py_None(b) __Pyx_NewRef(Py_None)
static CYTHON_INLINE PyObject * __Pyx_PyBool_FromLong(long b);
static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject*);
static CYTHON_INLINE int __Pyx_PyObject_IsTrueAndDecref(PyObject*);
static CYTHON_INLINE PyObject* __Pyx_PyNumber_IntOrLong(PyObject* x);
#define __Pyx_PySequence_Tuple(obj)\
    (likely(PyTuple_CheckExact(obj)) ? __Pyx_NewRef(obj) : PySequence_Tuple(obj))
static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject*);
static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t);
static CYTHON_INLINE Py_hash_t __Pyx_PyIndex_AsHash_t(PyObject*);
#if CYTHON_ASSUME_SAFE_MACROS
#define __pyx_PyFloat_AsDouble(x) (PyFloat_CheckExact(x) ? PyFloat_AS_DOUBLE(x) : PyFloat_AsDouble(x))
#else
#define __pyx_PyFloat_AsDouble(x) PyFloat_AsDouble(x)
#endif
#define __pyx_PyFloat_AsFloat(x) ((float) __pyx_PyFloat_AsDouble(x))
#if PY_MAJOR_VERSION >= 3
#define __Pyx_PyNumber_Int(x) (PyLong_CheckExact(x) ? __Pyx_NewRef(x) : PyNumber_Long(x))
#else
#define __Pyx_PyNumber_Int(x) (PyInt_CheckExact(x) ? __Pyx_NewRef(x) : PyNumber_Int(x))
#endif
#define __Pyx_PyNumber_Float(x) (PyFloat_CheckExact(x) ? __Pyx_NewRef(x) : PyNumber_Float(x))
#if PY_MAJOR_VERSION < 3 && __PYX_DEFAULT_STRING_ENCODING_IS_ASCII
static int __Pyx_sys_getdefaultencoding_not_ascii;
static int __Pyx_init_sys_getdefaultencoding_params(void) {
    PyObject* sys;
    PyObject* default_encoding = NULL;
    PyObject* ascii_chars_u = NULL;
    PyObject* ascii_chars_b = NULL;
    const char* default_encoding_c;
    sys = PyImport_ImportModule("sys");
    if (!sys) goto bad;
    default_encoding = PyObject_CallMethod(sys, (char*) "getdefaultencoding", NULL);
    Py_DECREF(sys);
    if (!default_encoding) goto bad;
    default_encoding_c = PyBytes_AsString(default_encoding);
    if (!default_encoding_c) goto bad;
    if (strcmp(default_encoding_c, "ascii") == 0) {
        __Pyx_sys_getdefaultencoding_not_ascii = 0;
    } else {
        char ascii_chars[128];
        int c;
        for (c = 0; c < 128; c++) {
            ascii_chars[c] = c;
        }
        __Pyx_sys_getdefaultencoding_not_ascii = 1;
        ascii_chars_u = PyUnicode_DecodeASCII(ascii_chars, 128, NULL);
        if (!ascii_chars_u) goto bad;
        ascii_chars_b = PyUnicode_AsEncodedString(ascii_chars_u, default_encoding_c, NULL);
        if (!ascii_chars_b || !PyBytes_Check(ascii_chars_b) || memcmp(ascii_chars, PyBytes_AS_STRING(ascii_chars_b), 128) != 0) {
            PyErr_Format(
                PyExc_ValueError,
                "This module compiled with c_string_encoding=ascii, but default encoding '%.200s' is not a superset of ascii.",
                default_encoding_c);
            goto bad;
        }
        Py_DECREF(ascii_chars_u);
        Py_DECREF(ascii_chars_b);
    }
    Py_DECREF(default_encoding);
    return 0;
bad:
    Py_XDECREF(default_encoding);
    Py_XDECREF(ascii_chars_u);
    Py_XDECREF(ascii_chars_b);
    return -1;
}
#endif
#if __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT && PY_MAJOR_VERSION >= 3
#define __Pyx_PyUnicode_FromStringAndSize(c_str, size) PyUnicode_DecodeUTF8(c_str, size, NULL)
#else
#define __Pyx_PyUnicode_FromStringAndSize(c_str, size) PyUnicode_Decode(c_str, size, __PYX_DEFAULT_STRING_ENCODING, NULL)
#if __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT
static char* __PYX_DEFAULT_STRING_ENCODING;
static int __Pyx_init_sys_getdefaultencoding_params(void) {
    PyObject* sys;
    PyObject* default_encoding = NULL;
    char* default_encoding_c;
    sys = PyImport_ImportModule("sys");
    if (!sys) goto bad;
    default_encoding = PyObject_CallMethod(sys, (char*) (const char*) "getdefaultencoding", NULL);
    Py_DECREF(sys);
    if (!default_encoding) goto bad;
    default_encoding_c = PyBytes_AsString(default_encoding);
    if (!default_encoding_c) goto bad;
    __PYX_DEFAULT_STRING_ENCODING = (char*) malloc(strlen(default_encoding_c) + 1);
    if (!__PYX_DEFAULT_STRING_ENCODING) goto bad;
    strcpy(__PYX_DEFAULT_STRING_ENCODING, default_encoding_c);
    Py_DECREF(default_encoding);
    return 0;
bad:
    Py_XDECREF(default_encoding);
    return -1;
}
#endif
#endif


/* Test for GCC > 2.95 */
#if defined(__GNUC__)     && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))
  #define likely(x)   __builtin_expect(!!(x), 1)
  #define unlikely(x) __builtin_expect(!!(x), 0)
#else /* !__GNUC__ or GCC < 2.95 */
  #define likely(x)   (x)
  #define unlikely(x) (x)
#endif /* __GNUC__ */
static CYTHON_INLINE void __Pyx_pretend_to_initialize(void* ptr) { (void)ptr; }

static PyObject *__pyx_m = NULL;
static PyObject *__pyx_d;
static PyObject *__pyx_b;
static PyObject *__pyx_cython_runtime = NULL;
static PyObject *__pyx_empty_tuple;
static PyObject *__pyx_empty_bytes;
static PyObject *__pyx_empty_unicode;
static int __pyx_lineno;
static int __pyx_clineno = 0;
static const char * __pyx_cfilenm= __FILE__;
static const char *__pyx_filename;

/* Header.proto */
#if !defined(CYTHON_CCOMPLEX)
  #if defined(__cplusplus)
    #define CYTHON_CCOMPLEX 1
  #elif defined(_Complex_I)
    #define CYTHON_CCOMPLEX 1
  #else
    #define CYTHON_CCOMPLEX 0
  #endif
#endif
#if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    #include <complex>
  #else
    #include <complex.h>
  #endif
#endif
#if CYTHON_CCOMPLEX && !defined(__cplusplus) && defined(__sun__) && defined(__GNUC__)
  #undef _Complex_I
  #define _Complex_I 1.0fj
#endif


static const char *__pyx_f[] = {
  "TTS\\tts\\utils\\monotonic_align\\core.pyx",
  "__init__.pxd",
  "stringsource",
  "type.pxd",
};
/* NoFastGil.proto */
#define __Pyx_PyGILState_Ensure PyGILState_Ensure
#define __Pyx_PyGILState_Release PyGILState_Release
#define __Pyx_FastGIL_Remember()
#define __Pyx_FastGIL_Forget()
#define __Pyx_FastGilFuncInit()

/* MemviewSliceStruct.proto */
struct __pyx_memoryview_obj;
typedef struct {
  struct __pyx_memoryview_obj *memview;
  char *data;
  Py_ssize_t shape[8];
  Py_ssize_t strides[8];
  Py_ssize_t suboffsets[8];
} __Pyx_memviewslice;
#define __Pyx_MemoryView_Len(m)  (m.shape[0])

/* Atomics.proto */
#include <pythread.h>
#ifndef CYTHON_ATOMICS
    #define CYTHON_ATOMICS 1
#endif
#define __pyx_atomic_int_type int
#if CYTHON_ATOMICS && __GNUC__ >= 4 && (__GNUC_MINOR__ > 1 ||\
                    (__GNUC_MINOR__ == 1 && __GNUC_PATCHLEVEL >= 2)) &&\
                    !defined(__i386__)
    #define __pyx_atomic_incr_aligned(value, lock) __sync_fetch_and_add(value, 1)
    #define __pyx_atomic_decr_aligned(value, lock) __sync_fetch_and_sub(value, 1)
    #ifdef __PYX_DEBUG_ATOMICS
        #warning "Using GNU atomics"
    #endif
#elif CYTHON_ATOMICS && defined(_MSC_VER) && 0
    #include <Windows.h>
    #undef __pyx_atomic_int_type
    #define __pyx_atomic_int_type LONG
    #define __pyx_atomic_incr_aligned(value, lock) InterlockedIncrement(value)
    #define __pyx_atomic_decr_aligned(value, lock) InterlockedDecrement(value)
    #ifdef __PYX_DEBUG_ATOMICS
        #pragma message ("Using MSVC atomics")
    #endif
#elif CYTHON_ATOMICS && (defined(__ICC) || defined(__INTEL_COMPILER)) && 0
    #define __pyx_atomic_incr_aligned(value, lock) _InterlockedIncrement(value)
    #define __pyx_atomic_decr_aligned(value, lock) _InterlockedDecrement(value)
    #ifdef __PYX_DEBUG_ATOMICS
        #warning "Using Intel atomics"
    #endif
#else
    #undef CYTHON_ATOMICS
    #define CYTHON_ATOMICS 0
    #ifdef __PYX_DEBUG_ATOMICS
        #warning "Not using atomics"
    #endif
#endif
typedef volatile __pyx_atomic_int_type __pyx_atomic_int;
#if CYTHON_ATOMICS
    #define __pyx_add_acquisition_count(memview)\
             __pyx_atomic_incr_aligned(__pyx_get_slice_count_pointer(memview), memview->lock)
    #define __pyx_sub_acquisition_count(memview)\
            __pyx_atomic_decr_aligned(__pyx_get_slice_count_pointer(memview), memview->lock)
#else
    #define __pyx_add_acquisition_count(memview)\
            __pyx_add_acquisition_count_locked(__pyx_get_slice_count_pointer(memview), memview->lock)
    #define __pyx_sub_acquisition_count(memview)\
            __pyx_sub_acquisition_count_locked(__pyx_get_slice_count_pointer(memview), memview->lock)
#endif

/* ForceInitThreads.proto */
#ifndef __PYX_FORCE_INIT_THREADS
  #define __PYX_FORCE_INIT_THREADS 0
#endif

/* BufferFormatStructs.proto */
#define IS_UNSIGNED(type) (((type) -1) > 0)
struct __Pyx_StructField_;
#define __PYX_BUF_FLAGS_PACKED_STRUCT (1 << 0)
typedef struct {
  const char* name;
  struct __Pyx_StructField_* fields;
  size_t size;
  size_t arraysize[8];
  int ndim;
  char typegroup;
  char is_unsigned;
  int flags;
} __Pyx_TypeInfo;
typedef struct __Pyx_StructField_ {
  __Pyx_TypeInfo* type;
  const char* name;
  size_t offset;
} __Pyx_StructField;
typedef struct {
  __Pyx_StructField* field;
  size_t parent_offset;
} __Pyx_BufFmt_StackElem;
typedef struct {
  __Pyx_StructField root;
  __Pyx_BufFmt_StackElem* head;
  size_t fmt_offset;
  size_t new_count, enc_count;
  size_t struct_alignment;
  int is_complex;
  char enc_type;
  char new_packmode;
  char enc_packmode;
  char is_valid_array;
} __Pyx_BufFmt_Context;


/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":690
 * # in Cython to enable them only on the right systems.
 * 
 * ctypedef npy_int8       int8_t             # <<<<<<<<<<<<<<
 * ctypedef npy_int16      int16_t
 * ctypedef npy_int32      int32_t
 */
typedef npy_int8 __pyx_t_5numpy_int8_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":691
 * 
 * ctypedef npy_int8       int8_t
 * ctypedef npy_int16      int16_t             # <<<<<<<<<<<<<<
 * ctypedef npy_int32      int32_t
 * ctypedef npy_int64      int64_t
 */
typedef npy_int16 __pyx_t_5numpy_int16_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":692
 * ctypedef npy_int8       int8_t
 * ctypedef npy_int16      int16_t
 * ctypedef npy_int32      int32_t             # <<<<<<<<<<<<<<
 * ctypedef npy_int64      int64_t
 * #ctypedef npy_int96      int96_t
 */
typedef npy_int32 __pyx_t_5numpy_int32_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":693
 * ctypedef npy_int16      int16_t
 * ctypedef npy_int32      int32_t
 * ctypedef npy_int64      int64_t             # <<<<<<<<<<<<<<
 * #ctypedef npy_int96      int96_t
 * #ctypedef npy_int128     int128_t
 */
typedef npy_int64 __pyx_t_5numpy_int64_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":697
 * #ctypedef npy_int128     int128_t
 * 
 * ctypedef npy_uint8      uint8_t             # <<<<<<<<<<<<<<
 * ctypedef npy_uint16     uint16_t
 * ctypedef npy_uint32     uint32_t
 */
typedef npy_uint8 __pyx_t_5numpy_uint8_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":698
 * 
 * ctypedef npy_uint8      uint8_t
 * ctypedef npy_uint16     uint16_t             # <<<<<<<<<<<<<<
 * ctypedef npy_uint32     uint32_t
 * ctypedef npy_uint64     uint64_t
 */
typedef npy_uint16 __pyx_t_5numpy_uint16_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":699
 * ctypedef npy_uint8      uint8_t
 * ctypedef npy_uint16     uint16_t
 * ctypedef npy_uint32     uint32_t             # <<<<<<<<<<<<<<
 * ctypedef npy_uint64     uint64_t
 * #ctypedef npy_uint96     uint96_t
 */
typedef npy_uint32 __pyx_t_5numpy_uint32_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":700
 * ctypedef npy_uint16     uint16_t
 * ctypedef npy_uint32     uint32_t
 * ctypedef npy_uint64     uint64_t             # <<<<<<<<<<<<<<
 * #ctypedef npy_uint96     uint96_t
 * #ctypedef npy_uint128    uint128_t
 */
typedef npy_uint64 __pyx_t_5numpy_uint64_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":704
 * #ctypedef npy_uint128    uint128_t
 * 
 * ctypedef npy_float32    float32_t             # <<<<<<<<<<<<<<
 * ctypedef npy_float64    float64_t
 * #ctypedef npy_float80    float80_t
 */
typedef npy_float32 __pyx_t_5numpy_float32_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":705
 * 
 * ctypedef npy_float32    float32_t
 * ctypedef npy_float64    float64_t             # <<<<<<<<<<<<<<
 * #ctypedef npy_float80    float80_t
 * #ctypedef npy_float128   float128_t
 */
typedef npy_float64 __pyx_t_5numpy_float64_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":714
 * # The int types are mapped a bit surprising --
 * # numpy.int corresponds to 'l' and numpy.long to 'q'
 * ctypedef npy_long       int_t             # <<<<<<<<<<<<<<
 * ctypedef npy_longlong   long_t
 * ctypedef npy_longlong   longlong_t
 */
typedef npy_long __pyx_t_5numpy_int_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":715
 * # numpy.int corresponds to 'l' and numpy.long to 'q'
 * ctypedef npy_long       int_t
 * ctypedef npy_longlong   long_t             # <<<<<<<<<<<<<<
 * ctypedef npy_longlong   longlong_t
 * 
 */
typedef npy_longlong __pyx_t_5numpy_long_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":716
 * ctypedef npy_long       int_t
 * ctypedef npy_longlong   long_t
 * ctypedef npy_longlong   longlong_t             # <<<<<<<<<<<<<<
 * 
 * ctypedef npy_ulong      uint_t
 */
typedef npy_longlong __pyx_t_5numpy_longlong_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":718
 * ctypedef npy_longlong   longlong_t
 * 
 * ctypedef npy_ulong      uint_t             # <<<<<<<<<<<<<<
 * ctypedef npy_ulonglong  ulong_t
 * ctypedef npy_ulonglong  ulonglong_t
 */
typedef npy_ulong __pyx_t_5numpy_uint_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":719
 * 
 * ctypedef npy_ulong      uint_t
 * ctypedef npy_ulonglong  ulong_t             # <<<<<<<<<<<<<<
 * ctypedef npy_ulonglong  ulonglong_t
 * 
 */
typedef npy_ulonglong __pyx_t_5numpy_ulong_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":720
 * ctypedef npy_ulong      uint_t
 * ctypedef npy_ulonglong  ulong_t
 * ctypedef npy_ulonglong  ulonglong_t             # <<<<<<<<<<<<<<
 * 
 * ctypedef npy_intp       intp_t
 */
typedef npy_ulonglong __pyx_t_5numpy_ulonglong_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":722
 * ctypedef npy_ulonglong  ulonglong_t
 * 
 * ctypedef npy_intp       intp_t             # <<<<<<<<<<<<<<
 * ctypedef npy_uintp      uintp_t
 * 
 */
typedef npy_intp __pyx_t_5numpy_intp_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":723
 * 
 * ctypedef npy_intp       intp_t
 * ctypedef npy_uintp      uintp_t             # <<<<<<<<<<<<<<
 * 
 * ctypedef npy_double     float_t
 */
typedef npy_uintp __pyx_t_5numpy_uintp_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":725
 * ctypedef npy_uintp      uintp_t
 * 
 * ctypedef npy_double     float_t             # <<<<<<<<<<<<<<
 * ctypedef npy_double     double_t
 * ctypedef npy_longdouble longdouble_t
 */
typedef npy_double __pyx_t_5numpy_float_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":726
 * 
 * ctypedef npy_double     float_t
 * ctypedef npy_double     double_t             # <<<<<<<<<<<<<<
 * ctypedef npy_longdouble longdouble_t
 * 
 */
typedef npy_double __pyx_t_5numpy_double_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":727
 * ctypedef npy_double     float_t
 * ctypedef npy_double     double_t
 * ctypedef npy_longdouble longdouble_t             # <<<<<<<<<<<<<<
 * 
 * ctypedef npy_cfloat      cfloat_t
 */
typedef npy_longdouble __pyx_t_5numpy_longdouble_t;
/* Declarations.proto */
#if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    typedef ::std::complex< float > __pyx_t_float_complex;
  #else
    typedef float _Complex __pyx_t_float_complex;
  #endif
#else
    typedef struct { float real, imag; } __pyx_t_float_complex;
#endif
static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float, float);

/* Declarations.proto */
#if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    typedef ::std::complex< double > __pyx_t_double_complex;
  #else
    typedef double _Complex __pyx_t_double_complex;
  #endif
#else
    typedef struct { double real, imag; } __pyx_t_double_complex;
#endif
static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double, double);


/*--- Type declarations ---*/
struct __pyx_array_obj;
struct __pyx_MemviewEnum_obj;
struct __pyx_memoryview_obj;
struct __pyx_memoryviewslice_obj;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":729
 * ctypedef npy_longdouble longdouble_t
 * 
 * ctypedef npy_cfloat      cfloat_t             # <<<<<<<<<<<<<<
 * ctypedef npy_cdouble     cdouble_t
 * ctypedef npy_clongdouble clongdouble_t
 */
typedef npy_cfloat __pyx_t_5numpy_cfloat_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":730
 * 
 * ctypedef npy_cfloat      cfloat_t
 * ctypedef npy_cdouble     cdouble_t             # <<<<<<<<<<<<<<
 * ctypedef npy_clongdouble clongdouble_t
 * 
 */
typedef npy_cdouble __pyx_t_5numpy_cdouble_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":731
 * ctypedef npy_cfloat      cfloat_t
 * ctypedef npy_cdouble     cdouble_t
 * ctypedef npy_clongdouble clongdouble_t             # <<<<<<<<<<<<<<
 * 
 * ctypedef npy_cdouble     complex_t
 */
typedef npy_clongdouble __pyx_t_5numpy_clongdouble_t;

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":733
 * ctypedef npy_clongdouble clongdouble_t
 * 
 * ctypedef npy_cdouble     complex_t             # <<<<<<<<<<<<<<
 * 
 * cdef inline object PyArray_MultiIterNew1(a):
 */
typedef npy_cdouble __pyx_t_5numpy_complex_t;
struct __pyx_opt_args_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c;

/* "TTS/tts/utils/monotonic_align/core.pyx":42
 * @cython.boundscheck(False)
 * @cython.wraparound(False)
 * cpdef void maximum_path_c(int[:,:,::1] paths, float[:,:,::1] values, int[::1] t_xs, int[::1] t_ys, float max_neg_val=-1e9) nogil:             # <<<<<<<<<<<<<<
 *   cdef int b = values.shape[0]
 * 
 */
struct __pyx_opt_args_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c {
  int __pyx_n;
  float max_neg_val;
};

/* "View.MemoryView":105
 * 
 * @cname("__pyx_array")
 * cdef class array:             # <<<<<<<<<<<<<<
 * 
 *     cdef:
 */
struct __pyx_array_obj {
  PyObject_HEAD
  struct __pyx_vtabstruct_array *__pyx_vtab;
  char *data;
  Py_ssize_t len;
  char *format;
  int ndim;
  Py_ssize_t *_shape;
  Py_ssize_t *_strides;
  Py_ssize_t itemsize;
  PyObject *mode;
  PyObject *_format;
  void (*callback_free_data)(void *);
  int free_data;
  int dtype_is_object;
};


/* "View.MemoryView":279
 * 
 * @cname('__pyx_MemviewEnum')
 * cdef class Enum(object):             # <<<<<<<<<<<<<<
 *     cdef object name
 *     def __init__(self, name):
 */
struct __pyx_MemviewEnum_obj {
  PyObject_HEAD
  PyObject *name;
};


/* "View.MemoryView":330
 * 
 * @cname('__pyx_memoryview')
 * cdef class memoryview(object):             # <<<<<<<<<<<<<<
 * 
 *     cdef object obj
 */
struct __pyx_memoryview_obj {
  PyObject_HEAD
  struct __pyx_vtabstruct_memoryview *__pyx_vtab;
  PyObject *obj;
  PyObject *_size;
  PyObject *_array_interface;
  PyThread_type_lock lock;
  __pyx_atomic_int acquisition_count[2];
  __pyx_atomic_int *acquisition_count_aligned_p;
  Py_buffer view;
  int flags;
  int dtype_is_object;
  __Pyx_TypeInfo *typeinfo;
};


/* "View.MemoryView":965
 * 
 * @cname('__pyx_memoryviewslice')
 * cdef class _memoryviewslice(memoryview):             # <<<<<<<<<<<<<<
 *     "Internal class for passing memoryview slices to Python"
 * 
 */
struct __pyx_memoryviewslice_obj {
  struct __pyx_memoryview_obj __pyx_base;
  __Pyx_memviewslice from_slice;
  PyObject *from_object;
  PyObject *(*to_object_func)(char *);
  int (*to_dtype_func)(char *, PyObject *);
};


/* "View.MemoryView":105
 * 
 * @cname("__pyx_array")
 * cdef class array:             # <<<<<<<<<<<<<<
 * 
 *     cdef:
 */

struct __pyx_vtabstruct_array {
  PyObject *(*get_memview)(struct __pyx_array_obj *);
};
static struct __pyx_vtabstruct_array *__pyx_vtabptr_array;


/* "View.MemoryView":330
 * 
 * @cname('__pyx_memoryview')
 * cdef class memoryview(object):             # <<<<<<<<<<<<<<
 * 
 *     cdef object obj
 */

struct __pyx_vtabstruct_memoryview {
  char *(*get_item_pointer)(struct __pyx_memoryview_obj *, PyObject *);
  PyObject *(*is_slice)(struct __pyx_memoryview_obj *, PyObject *);
  PyObject *(*setitem_slice_assignment)(struct __pyx_memoryview_obj *, PyObject *, PyObject *);
  PyObject *(*setitem_slice_assign_scalar)(struct __pyx_memoryview_obj *, struct __pyx_memoryview_obj *, PyObject *);
  PyObject *(*setitem_indexed)(struct __pyx_memoryview_obj *, PyObject *, PyObject *);
  PyObject *(*convert_item_to_object)(struct __pyx_memoryview_obj *, char *);
  PyObject *(*assign_item_from_object)(struct __pyx_memoryview_obj *, char *, PyObject *);
};
static struct __pyx_vtabstruct_memoryview *__pyx_vtabptr_memoryview;


/* "View.MemoryView":965
 * 
 * @cname('__pyx_memoryviewslice')
 * cdef class _memoryviewslice(memoryview):             # <<<<<<<<<<<<<<
 *     "Internal class for passing memoryview slices to Python"
 * 
 */

struct __pyx_vtabstruct__memoryviewslice {
  struct __pyx_vtabstruct_memoryview __pyx_base;
};
static struct __pyx_vtabstruct__memoryviewslice *__pyx_vtabptr__memoryviewslice;

/* --- Runtime support code (head) --- */
/* Refnanny.proto */
#ifndef CYTHON_REFNANNY
  #define CYTHON_REFNANNY 0
#endif
#if CYTHON_REFNANNY
  typedef struct {
    void (*INCREF)(void*, PyObject*, int);
    void (*DECREF)(void*, PyObject*, int);
    void (*GOTREF)(void*, PyObject*, int);
    void (*GIVEREF)(void*, PyObject*, int);
    void* (*SetupContext)(const char*, int, const char*);
    void (*FinishContext)(void**);
  } __Pyx_RefNannyAPIStruct;
  static __Pyx_RefNannyAPIStruct *__Pyx_RefNanny = NULL;
  static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname);
  #define __Pyx_RefNannyDeclarations void *__pyx_refnanny = NULL;
#ifdef WITH_THREAD
  #define __Pyx_RefNannySetupContext(name, acquire_gil)\
          if (acquire_gil) {\
              PyGILState_STATE __pyx_gilstate_save = PyGILState_Ensure();\
              __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__);\
              PyGILState_Release(__pyx_gilstate_save);\
          } else {\
              __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__);\
          }
#else
  #define __Pyx_RefNannySetupContext(name, acquire_gil)\
          __pyx_refnanny = __Pyx_RefNanny->SetupContext((name), __LINE__, __FILE__)
#endif
  #define __Pyx_RefNannyFinishContext()\
          __Pyx_RefNanny->FinishContext(&__pyx_refnanny)
  #define __Pyx_INCREF(r)  __Pyx_RefNanny->INCREF(__pyx_refnanny, (PyObject *)(r), __LINE__)
  #define __Pyx_DECREF(r)  __Pyx_RefNanny->DECREF(__pyx_refnanny, (PyObject *)(r), __LINE__)
  #define __Pyx_GOTREF(r)  __Pyx_RefNanny->GOTREF(__pyx_refnanny, (PyObject *)(r), __LINE__)
  #define __Pyx_GIVEREF(r) __Pyx_RefNanny->GIVEREF(__pyx_refnanny, (PyObject *)(r), __LINE__)
  #define __Pyx_XINCREF(r)  do { if((r) != NULL) {__Pyx_INCREF(r); }} while(0)
  #define __Pyx_XDECREF(r)  do { if((r) != NULL) {__Pyx_DECREF(r); }} while(0)
  #define __Pyx_XGOTREF(r)  do { if((r) != NULL) {__Pyx_GOTREF(r); }} while(0)
  #define __Pyx_XGIVEREF(r) do { if((r) != NULL) {__Pyx_GIVEREF(r);}} while(0)
#else
  #define __Pyx_RefNannyDeclarations
  #define __Pyx_RefNannySetupContext(name, acquire_gil)
  #define __Pyx_RefNannyFinishContext()
  #define __Pyx_INCREF(r) Py_INCREF(r)
  #define __Pyx_DECREF(r) Py_DECREF(r)
  #define __Pyx_GOTREF(r)
  #define __Pyx_GIVEREF(r)
  #define __Pyx_XINCREF(r) Py_XINCREF(r)
  #define __Pyx_XDECREF(r) Py_XDECREF(r)
  #define __Pyx_XGOTREF(r)
  #define __Pyx_XGIVEREF(r)
#endif
#define __Pyx_XDECREF_SET(r, v) do {\
        PyObject *tmp = (PyObject *) r;\
        r = v; __Pyx_XDECREF(tmp);\
    } while (0)
#define __Pyx_DECREF_SET(r, v) do {\
        PyObject *tmp = (PyObject *) r;\
        r = v; __Pyx_DECREF(tmp);\
    } while (0)
#define __Pyx_CLEAR(r)    do { PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);} while(0)
#define __Pyx_XCLEAR(r)   do { if((r) != NULL) {PyObject* tmp = ((PyObject*)(r)); r = NULL; __Pyx_DECREF(tmp);}} while(0)

/* PyObjectGetAttrStr.proto */
#if CYTHON_USE_TYPE_SLOTS
static CYTHON_INLINE PyObject* __Pyx_PyObject_GetAttrStr(PyObject* obj, PyObject* attr_name);
#else
#define __Pyx_PyObject_GetAttrStr(o,n) PyObject_GetAttr(o,n)
#endif

/* GetBuiltinName.proto */
static PyObject *__Pyx_GetBuiltinName(PyObject *name);

/* MemviewSliceInit.proto */
#define __Pyx_BUF_MAX_NDIMS %(BUF_MAX_NDIMS)d
#define __Pyx_MEMVIEW_DIRECT   1
#define __Pyx_MEMVIEW_PTR      2
#define __Pyx_MEMVIEW_FULL     4
#define __Pyx_MEMVIEW_CONTIG   8
#define __Pyx_MEMVIEW_STRIDED  16
#define __Pyx_MEMVIEW_FOLLOW   32
#define __Pyx_IS_C_CONTIG 1
#define __Pyx_IS_F_CONTIG 2
static int __Pyx_init_memviewslice(
                struct __pyx_memoryview_obj *memview,
                int ndim,
                __Pyx_memviewslice *memviewslice,
                int memview_is_new_reference);
static CYTHON_INLINE int __pyx_add_acquisition_count_locked(
    __pyx_atomic_int *acquisition_count, PyThread_type_lock lock);
static CYTHON_INLINE int __pyx_sub_acquisition_count_locked(
    __pyx_atomic_int *acquisition_count, PyThread_type_lock lock);
#define __pyx_get_slice_count_pointer(memview) (memview->acquisition_count_aligned_p)
#define __pyx_get_slice_count(memview) (*__pyx_get_slice_count_pointer(memview))
#define __PYX_INC_MEMVIEW(slice, have_gil) __Pyx_INC_MEMVIEW(slice, have_gil, __LINE__)
#define __PYX_XDEC_MEMVIEW(slice, have_gil) __Pyx_XDEC_MEMVIEW(slice, have_gil, __LINE__)
static CYTHON_INLINE void __Pyx_INC_MEMVIEW(__Pyx_memviewslice *, int, int);
static CYTHON_INLINE void __Pyx_XDEC_MEMVIEW(__Pyx_memviewslice *, int, int);

/* RaiseArgTupleInvalid.proto */
static void __Pyx_RaiseArgtupleInvalid(const char* func_name, int exact,
    Py_ssize_t num_min, Py_ssize_t num_max, Py_ssize_t num_found);

/* RaiseDoubleKeywords.proto */
static void __Pyx_RaiseDoubleKeywordsError(const char* func_name, PyObject* kw_name);

/* ParseKeywords.proto */
static int __Pyx_ParseOptionalKeywords(PyObject *kwds, PyObject **argnames[],\
    PyObject *kwds2, PyObject *values[], Py_ssize_t num_pos_args,\
    const char* function_name);

/* None.proto */
static CYTHON_INLINE void __Pyx_RaiseUnboundLocalError(const char *varname);

/* GetTopmostException.proto */
#if CYTHON_USE_EXC_INFO_STACK
static _PyErr_StackItem * __Pyx_PyErr_GetTopmostException(PyThreadState *tstate);
#endif

/* PyThreadStateGet.proto */
#if CYTHON_FAST_THREAD_STATE
#define __Pyx_PyThreadState_declare  PyThreadState *__pyx_tstate;
#define __Pyx_PyThreadState_assign  __pyx_tstate = __Pyx_PyThreadState_Current;
#define __Pyx_PyErr_Occurred()  __pyx_tstate->curexc_type
#else
#define __Pyx_PyThreadState_declare
#define __Pyx_PyThreadState_assign
#define __Pyx_PyErr_Occurred()  PyErr_Occurred()
#endif

/* SaveResetException.proto */
#if CYTHON_FAST_THREAD_STATE
#define __Pyx_ExceptionSave(type, value, tb)  __Pyx__ExceptionSave(__pyx_tstate, type, value, tb)
static CYTHON_INLINE void __Pyx__ExceptionSave(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb);
#define __Pyx_ExceptionReset(type, value, tb)  __Pyx__ExceptionReset(__pyx_tstate, type, value, tb)
static CYTHON_INLINE void __Pyx__ExceptionReset(PyThreadState *tstate, PyObject *type, PyObject *value, PyObject *tb);
#else
#define __Pyx_ExceptionSave(type, value, tb)   PyErr_GetExcInfo(type, value, tb)
#define __Pyx_ExceptionReset(type, value, tb)  PyErr_SetExcInfo(type, value, tb)
#endif

/* PyErrExceptionMatches.proto */
#if CYTHON_FAST_THREAD_STATE
#define __Pyx_PyErr_ExceptionMatches(err) __Pyx_PyErr_ExceptionMatchesInState(__pyx_tstate, err)
static CYTHON_INLINE int __Pyx_PyErr_ExceptionMatchesInState(PyThreadState* tstate, PyObject* err);
#else
#define __Pyx_PyErr_ExceptionMatches(err)  PyErr_ExceptionMatches(err)
#endif

/* GetException.proto */
#if CYTHON_FAST_THREAD_STATE
#define __Pyx_GetException(type, value, tb)  __Pyx__GetException(__pyx_tstate, type, value, tb)
static int __Pyx__GetException(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb);
#else
static int __Pyx_GetException(PyObject **type, PyObject **value, PyObject **tb);
#endif

/* PyObjectCall.proto */
#if CYTHON_COMPILING_IN_CPYTHON
static CYTHON_INLINE PyObject* __Pyx_PyObject_Call(PyObject *func, PyObject *arg, PyObject *kw);
#else
#define __Pyx_PyObject_Call(func, arg, kw) PyObject_Call(func, arg, kw)
#endif

/* PyErrFetchRestore.proto */
#if CYTHON_FAST_THREAD_STATE
#define __Pyx_PyErr_Clear() __Pyx_ErrRestore(NULL, NULL, NULL)
#define __Pyx_ErrRestoreWithState(type, value, tb)  __Pyx_ErrRestoreInState(PyThreadState_GET(), type, value, tb)
#define __Pyx_ErrFetchWithState(type, value, tb)    __Pyx_ErrFetchInState(PyThreadState_GET(), type, value, tb)
#define __Pyx_ErrRestore(type, value, tb)  __Pyx_ErrRestoreInState(__pyx_tstate, type, value, tb)
#define __Pyx_ErrFetch(type, value, tb)    __Pyx_ErrFetchInState(__pyx_tstate, type, value, tb)
static CYTHON_INLINE void __Pyx_ErrRestoreInState(PyThreadState *tstate, PyObject *type, PyObject *value, PyObject *tb);
static CYTHON_INLINE void __Pyx_ErrFetchInState(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb);
#if CYTHON_COMPILING_IN_CPYTHON
#define __Pyx_PyErr_SetNone(exc) (Py_INCREF(exc), __Pyx_ErrRestore((exc), NULL, NULL))
#else
#define __Pyx_PyErr_SetNone(exc) PyErr_SetNone(exc)
#endif
#else
#define __Pyx_PyErr_Clear() PyErr_Clear()
#define __Pyx_PyErr_SetNone(exc) PyErr_SetNone(exc)
#define __Pyx_ErrRestoreWithState(type, value, tb)  PyErr_Restore(type, value, tb)
#define __Pyx_ErrFetchWithState(type, value, tb)  PyErr_Fetch(type, value, tb)
#define __Pyx_ErrRestoreInState(tstate, type, value, tb)  PyErr_Restore(type, value, tb)
#define __Pyx_ErrFetchInState(tstate, type, value, tb)  PyErr_Fetch(type, value, tb)
#define __Pyx_ErrRestore(type, value, tb)  PyErr_Restore(type, value, tb)
#define __Pyx_ErrFetch(type, value, tb)  PyErr_Fetch(type, value, tb)
#endif

/* RaiseException.proto */
static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause);

/* ArgTypeTest.proto */
#define __Pyx_ArgTypeTest(obj, type, none_allowed, name, exact)\
    ((likely((Py_TYPE(obj) == type) | (none_allowed && (obj == Py_None)))) ? 1 :\
        __Pyx__ArgTypeTest(obj, type, name, exact))
static int __Pyx__ArgTypeTest(PyObject *obj, PyTypeObject *type, const char *name, int exact);

/* PyCFunctionFastCall.proto */
#if CYTHON_FAST_PYCCALL
static CYTHON_INLINE PyObject *__Pyx_PyCFunction_FastCall(PyObject *func, PyObject **args, Py_ssize_t nargs);
#else
#define __Pyx_PyCFunction_FastCall(func, args, nargs)  (assert(0), NULL)
#endif

/* PyFunctionFastCall.proto */
#if CYTHON_FAST_PYCALL
#define __Pyx_PyFunction_FastCall(func, args, nargs)\
    __Pyx_PyFunction_FastCallDict((func), (args), (nargs), NULL)
#if 1 || PY_VERSION_HEX < 0x030600B1
static PyObject *__Pyx_PyFunction_FastCallDict(PyObject *func, PyObject **args, Py_ssize_t nargs, PyObject *kwargs);
#else
#define __Pyx_PyFunction_FastCallDict(func, args, nargs, kwargs) _PyFunction_FastCallDict(func, args, nargs, kwargs)
#endif
#define __Pyx_BUILD_ASSERT_EXPR(cond)\
    (sizeof(char [1 - 2*!(cond)]) - 1)
#ifndef Py_MEMBER_SIZE
#define Py_MEMBER_SIZE(type, member) sizeof(((type *)0)->member)
#endif
#if CYTHON_FAST_PYCALL
  static size_t __pyx_pyframe_localsplus_offset = 0;
  #include "frameobject.h"
  #define __Pxy_PyFrame_Initialize_Offsets()\
    ((void)__Pyx_BUILD_ASSERT_EXPR(sizeof(PyFrameObject) == offsetof(PyFrameObject, f_localsplus) + Py_MEMBER_SIZE(PyFrameObject, f_localsplus)),\
     (void)(__pyx_pyframe_localsplus_offset = ((size_t)PyFrame_Type.tp_basicsize) - Py_MEMBER_SIZE(PyFrameObject, f_localsplus)))
  #define __Pyx_PyFrame_GetLocalsplus(frame)\
    (assert(__pyx_pyframe_localsplus_offset), (PyObject **)(((char *)(frame)) + __pyx_pyframe_localsplus_offset))
#endif // CYTHON_FAST_PYCALL
#endif

/* PyObjectCall2Args.proto */
static CYTHON_UNUSED PyObject* __Pyx_PyObject_Call2Args(PyObject* function, PyObject* arg1, PyObject* arg2);

/* PyObjectCallMethO.proto */
#if CYTHON_COMPILING_IN_CPYTHON
static CYTHON_INLINE PyObject* __Pyx_PyObject_CallMethO(PyObject *func, PyObject *arg);
#endif

/* PyObjectCallOneArg.proto */
static CYTHON_INLINE PyObject* __Pyx_PyObject_CallOneArg(PyObject *func, PyObject *arg);

/* IncludeStringH.proto */
#include <string.h>

/* BytesEquals.proto */
static CYTHON_INLINE int __Pyx_PyBytes_Equals(PyObject* s1, PyObject* s2, int equals);

/* UnicodeEquals.proto */
static CYTHON_INLINE int __Pyx_PyUnicode_Equals(PyObject* s1, PyObject* s2, int equals);

/* StrEquals.proto */
#if PY_MAJOR_VERSION >= 3
#define __Pyx_PyString_Equals __Pyx_PyUnicode_Equals
#else
#define __Pyx_PyString_Equals __Pyx_PyBytes_Equals
#endif

/* DivInt[Py_ssize_t].proto */
static CYTHON_INLINE Py_ssize_t __Pyx_div_Py_ssize_t(Py_ssize_t, Py_ssize_t);

/* UnaryNegOverflows.proto */
#define UNARY_NEG_WOULD_OVERFLOW(x)\
        (((x) < 0) & ((unsigned long)(x) == 0-(unsigned long)(x)))

static CYTHON_UNUSED int __pyx_array_getbuffer(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/
static PyObject *__pyx_array_get_memview(struct __pyx_array_obj *); /*proto*/
/* GetAttr.proto */
static CYTHON_INLINE PyObject *__Pyx_GetAttr(PyObject *, PyObject *);

/* GetItemInt.proto */
#define __Pyx_GetItemInt(o, i, type, is_signed, to_py_func, is_list, wraparound, boundscheck)\
    (__Pyx_fits_Py_ssize_t(i, type, is_signed) ?\
    __Pyx_GetItemInt_Fast(o, (Py_ssize_t)i, is_list, wraparound, boundscheck) :\
    (is_list ? (PyErr_SetString(PyExc_IndexError, "list index out of range"), (PyObject*)NULL) :\
               __Pyx_GetItemInt_Generic(o, to_py_func(i))))
#define __Pyx_GetItemInt_List(o, i, type, is_signed, to_py_func, is_list, wraparound, boundscheck)\
    (__Pyx_fits_Py_ssize_t(i, type, is_signed) ?\
    __Pyx_GetItemInt_List_Fast(o, (Py_ssize_t)i, wraparound, boundscheck) :\
    (PyErr_SetString(PyExc_IndexError, "list index out of range"), (PyObject*)NULL))
static CYTHON_INLINE PyObject *__Pyx_GetItemInt_List_Fast(PyObject *o, Py_ssize_t i,
                                                              int wraparound, int boundscheck);
#define __Pyx_GetItemInt_Tuple(o, i, type, is_signed, to_py_func, is_list, wraparound, boundscheck)\
    (__Pyx_fits_Py_ssize_t(i, type, is_signed) ?\
    __Pyx_GetItemInt_Tuple_Fast(o, (Py_ssize_t)i, wraparound, boundscheck) :\
    (PyErr_SetString(PyExc_IndexError, "tuple index out of range"), (PyObject*)NULL))
static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Tuple_Fast(PyObject *o, Py_ssize_t i,
                                                              int wraparound, int boundscheck);
static PyObject *__Pyx_GetItemInt_Generic(PyObject *o, PyObject* j);
static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Fast(PyObject *o, Py_ssize_t i,
                                                     int is_list, int wraparound, int boundscheck);

/* ObjectGetItem.proto */
#if CYTHON_USE_TYPE_SLOTS
static CYTHON_INLINE PyObject *__Pyx_PyObject_GetItem(PyObject *obj, PyObject* key);
#else
#define __Pyx_PyObject_GetItem(obj, key)  PyObject_GetItem(obj, key)
#endif

/* decode_c_string_utf16.proto */
static CYTHON_INLINE PyObject *__Pyx_PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors) {
    int byteorder = 0;
    return PyUnicode_DecodeUTF16(s, size, errors, &byteorder);
}
static CYTHON_INLINE PyObject *__Pyx_PyUnicode_DecodeUTF16LE(const char *s, Py_ssize_t size, const char *errors) {
    int byteorder = -1;
    return PyUnicode_DecodeUTF16(s, size, errors, &byteorder);
}
static CYTHON_INLINE PyObject *__Pyx_PyUnicode_DecodeUTF16BE(const char *s, Py_ssize_t size, const char *errors) {
    int byteorder = 1;
    return PyUnicode_DecodeUTF16(s, size, errors, &byteorder);
}

/* decode_c_string.proto */
static CYTHON_INLINE PyObject* __Pyx_decode_c_string(
         const char* cstring, Py_ssize_t start, Py_ssize_t stop,
         const char* encoding, const char* errors,
         PyObject* (*decode_func)(const char *s, Py_ssize_t size, const char *errors));

/* GetAttr3.proto */
static CYTHON_INLINE PyObject *__Pyx_GetAttr3(PyObject *, PyObject *, PyObject *);

/* PyDictVersioning.proto */
#if CYTHON_USE_DICT_VERSIONS && CYTHON_USE_TYPE_SLOTS
#define __PYX_DICT_VERSION_INIT  ((PY_UINT64_T) -1)
#define __PYX_GET_DICT_VERSION(dict)  (((PyDictObject*)(dict))->ma_version_tag)
#define __PYX_UPDATE_DICT_CACHE(dict, value, cache_var, version_var)\
    (version_var) = __PYX_GET_DICT_VERSION(dict);\
    (cache_var) = (value);
#define __PYX_PY_DICT_LOOKUP_IF_MODIFIED(VAR, DICT, LOOKUP) {\
    static PY_UINT64_T __pyx_dict_version = 0;\
    static PyObject *__pyx_dict_cached_value = NULL;\
    if (likely(__PYX_GET_DICT_VERSION(DICT) == __pyx_dict_version)) {\
        (VAR) = __pyx_dict_cached_value;\
    } else {\
        (VAR) = __pyx_dict_cached_value = (LOOKUP);\
        __pyx_dict_version = __PYX_GET_DICT_VERSION(DICT);\
    }\
}
static CYTHON_INLINE PY_UINT64_T __Pyx_get_tp_dict_version(PyObject *obj);
static CYTHON_INLINE PY_UINT64_T __Pyx_get_object_dict_version(PyObject *obj);
static CYTHON_INLINE int __Pyx_object_dict_version_matches(PyObject* obj, PY_UINT64_T tp_dict_version, PY_UINT64_T obj_dict_version);
#else
#define __PYX_GET_DICT_VERSION(dict)  (0)
#define __PYX_UPDATE_DICT_CACHE(dict, value, cache_var, version_var)
#define __PYX_PY_DICT_LOOKUP_IF_MODIFIED(VAR, DICT, LOOKUP)  (VAR) = (LOOKUP);
#endif

/* GetModuleGlobalName.proto */
#if CYTHON_USE_DICT_VERSIONS
#define __Pyx_GetModuleGlobalName(var, name)  {\
    static PY_UINT64_T __pyx_dict_version = 0;\
    static PyObject *__pyx_dict_cached_value = NULL;\
    (var) = (likely(__pyx_dict_version == __PYX_GET_DICT_VERSION(__pyx_d))) ?\
        (likely(__pyx_dict_cached_value) ? __Pyx_NewRef(__pyx_dict_cached_value) : __Pyx_GetBuiltinName(name)) :\
        __Pyx__GetModuleGlobalName(name, &__pyx_dict_version, &__pyx_dict_cached_value);\
}
#define __Pyx_GetModuleGlobalNameUncached(var, name)  {\
    PY_UINT64_T __pyx_dict_version;\
    PyObject *__pyx_dict_cached_value;\
    (var) = __Pyx__GetModuleGlobalName(name, &__pyx_dict_version, &__pyx_dict_cached_value);\
}
static PyObject *__Pyx__GetModuleGlobalName(PyObject *name, PY_UINT64_T *dict_version, PyObject **dict_cached_value);
#else
#define __Pyx_GetModuleGlobalName(var, name)  (var) = __Pyx__GetModuleGlobalName(name)
#define __Pyx_GetModuleGlobalNameUncached(var, name)  (var) = __Pyx__GetModuleGlobalName(name)
static CYTHON_INLINE PyObject *__Pyx__GetModuleGlobalName(PyObject *name);
#endif

/* RaiseTooManyValuesToUnpack.proto */
static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected);

/* RaiseNeedMoreValuesToUnpack.proto */
static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index);

/* RaiseNoneIterError.proto */
static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void);

/* ExtTypeTest.proto */
static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type);

/* SwapException.proto */
#if CYTHON_FAST_THREAD_STATE
#define __Pyx_ExceptionSwap(type, value, tb)  __Pyx__ExceptionSwap(__pyx_tstate, type, value, tb)
static CYTHON_INLINE void __Pyx__ExceptionSwap(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb);
#else
static CYTHON_INLINE void __Pyx_ExceptionSwap(PyObject **type, PyObject **value, PyObject **tb);
#endif

/* Import.proto */
static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, int level);

/* FastTypeChecks.proto */
#if CYTHON_COMPILING_IN_CPYTHON
#define __Pyx_TypeCheck(obj, type) __Pyx_IsSubtype(Py_TYPE(obj), (PyTypeObject *)type)
static CYTHON_INLINE int __Pyx_IsSubtype(PyTypeObject *a, PyTypeObject *b);
static CYTHON_INLINE int __Pyx_PyErr_GivenExceptionMatches(PyObject *err, PyObject *type);
static CYTHON_INLINE int __Pyx_PyErr_GivenExceptionMatches2(PyObject *err, PyObject *type1, PyObject *type2);
#else
#define __Pyx_TypeCheck(obj, type) PyObject_TypeCheck(obj, (PyTypeObject *)type)
#define __Pyx_PyErr_GivenExceptionMatches(err, type) PyErr_GivenExceptionMatches(err, type)
#define __Pyx_PyErr_GivenExceptionMatches2(err, type1, type2) (PyErr_GivenExceptionMatches(err, type1) || PyErr_GivenExceptionMatches(err, type2))
#endif
#define __Pyx_PyException_Check(obj) __Pyx_TypeCheck(obj, PyExc_Exception)

static CYTHON_UNUSED int __pyx_memoryview_getbuffer(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/
/* ListCompAppend.proto */
#if CYTHON_USE_PYLIST_INTERNALS && CYTHON_ASSUME_SAFE_MACROS
static CYTHON_INLINE int __Pyx_ListComp_Append(PyObject* list, PyObject* x) {
    PyListObject* L = (PyListObject*) list;
    Py_ssize_t len = Py_SIZE(list);
    if (likely(L->allocated > len)) {
        Py_INCREF(x);
        PyList_SET_ITEM(list, len, x);
        __Pyx_SET_SIZE(list, len + 1);
        return 0;
    }
    return PyList_Append(list, x);
}
#else
#define __Pyx_ListComp_Append(L,x) PyList_Append(L,x)
#endif

/* PyIntBinop.proto */
#if !CYTHON_COMPILING_IN_PYPY
static PyObject* __Pyx_PyInt_AddObjC(PyObject *op1, PyObject *op2, long intval, int inplace, int zerodivision_check);
#else
#define __Pyx_PyInt_AddObjC(op1, op2, intval, inplace, zerodivision_check)\
    (inplace ? PyNumber_InPlaceAdd(op1, op2) : PyNumber_Add(op1, op2))
#endif

/* ListExtend.proto */
static CYTHON_INLINE int __Pyx_PyList_Extend(PyObject* L, PyObject* v) {
#if CYTHON_COMPILING_IN_CPYTHON
    PyObject* none = _PyList_Extend((PyListObject*)L, v);
    if (unlikely(!none))
        return -1;
    Py_DECREF(none);
    return 0;
#else
    return PyList_SetSlice(L, PY_SSIZE_T_MAX, PY_SSIZE_T_MAX, v);
#endif
}

/* ListAppend.proto */
#if CYTHON_USE_PYLIST_INTERNALS && CYTHON_ASSUME_SAFE_MACROS
static CYTHON_INLINE int __Pyx_PyList_Append(PyObject* list, PyObject* x) {
    PyListObject* L = (PyListObject*) list;
    Py_ssize_t len = Py_SIZE(list);
    if (likely(L->allocated > len) & likely(len > (L->allocated >> 1))) {
        Py_INCREF(x);
        PyList_SET_ITEM(list, len, x);
        __Pyx_SET_SIZE(list, len + 1);
        return 0;
    }
    return PyList_Append(list, x);
}
#else
#define __Pyx_PyList_Append(L,x) PyList_Append(L,x)
#endif

/* DivInt[long].proto */
static CYTHON_INLINE long __Pyx_div_long(long, long);

/* ImportFrom.proto */
static PyObject* __Pyx_ImportFrom(PyObject* module, PyObject* name);

/* HasAttr.proto */
static CYTHON_INLINE int __Pyx_HasAttr(PyObject *, PyObject *);

/* PyObject_GenericGetAttrNoDict.proto */
#if CYTHON_USE_TYPE_SLOTS && CYTHON_USE_PYTYPE_LOOKUP && PY_VERSION_HEX < 0x03070000
static CYTHON_INLINE PyObject* __Pyx_PyObject_GenericGetAttrNoDict(PyObject* obj, PyObject* attr_name);
#else
#define __Pyx_PyObject_GenericGetAttrNoDict PyObject_GenericGetAttr
#endif

/* PyObject_GenericGetAttr.proto */
#if CYTHON_USE_TYPE_SLOTS && CYTHON_USE_PYTYPE_LOOKUP && PY_VERSION_HEX < 0x03070000
static PyObject* __Pyx_PyObject_GenericGetAttr(PyObject* obj, PyObject* attr_name);
#else
#define __Pyx_PyObject_GenericGetAttr PyObject_GenericGetAttr
#endif

/* SetVTable.proto */
static int __Pyx_SetVtable(PyObject *dict, void *vtable);

/* PyObjectGetAttrStrNoError.proto */
static CYTHON_INLINE PyObject* __Pyx_PyObject_GetAttrStrNoError(PyObject* obj, PyObject* attr_name);

/* SetupReduce.proto */
static int __Pyx_setup_reduce(PyObject* type_obj);

/* TypeImport.proto */
#ifndef __PYX_HAVE_RT_ImportType_proto
#define __PYX_HAVE_RT_ImportType_proto
enum __Pyx_ImportType_CheckSize {
   __Pyx_ImportType_CheckSize_Error = 0,
   __Pyx_ImportType_CheckSize_Warn = 1,
   __Pyx_ImportType_CheckSize_Ignore = 2
};
static PyTypeObject *__Pyx_ImportType(PyObject* module, const char *module_name, const char *class_name, size_t size, enum __Pyx_ImportType_CheckSize check_size);
#endif

/* CLineInTraceback.proto */
#ifdef CYTHON_CLINE_IN_TRACEBACK
#define __Pyx_CLineForTraceback(tstate, c_line)  (((CYTHON_CLINE_IN_TRACEBACK)) ? c_line : 0)
#else
static int __Pyx_CLineForTraceback(PyThreadState *tstate, int c_line);
#endif

/* CodeObjectCache.proto */
typedef struct {
    PyCodeObject* code_object;
    int code_line;
} __Pyx_CodeObjectCacheEntry;
struct __Pyx_CodeObjectCache {
    int count;
    int max_count;
    __Pyx_CodeObjectCacheEntry* entries;
};
static struct __Pyx_CodeObjectCache __pyx_code_cache = {0,0,NULL};
static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line);
static PyCodeObject *__pyx_find_code_object(int code_line);
static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object);

/* AddTraceback.proto */
static void __Pyx_AddTraceback(const char *funcname, int c_line,
                               int py_line, const char *filename);

#if PY_MAJOR_VERSION < 3
    static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags);
    static void __Pyx_ReleaseBuffer(Py_buffer *view);
#else
    #define __Pyx_GetBuffer PyObject_GetBuffer
    #define __Pyx_ReleaseBuffer PyBuffer_Release
#endif


/* BufferStructDeclare.proto */
typedef struct {
  Py_ssize_t shape, strides, suboffsets;
} __Pyx_Buf_DimInfo;
typedef struct {
  size_t refcount;
  Py_buffer pybuffer;
} __Pyx_Buffer;
typedef struct {
  __Pyx_Buffer *rcbuffer;
  char *data;
  __Pyx_Buf_DimInfo diminfo[8];
} __Pyx_LocalBuf_ND;

/* MemviewSliceIsContig.proto */
static int __pyx_memviewslice_is_contig(const __Pyx_memviewslice mvs, char order, int ndim);

/* OverlappingSlices.proto */
static int __pyx_slices_overlap(__Pyx_memviewslice *slice1,
                                __Pyx_memviewslice *slice2,
                                int ndim, size_t itemsize);

/* Capsule.proto */
static CYTHON_INLINE PyObject *__pyx_capsule_create(void *p, const char *sig);

/* IsLittleEndian.proto */
static CYTHON_INLINE int __Pyx_Is_Little_Endian(void);

/* BufferFormatCheck.proto */
static const char* __Pyx_BufFmt_CheckString(__Pyx_BufFmt_Context* ctx, const char* ts);
static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx,
                              __Pyx_BufFmt_StackElem* stack,
                              __Pyx_TypeInfo* type);

/* TypeInfoCompare.proto */
static int __pyx_typeinfo_cmp(__Pyx_TypeInfo *a, __Pyx_TypeInfo *b);

/* MemviewSliceValidateAndInit.proto */
static int __Pyx_ValidateAndInit_memviewslice(
                int *axes_specs,
                int c_or_f_flag,
                int buf_flags,
                int ndim,
                __Pyx_TypeInfo *dtype,
                __Pyx_BufFmt_StackElem stack[],
                __Pyx_memviewslice *memviewslice,
                PyObject *original_obj);

/* ObjectToMemviewSlice.proto */
static CYTHON_INLINE __Pyx_memviewslice __Pyx_PyObject_to_MemoryviewSlice_d_d_dc_int(PyObject *, int writable_flag);

/* ObjectToMemviewSlice.proto */
static CYTHON_INLINE __Pyx_memviewslice __Pyx_PyObject_to_MemoryviewSlice_d_d_dc_float(PyObject *, int writable_flag);

/* ObjectToMemviewSlice.proto */
static CYTHON_INLINE __Pyx_memviewslice __Pyx_PyObject_to_MemoryviewSlice_dc_int(PyObject *, int writable_flag);

/* GCCDiagnostics.proto */
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))
#define __Pyx_HAS_GCC_DIAGNOSTIC
#endif

/* RealImag.proto */
#if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    #define __Pyx_CREAL(z) ((z).real())
    #define __Pyx_CIMAG(z) ((z).imag())
  #else
    #define __Pyx_CREAL(z) (__real__(z))
    #define __Pyx_CIMAG(z) (__imag__(z))
  #endif
#else
    #define __Pyx_CREAL(z) ((z).real)
    #define __Pyx_CIMAG(z) ((z).imag)
#endif
#if defined(__cplusplus) && CYTHON_CCOMPLEX\
        && (defined(_WIN32) || defined(__clang__) || (defined(__GNUC__) && (__GNUC__ >= 5 || __GNUC__ == 4 && __GNUC_MINOR__ >= 4 )) || __cplusplus >= 201103)
    #define __Pyx_SET_CREAL(z,x) ((z).real(x))
    #define __Pyx_SET_CIMAG(z,y) ((z).imag(y))
#else
    #define __Pyx_SET_CREAL(z,x) __Pyx_CREAL(z) = (x)
    #define __Pyx_SET_CIMAG(z,y) __Pyx_CIMAG(z) = (y)
#endif

/* Arithmetic.proto */
#if CYTHON_CCOMPLEX
    #define __Pyx_c_eq_float(a, b)   ((a)==(b))
    #define __Pyx_c_sum_float(a, b)  ((a)+(b))
    #define __Pyx_c_diff_float(a, b) ((a)-(b))
    #define __Pyx_c_prod_float(a, b) ((a)*(b))
    #define __Pyx_c_quot_float(a, b) ((a)/(b))
    #define __Pyx_c_neg_float(a)     (-(a))
  #ifdef __cplusplus
    #define __Pyx_c_is_zero_float(z) ((z)==(float)0)
    #define __Pyx_c_conj_float(z)    (::std::conj(z))
    #if 1
        #define __Pyx_c_abs_float(z)     (::std::abs(z))
        #define __Pyx_c_pow_float(a, b)  (::std::pow(a, b))
    #endif
  #else
    #define __Pyx_c_is_zero_float(z) ((z)==0)
    #define __Pyx_c_conj_float(z)    (conjf(z))
    #if 1
        #define __Pyx_c_abs_float(z)     (cabsf(z))
        #define __Pyx_c_pow_float(a, b)  (cpowf(a, b))
    #endif
 #endif
#else
    static CYTHON_INLINE int __Pyx_c_eq_float(__pyx_t_float_complex, __pyx_t_float_complex);
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sum_float(__pyx_t_float_complex, __pyx_t_float_complex);
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_diff_float(__pyx_t_float_complex, __pyx_t_float_complex);
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prod_float(__pyx_t_float_complex, __pyx_t_float_complex);
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quot_float(__pyx_t_float_complex, __pyx_t_float_complex);
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_neg_float(__pyx_t_float_complex);
    static CYTHON_INLINE int __Pyx_c_is_zero_float(__pyx_t_float_complex);
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conj_float(__pyx_t_float_complex);
    #if 1
        static CYTHON_INLINE float __Pyx_c_abs_float(__pyx_t_float_complex);
        static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_pow_float(__pyx_t_float_complex, __pyx_t_float_complex);
    #endif
#endif

/* Arithmetic.proto */
#if CYTHON_CCOMPLEX
    #define __Pyx_c_eq_double(a, b)   ((a)==(b))
    #define __Pyx_c_sum_double(a, b)  ((a)+(b))
    #define __Pyx_c_diff_double(a, b) ((a)-(b))
    #define __Pyx_c_prod_double(a, b) ((a)*(b))
    #define __Pyx_c_quot_double(a, b) ((a)/(b))
    #define __Pyx_c_neg_double(a)     (-(a))
  #ifdef __cplusplus
    #define __Pyx_c_is_zero_double(z) ((z)==(double)0)
    #define __Pyx_c_conj_double(z)    (::std::conj(z))
    #if 1
        #define __Pyx_c_abs_double(z)     (::std::abs(z))
        #define __Pyx_c_pow_double(a, b)  (::std::pow(a, b))
    #endif
  #else
    #define __Pyx_c_is_zero_double(z) ((z)==0)
    #define __Pyx_c_conj_double(z)    (conj(z))
    #if 1
        #define __Pyx_c_abs_double(z)     (cabs(z))
        #define __Pyx_c_pow_double(a, b)  (cpow(a, b))
    #endif
 #endif
#else
    static CYTHON_INLINE int __Pyx_c_eq_double(__pyx_t_double_complex, __pyx_t_double_complex);
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum_double(__pyx_t_double_complex, __pyx_t_double_complex);
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff_double(__pyx_t_double_complex, __pyx_t_double_complex);
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod_double(__pyx_t_double_complex, __pyx_t_double_complex);
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot_double(__pyx_t_double_complex, __pyx_t_double_complex);
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg_double(__pyx_t_double_complex);
    static CYTHON_INLINE int __Pyx_c_is_zero_double(__pyx_t_double_complex);
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj_double(__pyx_t_double_complex);
    #if 1
        static CYTHON_INLINE double __Pyx_c_abs_double(__pyx_t_double_complex);
        static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow_double(__pyx_t_double_complex, __pyx_t_double_complex);
    #endif
#endif

/* MemviewSliceCopyTemplate.proto */
static __Pyx_memviewslice
__pyx_memoryview_copy_new_contig(const __Pyx_memviewslice *from_mvs,
                                 const char *mode, int ndim,
                                 size_t sizeof_dtype, int contig_flag,
                                 int dtype_is_object);

/* CIntToPy.proto */
static CYTHON_INLINE PyObject* __Pyx_PyInt_From_int(int value);

/* CIntFromPy.proto */
static CYTHON_INLINE int __Pyx_PyInt_As_int(PyObject *);

/* CIntToPy.proto */
static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value);

/* CIntFromPy.proto */
static CYTHON_INLINE long __Pyx_PyInt_As_long(PyObject *);

/* CIntFromPy.proto */
static CYTHON_INLINE char __Pyx_PyInt_As_char(PyObject *);

/* CheckBinaryVersion.proto */
static int __Pyx_check_binary_version(void);

/* InitStrings.proto */
static int __Pyx_InitStrings(__Pyx_StringTabEntry *t);

static PyObject *__pyx_array_get_memview(struct __pyx_array_obj *__pyx_v_self); /* proto*/
static char *__pyx_memoryview_get_item_pointer(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index); /* proto*/
static PyObject *__pyx_memoryview_is_slice(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_obj); /* proto*/
static PyObject *__pyx_memoryview_setitem_slice_assignment(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_dst, PyObject *__pyx_v_src); /* proto*/
static PyObject *__pyx_memoryview_setitem_slice_assign_scalar(struct __pyx_memoryview_obj *__pyx_v_self, struct __pyx_memoryview_obj *__pyx_v_dst, PyObject *__pyx_v_value); /* proto*/
static PyObject *__pyx_memoryview_setitem_indexed(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index, PyObject *__pyx_v_value); /* proto*/
static PyObject *__pyx_memoryview_convert_item_to_object(struct __pyx_memoryview_obj *__pyx_v_self, char *__pyx_v_itemp); /* proto*/
static PyObject *__pyx_memoryview_assign_item_from_object(struct __pyx_memoryview_obj *__pyx_v_self, char *__pyx_v_itemp, PyObject *__pyx_v_value); /* proto*/
static PyObject *__pyx_memoryviewslice_convert_item_to_object(struct __pyx_memoryviewslice_obj *__pyx_v_self, char *__pyx_v_itemp); /* proto*/
static PyObject *__pyx_memoryviewslice_assign_item_from_object(struct __pyx_memoryviewslice_obj *__pyx_v_self, char *__pyx_v_itemp, PyObject *__pyx_v_value); /* proto*/

/* Module declarations from 'cython.view' */

/* Module declarations from 'cython' */

/* Module declarations from 'cpython.buffer' */

/* Module declarations from 'libc.string' */

/* Module declarations from 'libc.stdio' */

/* Module declarations from '__builtin__' */

/* Module declarations from 'cpython.type' */
static PyTypeObject *__pyx_ptype_7cpython_4type_type = 0;

/* Module declarations from 'cpython' */

/* Module declarations from 'cpython.object' */

/* Module declarations from 'cpython.ref' */

/* Module declarations from 'cpython.mem' */

/* Module declarations from 'numpy' */

/* Module declarations from 'numpy' */
static PyTypeObject *__pyx_ptype_5numpy_dtype = 0;
static PyTypeObject *__pyx_ptype_5numpy_flatiter = 0;
static PyTypeObject *__pyx_ptype_5numpy_broadcast = 0;
static PyTypeObject *__pyx_ptype_5numpy_ndarray = 0;
static PyTypeObject *__pyx_ptype_5numpy_generic = 0;
static PyTypeObject *__pyx_ptype_5numpy_number = 0;
static PyTypeObject *__pyx_ptype_5numpy_integer = 0;
static PyTypeObject *__pyx_ptype_5numpy_signedinteger = 0;
static PyTypeObject *__pyx_ptype_5numpy_unsignedinteger = 0;
static PyTypeObject *__pyx_ptype_5numpy_inexact = 0;
static PyTypeObject *__pyx_ptype_5numpy_floating = 0;
static PyTypeObject *__pyx_ptype_5numpy_complexfloating = 0;
static PyTypeObject *__pyx_ptype_5numpy_flexible = 0;
static PyTypeObject *__pyx_ptype_5numpy_character = 0;
static PyTypeObject *__pyx_ptype_5numpy_ufunc = 0;

/* Module declarations from 'TTS.tts.utils.monotonic_align.core' */
static PyTypeObject *__pyx_array_type = 0;
static PyTypeObject *__pyx_MemviewEnum_type = 0;
static PyTypeObject *__pyx_memoryview_type = 0;
static PyTypeObject *__pyx_memoryviewslice_type = 0;
static PyObject *generic = 0;
static PyObject *strided = 0;
static PyObject *indirect = 0;
static PyObject *contiguous = 0;
static PyObject *indirect_contiguous = 0;
static int __pyx_memoryview_thread_locks_used;
static PyThread_type_lock __pyx_memoryview_thread_locks[8];
static void __pyx_f_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_each(__Pyx_memviewslice, __Pyx_memviewslice, int, int, float); /*proto*/
static void __pyx_f_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c(__Pyx_memviewslice, __Pyx_memviewslice, __Pyx_memviewslice, __Pyx_memviewslice, int __pyx_skip_dispatch, struct __pyx_opt_args_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c *__pyx_optional_args); /*proto*/
static struct __pyx_array_obj *__pyx_array_new(PyObject *, Py_ssize_t, char *, char *, char *); /*proto*/
static void *__pyx_align_pointer(void *, size_t); /*proto*/
static PyObject *__pyx_memoryview_new(PyObject *, int, int, __Pyx_TypeInfo *); /*proto*/
static CYTHON_INLINE int __pyx_memoryview_check(PyObject *); /*proto*/
static PyObject *_unellipsify(PyObject *, int); /*proto*/
static PyObject *assert_direct_dimensions(Py_ssize_t *, int); /*proto*/
static struct __pyx_memoryview_obj *__pyx_memview_slice(struct __pyx_memoryview_obj *, PyObject *); /*proto*/
static int __pyx_memoryview_slice_memviewslice(__Pyx_memviewslice *, Py_ssize_t, Py_ssize_t, Py_ssize_t, int, int, int *, Py_ssize_t, Py_ssize_t, Py_ssize_t, int, int, int, int); /*proto*/
static char *__pyx_pybuffer_index(Py_buffer *, char *, Py_ssize_t, Py_ssize_t); /*proto*/
static int __pyx_memslice_transpose(__Pyx_memviewslice *); /*proto*/
static PyObject *__pyx_memoryview_fromslice(__Pyx_memviewslice, int, PyObject *(*)(char *), int (*)(char *, PyObject *), int); /*proto*/
static __Pyx_memviewslice *__pyx_memoryview_get_slice_from_memoryview(struct __pyx_memoryview_obj *, __Pyx_memviewslice *); /*proto*/
static void __pyx_memoryview_slice_copy(struct __pyx_memoryview_obj *, __Pyx_memviewslice *); /*proto*/
static PyObject *__pyx_memoryview_copy_object(struct __pyx_memoryview_obj *); /*proto*/
static PyObject *__pyx_memoryview_copy_object_from_slice(struct __pyx_memoryview_obj *, __Pyx_memviewslice *); /*proto*/
static Py_ssize_t abs_py_ssize_t(Py_ssize_t); /*proto*/
static char __pyx_get_best_slice_order(__Pyx_memviewslice *, int); /*proto*/
static void _copy_strided_to_strided(char *, Py_ssize_t *, char *, Py_ssize_t *, Py_ssize_t *, Py_ssize_t *, int, size_t); /*proto*/
static void copy_strided_to_strided(__Pyx_memviewslice *, __Pyx_memviewslice *, int, size_t); /*proto*/
static Py_ssize_t __pyx_memoryview_slice_get_size(__Pyx_memviewslice *, int); /*proto*/
static Py_ssize_t __pyx_fill_contig_strides_array(Py_ssize_t *, Py_ssize_t *, Py_ssize_t, int, char); /*proto*/
static void *__pyx_memoryview_copy_data_to_temp(__Pyx_memviewslice *, __Pyx_memviewslice *, char, int); /*proto*/
static int __pyx_memoryview_err_extents(int, Py_ssize_t, Py_ssize_t); /*proto*/
static int __pyx_memoryview_err_dim(PyObject *, char *, int); /*proto*/
static int __pyx_memoryview_err(PyObject *, char *); /*proto*/
static int __pyx_memoryview_copy_contents(__Pyx_memviewslice, __Pyx_memviewslice, int, int, int); /*proto*/
static void __pyx_memoryview_broadcast_leading(__Pyx_memviewslice *, int, int); /*proto*/
static void __pyx_memoryview_refcount_copying(__Pyx_memviewslice *, int, int, int); /*proto*/
static void __pyx_memoryview_refcount_objects_in_slice_with_gil(char *, Py_ssize_t *, Py_ssize_t *, int, int); /*proto*/
static void __pyx_memoryview_refcount_objects_in_slice(char *, Py_ssize_t *, Py_ssize_t *, int, int); /*proto*/
static void __pyx_memoryview_slice_assign_scalar(__Pyx_memviewslice *, int, size_t, void *, int); /*proto*/
static void __pyx_memoryview__slice_assign_scalar(char *, Py_ssize_t *, Py_ssize_t *, int, size_t, void *); /*proto*/
static PyObject *__pyx_unpickle_Enum__set_state(struct __pyx_MemviewEnum_obj *, PyObject *); /*proto*/
static __Pyx_TypeInfo __Pyx_TypeInfo_int = { "int", NULL, sizeof(int), { 0 }, 0, IS_UNSIGNED(int) ? 'U' : 'I', IS_UNSIGNED(int), 0 };
static __Pyx_TypeInfo __Pyx_TypeInfo_float = { "float", NULL, sizeof(float), { 0 }, 0, 'R', 0, 0 };
#define __Pyx_MODULE_NAME "TTS.tts.utils.monotonic_align.core"
extern int __pyx_module_is_main_TTS__tts__utils__monotonic_align__core;
int __pyx_module_is_main_TTS__tts__utils__monotonic_align__core = 0;

/* Implementation of 'TTS.tts.utils.monotonic_align.core' */
static PyObject *__pyx_builtin_range;
static PyObject *__pyx_builtin_ImportError;
static PyObject *__pyx_builtin_ValueError;
static PyObject *__pyx_builtin_MemoryError;
static PyObject *__pyx_builtin_enumerate;
static PyObject *__pyx_builtin_TypeError;
static PyObject *__pyx_builtin_Ellipsis;
static PyObject *__pyx_builtin_id;
static PyObject *__pyx_builtin_IndexError;
static const char __pyx_k_O[] = "O";
static const char __pyx_k_c[] = "c";
static const char __pyx_k_id[] = "id";
static const char __pyx_k_np[] = "np";
static const char __pyx_k_new[] = "__new__";
static const char __pyx_k_obj[] = "obj";
static const char __pyx_k_base[] = "base";
static const char __pyx_k_dict[] = "__dict__";
static const char __pyx_k_main[] = "__main__";
static const char __pyx_k_mode[] = "mode";
static const char __pyx_k_name[] = "name";
static const char __pyx_k_ndim[] = "ndim";
static const char __pyx_k_pack[] = "pack";
static const char __pyx_k_size[] = "size";
static const char __pyx_k_step[] = "step";
static const char __pyx_k_stop[] = "stop";
static const char __pyx_k_t_xs[] = "t_xs";
static const char __pyx_k_t_ys[] = "t_ys";
static const char __pyx_k_test[] = "__test__";
static const char __pyx_k_ASCII[] = "ASCII";
static const char __pyx_k_class[] = "__class__";
static const char __pyx_k_error[] = "error";
static const char __pyx_k_flags[] = "flags";
static const char __pyx_k_numpy[] = "numpy";
static const char __pyx_k_paths[] = "paths";
static const char __pyx_k_range[] = "range";
static const char __pyx_k_shape[] = "shape";
static const char __pyx_k_start[] = "start";
static const char __pyx_k_encode[] = "encode";
static const char __pyx_k_format[] = "format";
static const char __pyx_k_import[] = "__import__";
static const char __pyx_k_name_2[] = "__name__";
static const char __pyx_k_pickle[] = "pickle";
static const char __pyx_k_reduce[] = "__reduce__";
static const char __pyx_k_struct[] = "struct";
static const char __pyx_k_unpack[] = "unpack";
static const char __pyx_k_update[] = "update";
static const char __pyx_k_values[] = "values";
static const char __pyx_k_fortran[] = "fortran";
static const char __pyx_k_memview[] = "memview";
static const char __pyx_k_Ellipsis[] = "Ellipsis";
static const char __pyx_k_getstate[] = "__getstate__";
static const char __pyx_k_itemsize[] = "itemsize";
static const char __pyx_k_pyx_type[] = "__pyx_type";
static const char __pyx_k_setstate[] = "__setstate__";
static const char __pyx_k_TypeError[] = "TypeError";
static const char __pyx_k_enumerate[] = "enumerate";
static const char __pyx_k_pyx_state[] = "__pyx_state";
static const char __pyx_k_reduce_ex[] = "__reduce_ex__";
static const char __pyx_k_IndexError[] = "IndexError";
static const char __pyx_k_ValueError[] = "ValueError";
static const char __pyx_k_pyx_result[] = "__pyx_result";
static const char __pyx_k_pyx_vtable[] = "__pyx_vtable__";
static const char __pyx_k_ImportError[] = "ImportError";
static const char __pyx_k_MemoryError[] = "MemoryError";
static const char __pyx_k_PickleError[] = "PickleError";
static const char __pyx_k_max_neg_val[] = "max_neg_val";
static const char __pyx_k_pyx_checksum[] = "__pyx_checksum";
static const char __pyx_k_stringsource[] = "stringsource";
static const char __pyx_k_pyx_getbuffer[] = "__pyx_getbuffer";
static const char __pyx_k_reduce_cython[] = "__reduce_cython__";
static const char __pyx_k_View_MemoryView[] = "View.MemoryView";
static const char __pyx_k_allocate_buffer[] = "allocate_buffer";
static const char __pyx_k_dtype_is_object[] = "dtype_is_object";
static const char __pyx_k_pyx_PickleError[] = "__pyx_PickleError";
static const char __pyx_k_setstate_cython[] = "__setstate_cython__";
static const char __pyx_k_pyx_unpickle_Enum[] = "__pyx_unpickle_Enum";
static const char __pyx_k_cline_in_traceback[] = "cline_in_traceback";
static const char __pyx_k_strided_and_direct[] = "<strided and direct>";
static const char __pyx_k_strided_and_indirect[] = "<strided and indirect>";
static const char __pyx_k_contiguous_and_direct[] = "<contiguous and direct>";
static const char __pyx_k_MemoryView_of_r_object[] = "<MemoryView of %r object>";
static const char __pyx_k_MemoryView_of_r_at_0x_x[] = "<MemoryView of %r at 0x%x>";
static const char __pyx_k_contiguous_and_indirect[] = "<contiguous and indirect>";
static const char __pyx_k_Cannot_index_with_type_s[] = "Cannot index with type '%s'";
static const char __pyx_k_Invalid_shape_in_axis_d_d[] = "Invalid shape in axis %d: %d.";
static const char __pyx_k_itemsize_0_for_cython_array[] = "itemsize <= 0 for cython.array";
static const char __pyx_k_unable_to_allocate_array_data[] = "unable to allocate array data.";
static const char __pyx_k_strided_and_direct_or_indirect[] = "<strided and direct or indirect>";
static const char __pyx_k_numpy_core_multiarray_failed_to[] = "numpy.core.multiarray failed to import";
static const char __pyx_k_Buffer_view_does_not_expose_stri[] = "Buffer view does not expose strides";
static const char __pyx_k_Can_only_create_a_buffer_that_is[] = "Can only create a buffer that is contiguous in memory.";
static const char __pyx_k_Cannot_assign_to_read_only_memor[] = "Cannot assign to read-only memoryview";
static const char __pyx_k_Cannot_create_writable_memory_vi[] = "Cannot create writable memory view from read-only memoryview";
static const char __pyx_k_Empty_shape_tuple_for_cython_arr[] = "Empty shape tuple for cython.array";
static const char __pyx_k_Incompatible_checksums_s_vs_0xb0[] = "Incompatible checksums (%s vs 0xb068931 = (name))";
static const char __pyx_k_Indirect_dimensions_not_supporte[] = "Indirect dimensions not supported";
static const char __pyx_k_Invalid_mode_expected_c_or_fortr[] = "Invalid mode, expected 'c' or 'fortran', got %s";
static const char __pyx_k_Out_of_bounds_on_buffer_access_a[] = "Out of bounds on buffer access (axis %d)";
static const char __pyx_k_Unable_to_convert_item_to_object[] = "Unable to convert item to object";
static const char __pyx_k_got_differing_extents_in_dimensi[] = "got differing extents in dimension %d (got %d and %d)";
static const char __pyx_k_no_default___reduce___due_to_non[] = "no default __reduce__ due to non-trivial __cinit__";
static const char __pyx_k_numpy_core_umath_failed_to_impor[] = "numpy.core.umath failed to import";
static const char __pyx_k_unable_to_allocate_shape_and_str[] = "unable to allocate shape and strides.";
static PyObject *__pyx_n_s_ASCII;
static PyObject *__pyx_kp_s_Buffer_view_does_not_expose_stri;
static PyObject *__pyx_kp_s_Can_only_create_a_buffer_that_is;
static PyObject *__pyx_kp_s_Cannot_assign_to_read_only_memor;
static PyObject *__pyx_kp_s_Cannot_create_writable_memory_vi;
static PyObject *__pyx_kp_s_Cannot_index_with_type_s;
static PyObject *__pyx_n_s_Ellipsis;
static PyObject *__pyx_kp_s_Empty_shape_tuple_for_cython_arr;
static PyObject *__pyx_n_s_ImportError;
static PyObject *__pyx_kp_s_Incompatible_checksums_s_vs_0xb0;
static PyObject *__pyx_n_s_IndexError;
static PyObject *__pyx_kp_s_Indirect_dimensions_not_supporte;
static PyObject *__pyx_kp_s_Invalid_mode_expected_c_or_fortr;
static PyObject *__pyx_kp_s_Invalid_shape_in_axis_d_d;
static PyObject *__pyx_n_s_MemoryError;
static PyObject *__pyx_kp_s_MemoryView_of_r_at_0x_x;
static PyObject *__pyx_kp_s_MemoryView_of_r_object;
static PyObject *__pyx_n_b_O;
static PyObject *__pyx_kp_s_Out_of_bounds_on_buffer_access_a;
static PyObject *__pyx_n_s_PickleError;
static PyObject *__pyx_n_s_TypeError;
static PyObject *__pyx_kp_s_Unable_to_convert_item_to_object;
static PyObject *__pyx_n_s_ValueError;
static PyObject *__pyx_n_s_View_MemoryView;
static PyObject *__pyx_n_s_allocate_buffer;
static PyObject *__pyx_n_s_base;
static PyObject *__pyx_n_s_c;
static PyObject *__pyx_n_u_c;
static PyObject *__pyx_n_s_class;
static PyObject *__pyx_n_s_cline_in_traceback;
static PyObject *__pyx_kp_s_contiguous_and_direct;
static PyObject *__pyx_kp_s_contiguous_and_indirect;
static PyObject *__pyx_n_s_dict;
static PyObject *__pyx_n_s_dtype_is_object;
static PyObject *__pyx_n_s_encode;
static PyObject *__pyx_n_s_enumerate;
static PyObject *__pyx_n_s_error;
static PyObject *__pyx_n_s_flags;
static PyObject *__pyx_n_s_format;
static PyObject *__pyx_n_s_fortran;
static PyObject *__pyx_n_u_fortran;
static PyObject *__pyx_n_s_getstate;
static PyObject *__pyx_kp_s_got_differing_extents_in_dimensi;
static PyObject *__pyx_n_s_id;
static PyObject *__pyx_n_s_import;
static PyObject *__pyx_n_s_itemsize;
static PyObject *__pyx_kp_s_itemsize_0_for_cython_array;
static PyObject *__pyx_n_s_main;
static PyObject *__pyx_n_s_max_neg_val;
static PyObject *__pyx_n_s_memview;
static PyObject *__pyx_n_s_mode;
static PyObject *__pyx_n_s_name;
static PyObject *__pyx_n_s_name_2;
static PyObject *__pyx_n_s_ndim;
static PyObject *__pyx_n_s_new;
static PyObject *__pyx_kp_s_no_default___reduce___due_to_non;
static PyObject *__pyx_n_s_np;
static PyObject *__pyx_n_s_numpy;
static PyObject *__pyx_kp_u_numpy_core_multiarray_failed_to;
static PyObject *__pyx_kp_u_numpy_core_umath_failed_to_impor;
static PyObject *__pyx_n_s_obj;
static PyObject *__pyx_n_s_pack;
static PyObject *__pyx_n_s_paths;
static PyObject *__pyx_n_s_pickle;
static PyObject *__pyx_n_s_pyx_PickleError;
static PyObject *__pyx_n_s_pyx_checksum;
static PyObject *__pyx_n_s_pyx_getbuffer;
static PyObject *__pyx_n_s_pyx_result;
static PyObject *__pyx_n_s_pyx_state;
static PyObject *__pyx_n_s_pyx_type;
static PyObject *__pyx_n_s_pyx_unpickle_Enum;
static PyObject *__pyx_n_s_pyx_vtable;
static PyObject *__pyx_n_s_range;
static PyObject *__pyx_n_s_reduce;
static PyObject *__pyx_n_s_reduce_cython;
static PyObject *__pyx_n_s_reduce_ex;
static PyObject *__pyx_n_s_setstate;
static PyObject *__pyx_n_s_setstate_cython;
static PyObject *__pyx_n_s_shape;
static PyObject *__pyx_n_s_size;
static PyObject *__pyx_n_s_start;
static PyObject *__pyx_n_s_step;
static PyObject *__pyx_n_s_stop;
static PyObject *__pyx_kp_s_strided_and_direct;
static PyObject *__pyx_kp_s_strided_and_direct_or_indirect;
static PyObject *__pyx_kp_s_strided_and_indirect;
static PyObject *__pyx_kp_s_stringsource;
static PyObject *__pyx_n_s_struct;
static PyObject *__pyx_n_s_t_xs;
static PyObject *__pyx_n_s_t_ys;
static PyObject *__pyx_n_s_test;
static PyObject *__pyx_kp_s_unable_to_allocate_array_data;
static PyObject *__pyx_kp_s_unable_to_allocate_shape_and_str;
static PyObject *__pyx_n_s_unpack;
static PyObject *__pyx_n_s_update;
static PyObject *__pyx_n_s_values;
static PyObject *__pyx_pf_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_paths, __Pyx_memviewslice __pyx_v_values, __Pyx_memviewslice __pyx_v_t_xs, __Pyx_memviewslice __pyx_v_t_ys, float __pyx_v_max_neg_val); /* proto */
static int __pyx_array___pyx_pf_15View_dot_MemoryView_5array___cinit__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_shape, Py_ssize_t __pyx_v_itemsize, PyObject *__pyx_v_format, PyObject *__pyx_v_mode, int __pyx_v_allocate_buffer); /* proto */
static int __pyx_array___pyx_pf_15View_dot_MemoryView_5array_2__getbuffer__(struct __pyx_array_obj *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /* proto */
static void __pyx_array___pyx_pf_15View_dot_MemoryView_5array_4__dealloc__(struct __pyx_array_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_5array_7memview___get__(struct __pyx_array_obj *__pyx_v_self); /* proto */
static Py_ssize_t __pyx_array___pyx_pf_15View_dot_MemoryView_5array_6__len__(struct __pyx_array_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_array___pyx_pf_15View_dot_MemoryView_5array_8__getattr__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_attr); /* proto */
static PyObject *__pyx_array___pyx_pf_15View_dot_MemoryView_5array_10__getitem__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_item); /* proto */
static int __pyx_array___pyx_pf_15View_dot_MemoryView_5array_12__setitem__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_item, PyObject *__pyx_v_value); /* proto */
static PyObject *__pyx_pf___pyx_array___reduce_cython__(CYTHON_UNUSED struct __pyx_array_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf___pyx_array_2__setstate_cython__(CYTHON_UNUSED struct __pyx_array_obj *__pyx_v_self, CYTHON_UNUSED PyObject *__pyx_v___pyx_state); /* proto */
static int __pyx_MemviewEnum___pyx_pf_15View_dot_MemoryView_4Enum___init__(struct __pyx_MemviewEnum_obj *__pyx_v_self, PyObject *__pyx_v_name); /* proto */
static PyObject *__pyx_MemviewEnum___pyx_pf_15View_dot_MemoryView_4Enum_2__repr__(struct __pyx_MemviewEnum_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf___pyx_MemviewEnum___reduce_cython__(struct __pyx_MemviewEnum_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf___pyx_MemviewEnum_2__setstate_cython__(struct __pyx_MemviewEnum_obj *__pyx_v_self, PyObject *__pyx_v___pyx_state); /* proto */
static int __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview___cinit__(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_obj, int __pyx_v_flags, int __pyx_v_dtype_is_object); /* proto */
static void __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_2__dealloc__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_4__getitem__(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index); /* proto */
static int __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_6__setitem__(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index, PyObject *__pyx_v_value); /* proto */
static int __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_8__getbuffer__(struct __pyx_memoryview_obj *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_1T___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_4base___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_5shape___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_7strides___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_10suboffsets___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_4ndim___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_8itemsize___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_6nbytes___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_4size___get__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static Py_ssize_t __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_10__len__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_12__repr__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_14__str__(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_16is_c_contig(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_18is_f_contig(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_20copy(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_22copy_fortran(struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf___pyx_memoryview___reduce_cython__(CYTHON_UNUSED struct __pyx_memoryview_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf___pyx_memoryview_2__setstate_cython__(CYTHON_UNUSED struct __pyx_memoryview_obj *__pyx_v_self, CYTHON_UNUSED PyObject *__pyx_v___pyx_state); /* proto */
static void __pyx_memoryviewslice___pyx_pf_15View_dot_MemoryView_16_memoryviewslice___dealloc__(struct __pyx_memoryviewslice_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView_16_memoryviewslice_4base___get__(struct __pyx_memoryviewslice_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf___pyx_memoryviewslice___reduce_cython__(CYTHON_UNUSED struct __pyx_memoryviewslice_obj *__pyx_v_self); /* proto */
static PyObject *__pyx_pf___pyx_memoryviewslice_2__setstate_cython__(CYTHON_UNUSED struct __pyx_memoryviewslice_obj *__pyx_v_self, CYTHON_UNUSED PyObject *__pyx_v___pyx_state); /* proto */
static PyObject *__pyx_pf_15View_dot_MemoryView___pyx_unpickle_Enum(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v___pyx_type, long __pyx_v___pyx_checksum, PyObject *__pyx_v___pyx_state); /* proto */
static PyObject *__pyx_tp_new_array(PyTypeObject *t, PyObject *a, PyObject *k); /*proto*/
static PyObject *__pyx_tp_new_Enum(PyTypeObject *t, PyObject *a, PyObject *k); /*proto*/
static PyObject *__pyx_tp_new_memoryview(PyTypeObject *t, PyObject *a, PyObject *k); /*proto*/
static PyObject *__pyx_tp_new__memoryviewslice(PyTypeObject *t, PyObject *a, PyObject *k); /*proto*/
static PyObject *__pyx_int_0;
static PyObject *__pyx_int_1;
static PyObject *__pyx_int_184977713;
static PyObject *__pyx_int_neg_1;
static float __pyx_k_;
static PyObject *__pyx_tuple__2;
static PyObject *__pyx_tuple__3;
static PyObject *__pyx_tuple__4;
static PyObject *__pyx_tuple__5;
static PyObject *__pyx_tuple__6;
static PyObject *__pyx_tuple__7;
static PyObject *__pyx_tuple__8;
static PyObject *__pyx_tuple__9;
static PyObject *__pyx_slice__18;
static PyObject *__pyx_tuple__10;
static PyObject *__pyx_tuple__11;
static PyObject *__pyx_tuple__12;
static PyObject *__pyx_tuple__13;
static PyObject *__pyx_tuple__14;
static PyObject *__pyx_tuple__15;
static PyObject *__pyx_tuple__16;
static PyObject *__pyx_tuple__17;
static PyObject *__pyx_tuple__19;
static PyObject *__pyx_tuple__20;
static PyObject *__pyx_tuple__21;
static PyObject *__pyx_tuple__22;
static PyObject *__pyx_tuple__23;
static PyObject *__pyx_tuple__24;
static PyObject *__pyx_tuple__25;
static PyObject *__pyx_tuple__26;
static PyObject *__pyx_tuple__27;
static PyObject *__pyx_codeobj__28;
/* Late includes */

/* "TTS/tts/utils/monotonic_align/core.pyx":11
 * @cython.boundscheck(False)
 * @cython.wraparound(False)
 * cdef void maximum_path_each(int[:,::1] path, float[:,::1] value, int t_x, int t_y, float max_neg_val) nogil:             # <<<<<<<<<<<<<<
 *   cdef int x
 *   cdef int y
 */

static void __pyx_f_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_each(__Pyx_memviewslice __pyx_v_path, __Pyx_memviewslice __pyx_v_value, int __pyx_v_t_x, int __pyx_v_t_y, float __pyx_v_max_neg_val) {
  int __pyx_v_x;
  int __pyx_v_y;
  float __pyx_v_v_prev;
  float __pyx_v_v_cur;
  int __pyx_v_index;
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  long __pyx_t_4;
  int __pyx_t_5;
  long __pyx_t_6;
  long __pyx_t_7;
  int __pyx_t_8;
  Py_ssize_t __pyx_t_9;
  Py_ssize_t __pyx_t_10;
  float __pyx_t_11;
  float __pyx_t_12;
  float __pyx_t_13;
  Py_ssize_t __pyx_t_14;
  Py_ssize_t __pyx_t_15;
  int __pyx_t_16;

  /* "TTS/tts/utils/monotonic_align/core.pyx":17
 *   cdef float v_cur
 *   cdef float tmp
 *   cdef int index = t_x - 1             # <<<<<<<<<<<<<<
 * 
 *   for y in range(t_y):
 */
  __pyx_v_index = (__pyx_v_t_x - 1);

  /* "TTS/tts/utils/monotonic_align/core.pyx":19
 *   cdef int index = t_x - 1
 * 
 *   for y in range(t_y):             # <<<<<<<<<<<<<<
 *     for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
 *       if x == y:
 */
  __pyx_t_1 = __pyx_v_t_y;
  __pyx_t_2 = __pyx_t_1;
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
    __pyx_v_y = __pyx_t_3;

    /* "TTS/tts/utils/monotonic_align/core.pyx":20
 * 
 *   for y in range(t_y):
 *     for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):             # <<<<<<<<<<<<<<
 *       if x == y:
 *         v_cur = max_neg_val
 */
    __pyx_t_4 = (__pyx_v_y + 1);
    __pyx_t_5 = __pyx_v_t_x;
    if (((__pyx_t_4 < __pyx_t_5) != 0)) {
      __pyx_t_6 = __pyx_t_4;
    } else {
      __pyx_t_6 = __pyx_t_5;
    }
    __pyx_t_4 = __pyx_t_6;
    __pyx_t_5 = ((__pyx_v_t_x + __pyx_v_y) - __pyx_v_t_y);
    __pyx_t_6 = 0;
    if (((__pyx_t_5 > __pyx_t_6) != 0)) {
      __pyx_t_7 = __pyx_t_5;
    } else {
      __pyx_t_7 = __pyx_t_6;
    }
    __pyx_t_6 = __pyx_t_4;
    for (__pyx_t_5 = __pyx_t_7; __pyx_t_5 < __pyx_t_6; __pyx_t_5+=1) {
      __pyx_v_x = __pyx_t_5;

      /* "TTS/tts/utils/monotonic_align/core.pyx":21
 *   for y in range(t_y):
 *     for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
 *       if x == y:             # <<<<<<<<<<<<<<
 *         v_cur = max_neg_val
 *       else:
 */
      __pyx_t_8 = ((__pyx_v_x == __pyx_v_y) != 0);
      if (__pyx_t_8) {

        /* "TTS/tts/utils/monotonic_align/core.pyx":22
 *     for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
 *       if x == y:
 *         v_cur = max_neg_val             # <<<<<<<<<<<<<<
 *       else:
 *         v_cur = value[x, y-1]
 */
        __pyx_v_v_cur = __pyx_v_max_neg_val;

        /* "TTS/tts/utils/monotonic_align/core.pyx":21
 *   for y in range(t_y):
 *     for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
 *       if x == y:             # <<<<<<<<<<<<<<
 *         v_cur = max_neg_val
 *       else:
 */
        goto __pyx_L7;
      }

      /* "TTS/tts/utils/monotonic_align/core.pyx":24
 *         v_cur = max_neg_val
 *       else:
 *         v_cur = value[x, y-1]             # <<<<<<<<<<<<<<
 *       if x == 0:
 *         if y == 0:
 */
      /*else*/ {
        __pyx_t_9 = __pyx_v_x;
        __pyx_t_10 = (__pyx_v_y - 1);
        __pyx_v_v_cur = (*((float *) ( /* dim=1 */ ((char *) (((float *) ( /* dim=0 */ (__pyx_v_value.data + __pyx_t_9 * __pyx_v_value.strides[0]) )) + __pyx_t_10)) )));
      }
      __pyx_L7:;

      /* "TTS/tts/utils/monotonic_align/core.pyx":25
 *       else:
 *         v_cur = value[x, y-1]
 *       if x == 0:             # <<<<<<<<<<<<<<
 *         if y == 0:
 *           v_prev = 0.
 */
      __pyx_t_8 = ((__pyx_v_x == 0) != 0);
      if (__pyx_t_8) {

        /* "TTS/tts/utils/monotonic_align/core.pyx":26
 *         v_cur = value[x, y-1]
 *       if x == 0:
 *         if y == 0:             # <<<<<<<<<<<<<<
 *           v_prev = 0.
 *         else:
 */
        __pyx_t_8 = ((__pyx_v_y == 0) != 0);
        if (__pyx_t_8) {

          /* "TTS/tts/utils/monotonic_align/core.pyx":27
 *       if x == 0:
 *         if y == 0:
 *           v_prev = 0.             # <<<<<<<<<<<<<<
 *         else:
 *           v_prev = max_neg_val
 */
          __pyx_v_v_prev = 0.;

          /* "TTS/tts/utils/monotonic_align/core.pyx":26
 *         v_cur = value[x, y-1]
 *       if x == 0:
 *         if y == 0:             # <<<<<<<<<<<<<<
 *           v_prev = 0.
 *         else:
 */
          goto __pyx_L9;
        }

        /* "TTS/tts/utils/monotonic_align/core.pyx":29
 *           v_prev = 0.
 *         else:
 *           v_prev = max_neg_val             # <<<<<<<<<<<<<<
 *       else:
 *         v_prev = value[x-1, y-1]
 */
        /*else*/ {
          __pyx_v_v_prev = __pyx_v_max_neg_val;
        }
        __pyx_L9:;

        /* "TTS/tts/utils/monotonic_align/core.pyx":25
 *       else:
 *         v_cur = value[x, y-1]
 *       if x == 0:             # <<<<<<<<<<<<<<
 *         if y == 0:
 *           v_prev = 0.
 */
        goto __pyx_L8;
      }

      /* "TTS/tts/utils/monotonic_align/core.pyx":31
 *           v_prev = max_neg_val
 *       else:
 *         v_prev = value[x-1, y-1]             # <<<<<<<<<<<<<<
 *       value[x, y] = max(v_cur, v_prev) + value[x, y]
 * 
 */
      /*else*/ {
        __pyx_t_10 = (__pyx_v_x - 1);
        __pyx_t_9 = (__pyx_v_y - 1);
        __pyx_v_v_prev = (*((float *) ( /* dim=1 */ ((char *) (((float *) ( /* dim=0 */ (__pyx_v_value.data + __pyx_t_10 * __pyx_v_value.strides[0]) )) + __pyx_t_9)) )));
      }
      __pyx_L8:;

      /* "TTS/tts/utils/monotonic_align/core.pyx":32
 *       else:
 *         v_prev = value[x-1, y-1]
 *       value[x, y] = max(v_cur, v_prev) + value[x, y]             # <<<<<<<<<<<<<<
 * 
 *   for y in range(t_y - 1, -1, -1):
 */
      __pyx_t_11 = __pyx_v_v_prev;
      __pyx_t_12 = __pyx_v_v_cur;
      if (((__pyx_t_11 > __pyx_t_12) != 0)) {
        __pyx_t_13 = __pyx_t_11;
      } else {
        __pyx_t_13 = __pyx_t_12;
      }
      __pyx_t_9 = __pyx_v_x;
      __pyx_t_10 = __pyx_v_y;
      __pyx_t_14 = __pyx_v_x;
      __pyx_t_15 = __pyx_v_y;
      *((float *) ( /* dim=1 */ ((char *) (((float *) ( /* dim=0 */ (__pyx_v_value.data + __pyx_t_14 * __pyx_v_value.strides[0]) )) + __pyx_t_15)) )) = (__pyx_t_13 + (*((float *) ( /* dim=1 */ ((char *) (((float *) ( /* dim=0 */ (__pyx_v_value.data + __pyx_t_9 * __pyx_v_value.strides[0]) )) + __pyx_t_10)) ))));
    }
  }

  /* "TTS/tts/utils/monotonic_align/core.pyx":34
 *       value[x, y] = max(v_cur, v_prev) + value[x, y]
 * 
 *   for y in range(t_y - 1, -1, -1):             # <<<<<<<<<<<<<<
 *     path[index, y] = 1
 *     if index != 0 and (index == y or value[index, y-1] < value[index-1, y-1]):
 */
  for (__pyx_t_1 = (__pyx_v_t_y - 1); __pyx_t_1 > -1; __pyx_t_1-=1) {
    __pyx_v_y = __pyx_t_1;

    /* "TTS/tts/utils/monotonic_align/core.pyx":35
 * 
 *   for y in range(t_y - 1, -1, -1):
 *     path[index, y] = 1             # <<<<<<<<<<<<<<
 *     if index != 0 and (index == y or value[index, y-1] < value[index-1, y-1]):
 *       index = index - 1
 */
    __pyx_t_10 = __pyx_v_index;
    __pyx_t_9 = __pyx_v_y;
    *((int *) ( /* dim=1 */ ((char *) (((int *) ( /* dim=0 */ (__pyx_v_path.data + __pyx_t_10 * __pyx_v_path.strides[0]) )) + __pyx_t_9)) )) = 1;

    /* "TTS/tts/utils/monotonic_align/core.pyx":36
 *   for y in range(t_y - 1, -1, -1):
 *     path[index, y] = 1
 *     if index != 0 and (index == y or value[index, y-1] < value[index-1, y-1]):             # <<<<<<<<<<<<<<
 *       index = index - 1
 * 
 */
    __pyx_t_16 = ((__pyx_v_index != 0) != 0);
    if (__pyx_t_16) {
    } else {
      __pyx_t_8 = __pyx_t_16;
      goto __pyx_L13_bool_binop_done;
    }
    __pyx_t_16 = ((__pyx_v_index == __pyx_v_y) != 0);
    if (!__pyx_t_16) {
    } else {
      __pyx_t_8 = __pyx_t_16;
      goto __pyx_L13_bool_binop_done;
    }
    __pyx_t_9 = __pyx_v_index;
    __pyx_t_10 = (__pyx_v_y - 1);
    __pyx_t_15 = (__pyx_v_index - 1);
    __pyx_t_14 = (__pyx_v_y - 1);
    __pyx_t_16 = (((*((float *) ( /* dim=1 */ ((char *) (((float *) ( /* dim=0 */ (__pyx_v_value.data + __pyx_t_9 * __pyx_v_value.strides[0]) )) + __pyx_t_10)) ))) < (*((float *) ( /* dim=1 */ ((char *) (((float *) ( /* dim=0 */ (__pyx_v_value.data + __pyx_t_15 * __pyx_v_value.strides[0]) )) + __pyx_t_14)) )))) != 0);
    __pyx_t_8 = __pyx_t_16;
    __pyx_L13_bool_binop_done:;
    if (__pyx_t_8) {

      /* "TTS/tts/utils/monotonic_align/core.pyx":37
 *     path[index, y] = 1
 *     if index != 0 and (index == y or value[index, y-1] < value[index-1, y-1]):
 *       index = index - 1             # <<<<<<<<<<<<<<
 * 
 * 
 */
      __pyx_v_index = (__pyx_v_index - 1);

      /* "TTS/tts/utils/monotonic_align/core.pyx":36
 *   for y in range(t_y - 1, -1, -1):
 *     path[index, y] = 1
 *     if index != 0 and (index == y or value[index, y-1] < value[index-1, y-1]):             # <<<<<<<<<<<<<<
 *       index = index - 1
 * 
 */
    }
  }

  /* "TTS/tts/utils/monotonic_align/core.pyx":11
 * @cython.boundscheck(False)
 * @cython.wraparound(False)
 * cdef void maximum_path_each(int[:,::1] path, float[:,::1] value, int t_x, int t_y, float max_neg_val) nogil:             # <<<<<<<<<<<<<<
 *   cdef int x
 *   cdef int y
 */

  /* function exit code */
}

/* "TTS/tts/utils/monotonic_align/core.pyx":42
 * @cython.boundscheck(False)
 * @cython.wraparound(False)
 * cpdef void maximum_path_c(int[:,:,::1] paths, float[:,:,::1] values, int[::1] t_xs, int[::1] t_ys, float max_neg_val=-1e9) nogil:             # <<<<<<<<<<<<<<
 *   cdef int b = values.shape[0]
 * 
 */

static PyObject *__pyx_pw_3TTS_3tts_5utils_15monotonic_align_4core_1maximum_path_c(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static void __pyx_f_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c(__Pyx_memviewslice __pyx_v_paths, __Pyx_memviewslice __pyx_v_values, __Pyx_memviewslice __pyx_v_t_xs, __Pyx_memviewslice __pyx_v_t_ys, CYTHON_UNUSED int __pyx_skip_dispatch, struct __pyx_opt_args_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c *__pyx_optional_args) {
  float __pyx_v_max_neg_val = __pyx_k_;
  CYTHON_UNUSED int __pyx_v_b;
  int __pyx_v_i;
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  __Pyx_memviewslice __pyx_t_4 = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_t_5 = { 0, 0, { 0 }, { 0 }, { 0 } };
  Py_ssize_t __pyx_t_6;
  Py_ssize_t __pyx_t_7;
  if (__pyx_optional_args) {
    if (__pyx_optional_args->__pyx_n > 0) {
      __pyx_v_max_neg_val = __pyx_optional_args->max_neg_val;
    }
  }

  /* "TTS/tts/utils/monotonic_align/core.pyx":43
 * @cython.wraparound(False)
 * cpdef void maximum_path_c(int[:,:,::1] paths, float[:,:,::1] values, int[::1] t_xs, int[::1] t_ys, float max_neg_val=-1e9) nogil:
 *   cdef int b = values.shape[0]             # <<<<<<<<<<<<<<
 * 
 *   cdef int i
 */
  __pyx_v_b = (__pyx_v_values.shape[0]);

  /* "TTS/tts/utils/monotonic_align/core.pyx":46
 * 
 *   cdef int i
 *   for i in prange(b, nogil=True):             # <<<<<<<<<<<<<<
 *     maximum_path_each(paths[i], values[i], t_xs[i], t_ys[i], max_neg_val)
 */
  {
      #ifdef WITH_THREAD
      PyThreadState *_save;
      Py_UNBLOCK_THREADS
      __Pyx_FastGIL_Remember();
      #endif
      /*try:*/ {
        __pyx_t_1 = __pyx_v_b;
        if ((1 == 0)) abort();
        {
            #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
                #undef likely
                #undef unlikely
                #define likely(x)   (x)
                #define unlikely(x) (x)
            #endif
            __pyx_t_3 = (__pyx_t_1 - 0 + 1 - 1/abs(1)) / 1;
            if (__pyx_t_3 > 0)
            {
                #ifdef _OPENMP
                #pragma omp parallel private(__pyx_t_6, __pyx_t_7) firstprivate(__pyx_t_4, __pyx_t_5)
                #endif /* _OPENMP */
                {
                    #ifdef _OPENMP
                    #pragma omp for firstprivate(__pyx_v_i) lastprivate(__pyx_v_i)
                    #endif /* _OPENMP */
                    for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_3; __pyx_t_2++){
                        {
                            __pyx_v_i = (int)(0 + 1 * __pyx_t_2);

                            /* "TTS/tts/utils/monotonic_align/core.pyx":47
 *   cdef int i
 *   for i in prange(b, nogil=True):
 *     maximum_path_each(paths[i], values[i], t_xs[i], t_ys[i], max_neg_val)             # <<<<<<<<<<<<<<
 */
                            __pyx_t_4.data = __pyx_v_paths.data;
                            __pyx_t_4.memview = __pyx_v_paths.memview;
                            __PYX_INC_MEMVIEW(&__pyx_t_4, 0);
                            {
    Py_ssize_t __pyx_tmp_idx = __pyx_v_i;
    Py_ssize_t __pyx_tmp_stride = __pyx_v_paths.strides[0];
        __pyx_t_4.data += __pyx_tmp_idx * __pyx_tmp_stride;
}

__pyx_t_4.shape[0] = __pyx_v_paths.shape[1];
__pyx_t_4.strides[0] = __pyx_v_paths.strides[1];
    __pyx_t_4.suboffsets[0] = -1;

__pyx_t_4.shape[1] = __pyx_v_paths.shape[2];
__pyx_t_4.strides[1] = __pyx_v_paths.strides[2];
    __pyx_t_4.suboffsets[1] = -1;

__pyx_t_5.data = __pyx_v_values.data;
                            __pyx_t_5.memview = __pyx_v_values.memview;
                            __PYX_INC_MEMVIEW(&__pyx_t_5, 0);
                            {
    Py_ssize_t __pyx_tmp_idx = __pyx_v_i;
    Py_ssize_t __pyx_tmp_stride = __pyx_v_values.strides[0];
        __pyx_t_5.data += __pyx_tmp_idx * __pyx_tmp_stride;
}

__pyx_t_5.shape[0] = __pyx_v_values.shape[1];
__pyx_t_5.strides[0] = __pyx_v_values.strides[1];
    __pyx_t_5.suboffsets[0] = -1;

__pyx_t_5.shape[1] = __pyx_v_values.shape[2];
__pyx_t_5.strides[1] = __pyx_v_values.strides[2];
    __pyx_t_5.suboffsets[1] = -1;

__pyx_t_6 = __pyx_v_i;
                            __pyx_t_7 = __pyx_v_i;
                            __pyx_f_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_each(__pyx_t_4, __pyx_t_5, (*((int *) ( /* dim=0 */ ((char *) (((int *) __pyx_v_t_xs.data) + __pyx_t_6)) ))), (*((int *) ( /* dim=0 */ ((char *) (((int *) __pyx_v_t_ys.data) + __pyx_t_7)) ))), __pyx_v_max_neg_val);
                            __PYX_XDEC_MEMVIEW(&__pyx_t_4, 0);
                            __pyx_t_4.memview = NULL;
                            __pyx_t_4.data = NULL;
                            __PYX_XDEC_MEMVIEW(&__pyx_t_5, 0);
                            __pyx_t_5.memview = NULL;
                            __pyx_t_5.data = NULL;
                        }
                    }
                }
            }
        }
        #if ((defined(__APPLE__) || defined(__OSX__)) && (defined(__GNUC__) && (__GNUC__ > 2 || (__GNUC__ == 2 && (__GNUC_MINOR__ > 95)))))
            #undef likely
            #undef unlikely
            #define likely(x)   __builtin_expect(!!(x), 1)
            #define unlikely(x) __builtin_expect(!!(x), 0)
        #endif
      }

      /* "TTS/tts/utils/monotonic_align/core.pyx":46
 * 
 *   cdef int i
 *   for i in prange(b, nogil=True):             # <<<<<<<<<<<<<<
 *     maximum_path_each(paths[i], values[i], t_xs[i], t_ys[i], max_neg_val)
 */
      /*finally:*/ {
        /*normal exit:*/{
          #ifdef WITH_THREAD
          __Pyx_FastGIL_Forget();
          Py_BLOCK_THREADS
          #endif
          goto __pyx_L5;
        }
        __pyx_L5:;
      }
  }

  /* "TTS/tts/utils/monotonic_align/core.pyx":42
 * @cython.boundscheck(False)
 * @cython.wraparound(False)
 * cpdef void maximum_path_c(int[:,:,::1] paths, float[:,:,::1] values, int[::1] t_xs, int[::1] t_ys, float max_neg_val=-1e9) nogil:             # <<<<<<<<<<<<<<
 *   cdef int b = values.shape[0]
 * 
 */

  /* function exit code */
}

/* Python wrapper */
static PyObject *__pyx_pw_3TTS_3tts_5utils_15monotonic_align_4core_1maximum_path_c(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyObject *__pyx_pw_3TTS_3tts_5utils_15monotonic_align_4core_1maximum_path_c(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  __Pyx_memviewslice __pyx_v_paths = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_values = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_t_xs = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_t_ys = { 0, 0, { 0 }, { 0 }, { 0 } };
  float __pyx_v_max_neg_val;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("maximum_path_c (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_paths,&__pyx_n_s_values,&__pyx_n_s_t_xs,&__pyx_n_s_t_ys,&__pyx_n_s_max_neg_val,0};
    PyObject* values[5] = {0,0,0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        CYTHON_FALLTHROUGH;
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        CYTHON_FALLTHROUGH;
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_paths)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_values)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("maximum_path_c", 0, 4, 5, 1); __PYX_ERR(0, 42, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (likely((values[2] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_t_xs)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("maximum_path_c", 0, 4, 5, 2); __PYX_ERR(0, 42, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  3:
        if (likely((values[3] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_t_ys)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("maximum_path_c", 0, 4, 5, 3); __PYX_ERR(0, 42, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  4:
        if (kw_args > 0) {
          PyObject* value = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_max_neg_val);
          if (value) { values[4] = value; kw_args--; }
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "maximum_path_c") < 0)) __PYX_ERR(0, 42, __pyx_L3_error)
      }
    } else {
      switch (PyTuple_GET_SIZE(__pyx_args)) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        CYTHON_FALLTHROUGH;
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        break;
        default: goto __pyx_L5_argtuple_error;
      }
    }
    __pyx_v_paths = __Pyx_PyObject_to_MemoryviewSlice_d_d_dc_int(values[0], PyBUF_WRITABLE); if (unlikely(!__pyx_v_paths.memview)) __PYX_ERR(0, 42, __pyx_L3_error)
    __pyx_v_values = __Pyx_PyObject_to_MemoryviewSlice_d_d_dc_float(values[1], PyBUF_WRITABLE); if (unlikely(!__pyx_v_values.memview)) __PYX_ERR(0, 42, __pyx_L3_error)
    __pyx_v_t_xs = __Pyx_PyObject_to_MemoryviewSlice_dc_int(values[2], PyBUF_WRITABLE); if (unlikely(!__pyx_v_t_xs.memview)) __PYX_ERR(0, 42, __pyx_L3_error)
    __pyx_v_t_ys = __Pyx_PyObject_to_MemoryviewSlice_dc_int(values[3], PyBUF_WRITABLE); if (unlikely(!__pyx_v_t_ys.memview)) __PYX_ERR(0, 42, __pyx_L3_error)
    if (values[4]) {
      __pyx_v_max_neg_val = __pyx_PyFloat_AsFloat(values[4]); if (unlikely((__pyx_v_max_neg_val == (float)-1) && PyErr_Occurred())) __PYX_ERR(0, 42, __pyx_L3_error)
    } else {
      __pyx_v_max_neg_val = __pyx_k_;
    }
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("maximum_path_c", 0, 4, 5, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 42, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("TTS.tts.utils.monotonic_align.core.maximum_path_c", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_pf_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c(__pyx_self, __pyx_v_paths, __pyx_v_values, __pyx_v_t_xs, __pyx_v_t_ys, __pyx_v_max_neg_val);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_paths, __Pyx_memviewslice __pyx_v_values, __Pyx_memviewslice __pyx_v_t_xs, __Pyx_memviewslice __pyx_v_t_ys, float __pyx_v_max_neg_val) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  struct __pyx_opt_args_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("maximum_path_c", 0);
  __Pyx_XDECREF(__pyx_r);
  if (unlikely(!__pyx_v_paths.memview)) { __Pyx_RaiseUnboundLocalError("paths"); __PYX_ERR(0, 42, __pyx_L1_error) }
  if (unlikely(!__pyx_v_values.memview)) { __Pyx_RaiseUnboundLocalError("values"); __PYX_ERR(0, 42, __pyx_L1_error) }
  if (unlikely(!__pyx_v_t_xs.memview)) { __Pyx_RaiseUnboundLocalError("t_xs"); __PYX_ERR(0, 42, __pyx_L1_error) }
  if (unlikely(!__pyx_v_t_ys.memview)) { __Pyx_RaiseUnboundLocalError("t_ys"); __PYX_ERR(0, 42, __pyx_L1_error) }
  __pyx_t_1.__pyx_n = 1;
  __pyx_t_1.max_neg_val = __pyx_v_max_neg_val;
  __pyx_f_3TTS_3tts_5utils_15monotonic_align_4core_maximum_path_c(__pyx_v_paths, __pyx_v_values, __pyx_v_t_xs, __pyx_v_t_ys, 0, &__pyx_t_1); 
  __pyx_t_2 = __Pyx_void_to_None(NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 42, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("TTS.tts.utils.monotonic_align.core.maximum_path_c", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __PYX_XDEC_MEMVIEW(&__pyx_v_paths, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_v_values, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_v_t_xs, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_v_t_ys, 1);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":735
 * ctypedef npy_cdouble     complex_t
 * 
 * cdef inline object PyArray_MultiIterNew1(a):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(1, <void*>a)
 * 
 */

static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyObject *__pyx_v_a) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("PyArray_MultiIterNew1", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":736
 * 
 * cdef inline object PyArray_MultiIterNew1(a):
 *     return PyArray_MultiIterNew(1, <void*>a)             # <<<<<<<<<<<<<<
 * 
 * cdef inline object PyArray_MultiIterNew2(a, b):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = PyArray_MultiIterNew(1, ((void *)__pyx_v_a)); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 736, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":735
 * ctypedef npy_cdouble     complex_t
 * 
 * cdef inline object PyArray_MultiIterNew1(a):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(1, <void*>a)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("numpy.PyArray_MultiIterNew1", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":738
 *     return PyArray_MultiIterNew(1, <void*>a)
 * 
 * cdef inline object PyArray_MultiIterNew2(a, b):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(2, <void*>a, <void*>b)
 * 
 */

static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyObject *__pyx_v_a, PyObject *__pyx_v_b) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("PyArray_MultiIterNew2", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":739
 * 
 * cdef inline object PyArray_MultiIterNew2(a, b):
 *     return PyArray_MultiIterNew(2, <void*>a, <void*>b)             # <<<<<<<<<<<<<<
 * 
 * cdef inline object PyArray_MultiIterNew3(a, b, c):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = PyArray_MultiIterNew(2, ((void *)__pyx_v_a), ((void *)__pyx_v_b)); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 739, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":738
 *     return PyArray_MultiIterNew(1, <void*>a)
 * 
 * cdef inline object PyArray_MultiIterNew2(a, b):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(2, <void*>a, <void*>b)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("numpy.PyArray_MultiIterNew2", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":741
 *     return PyArray_MultiIterNew(2, <void*>a, <void*>b)
 * 
 * cdef inline object PyArray_MultiIterNew3(a, b, c):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(3, <void*>a, <void*>b, <void*> c)
 * 
 */

static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("PyArray_MultiIterNew3", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":742
 * 
 * cdef inline object PyArray_MultiIterNew3(a, b, c):
 *     return PyArray_MultiIterNew(3, <void*>a, <void*>b, <void*> c)             # <<<<<<<<<<<<<<
 * 
 * cdef inline object PyArray_MultiIterNew4(a, b, c, d):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = PyArray_MultiIterNew(3, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c)); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 742, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":741
 *     return PyArray_MultiIterNew(2, <void*>a, <void*>b)
 * 
 * cdef inline object PyArray_MultiIterNew3(a, b, c):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(3, <void*>a, <void*>b, <void*> c)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("numpy.PyArray_MultiIterNew3", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":744
 *     return PyArray_MultiIterNew(3, <void*>a, <void*>b, <void*> c)
 * 
 * cdef inline object PyArray_MultiIterNew4(a, b, c, d):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(4, <void*>a, <void*>b, <void*>c, <void*> d)
 * 
 */

static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("PyArray_MultiIterNew4", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":745
 * 
 * cdef inline object PyArray_MultiIterNew4(a, b, c, d):
 *     return PyArray_MultiIterNew(4, <void*>a, <void*>b, <void*>c, <void*> d)             # <<<<<<<<<<<<<<
 * 
 * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = PyArray_MultiIterNew(4, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d)); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 745, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":744
 *     return PyArray_MultiIterNew(3, <void*>a, <void*>b, <void*> c)
 * 
 * cdef inline object PyArray_MultiIterNew4(a, b, c, d):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(4, <void*>a, <void*>b, <void*>c, <void*> d)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("numpy.PyArray_MultiIterNew4", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":747
 *     return PyArray_MultiIterNew(4, <void*>a, <void*>b, <void*>c, <void*> d)
 * 
 * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(5, <void*>a, <void*>b, <void*>c, <void*> d, <void*> e)
 * 
 */

static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyObject *__pyx_v_a, PyObject *__pyx_v_b, PyObject *__pyx_v_c, PyObject *__pyx_v_d, PyObject *__pyx_v_e) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("PyArray_MultiIterNew5", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":748
 * 
 * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e):
 *     return PyArray_MultiIterNew(5, <void*>a, <void*>b, <void*>c, <void*> d, <void*> e)             # <<<<<<<<<<<<<<
 * 
 * cdef inline tuple PyDataType_SHAPE(dtype d):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = PyArray_MultiIterNew(5, ((void *)__pyx_v_a), ((void *)__pyx_v_b), ((void *)__pyx_v_c), ((void *)__pyx_v_d), ((void *)__pyx_v_e)); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 748, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":747
 *     return PyArray_MultiIterNew(4, <void*>a, <void*>b, <void*>c, <void*> d)
 * 
 * cdef inline object PyArray_MultiIterNew5(a, b, c, d, e):             # <<<<<<<<<<<<<<
 *     return PyArray_MultiIterNew(5, <void*>a, <void*>b, <void*>c, <void*> d, <void*> e)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("numpy.PyArray_MultiIterNew5", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":750
 *     return PyArray_MultiIterNew(5, <void*>a, <void*>b, <void*>c, <void*> d, <void*> e)
 * 
 * cdef inline tuple PyDataType_SHAPE(dtype d):             # <<<<<<<<<<<<<<
 *     if PyDataType_HASSUBARRAY(d):
 *         return <tuple>d.subarray.shape
 */

static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyDataType_SHAPE(PyArray_Descr *__pyx_v_d) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  __Pyx_RefNannySetupContext("PyDataType_SHAPE", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":751
 * 
 * cdef inline tuple PyDataType_SHAPE(dtype d):
 *     if PyDataType_HASSUBARRAY(d):             # <<<<<<<<<<<<<<
 *         return <tuple>d.subarray.shape
 *     else:
 */
  __pyx_t_1 = (PyDataType_HASSUBARRAY(__pyx_v_d) != 0);
  if (__pyx_t_1) {

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":752
 * cdef inline tuple PyDataType_SHAPE(dtype d):
 *     if PyDataType_HASSUBARRAY(d):
 *         return <tuple>d.subarray.shape             # <<<<<<<<<<<<<<
 *     else:
 *         return ()
 */
    __Pyx_XDECREF(__pyx_r);
    __Pyx_INCREF(((PyObject*)__pyx_v_d->subarray->shape));
    __pyx_r = ((PyObject*)__pyx_v_d->subarray->shape);
    goto __pyx_L0;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":751
 * 
 * cdef inline tuple PyDataType_SHAPE(dtype d):
 *     if PyDataType_HASSUBARRAY(d):             # <<<<<<<<<<<<<<
 *         return <tuple>d.subarray.shape
 *     else:
 */
  }

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":754
 *         return <tuple>d.subarray.shape
 *     else:
 *         return ()             # <<<<<<<<<<<<<<
 * 
 * 
 */
  /*else*/ {
    __Pyx_XDECREF(__pyx_r);
    __Pyx_INCREF(__pyx_empty_tuple);
    __pyx_r = __pyx_empty_tuple;
    goto __pyx_L0;
  }

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":750
 *     return PyArray_MultiIterNew(5, <void*>a, <void*>b, <void*>c, <void*> d, <void*> e)
 * 
 * cdef inline tuple PyDataType_SHAPE(dtype d):             # <<<<<<<<<<<<<<
 *     if PyDataType_HASSUBARRAY(d):
 *         return <tuple>d.subarray.shape
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":929
 *     int _import_umath() except -1
 * 
 * cdef inline void set_array_base(ndarray arr, object base):             # <<<<<<<<<<<<<<
 *     Py_INCREF(base) # important to do this before stealing the reference below!
 *     PyArray_SetBaseObject(arr, base)
 */

static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *__pyx_v_arr, PyObject *__pyx_v_base) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("set_array_base", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":930
 * 
 * cdef inline void set_array_base(ndarray arr, object base):
 *     Py_INCREF(base) # important to do this before stealing the reference below!             # <<<<<<<<<<<<<<
 *     PyArray_SetBaseObject(arr, base)
 * 
 */
  Py_INCREF(__pyx_v_base);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":931
 * cdef inline void set_array_base(ndarray arr, object base):
 *     Py_INCREF(base) # important to do this before stealing the reference below!
 *     PyArray_SetBaseObject(arr, base)             # <<<<<<<<<<<<<<
 * 
 * cdef inline object get_array_base(ndarray arr):
 */
  (void)(PyArray_SetBaseObject(__pyx_v_arr, __pyx_v_base));

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":929
 *     int _import_umath() except -1
 * 
 * cdef inline void set_array_base(ndarray arr, object base):             # <<<<<<<<<<<<<<
 *     Py_INCREF(base) # important to do this before stealing the reference below!
 *     PyArray_SetBaseObject(arr, base)
 */

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":933
 *     PyArray_SetBaseObject(arr, base)
 * 
 * cdef inline object get_array_base(ndarray arr):             # <<<<<<<<<<<<<<
 *     base = PyArray_BASE(arr)
 *     if base is NULL:
 */

static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObject *__pyx_v_arr) {
  PyObject *__pyx_v_base;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  __Pyx_RefNannySetupContext("get_array_base", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":934
 * 
 * cdef inline object get_array_base(ndarray arr):
 *     base = PyArray_BASE(arr)             # <<<<<<<<<<<<<<
 *     if base is NULL:
 *         return None
 */
  __pyx_v_base = PyArray_BASE(__pyx_v_arr);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":935
 * cdef inline object get_array_base(ndarray arr):
 *     base = PyArray_BASE(arr)
 *     if base is NULL:             # <<<<<<<<<<<<<<
 *         return None
 *     return <object>base
 */
  __pyx_t_1 = ((__pyx_v_base == NULL) != 0);
  if (__pyx_t_1) {

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":936
 *     base = PyArray_BASE(arr)
 *     if base is NULL:
 *         return None             # <<<<<<<<<<<<<<
 *     return <object>base
 * 
 */
    __Pyx_XDECREF(__pyx_r);
    __pyx_r = Py_None; __Pyx_INCREF(Py_None);
    goto __pyx_L0;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":935
 * cdef inline object get_array_base(ndarray arr):
 *     base = PyArray_BASE(arr)
 *     if base is NULL:             # <<<<<<<<<<<<<<
 *         return None
 *     return <object>base
 */
  }

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":937
 *     if base is NULL:
 *         return None
 *     return <object>base             # <<<<<<<<<<<<<<
 * 
 * # Versions of the import_* functions which are more suitable for
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(((PyObject *)__pyx_v_base));
  __pyx_r = ((PyObject *)__pyx_v_base);
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":933
 *     PyArray_SetBaseObject(arr, base)
 * 
 * cdef inline object get_array_base(ndarray arr):             # <<<<<<<<<<<<<<
 *     base = PyArray_BASE(arr)
 *     if base is NULL:
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":941
 * # Versions of the import_* functions which are more suitable for
 * # Cython code.
 * cdef inline int import_array() except -1:             # <<<<<<<<<<<<<<
 *     try:
 *         __pyx_import_array()
 */

static CYTHON_INLINE int __pyx_f_5numpy_import_array(void) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_t_4;
  PyObject *__pyx_t_5 = NULL;
  PyObject *__pyx_t_6 = NULL;
  PyObject *__pyx_t_7 = NULL;
  PyObject *__pyx_t_8 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("import_array", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":942
 * # Cython code.
 * cdef inline int import_array() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         __pyx_import_array()
 *     except Exception:
 */
  {
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ExceptionSave(&__pyx_t_1, &__pyx_t_2, &__pyx_t_3);
    __Pyx_XGOTREF(__pyx_t_1);
    __Pyx_XGOTREF(__pyx_t_2);
    __Pyx_XGOTREF(__pyx_t_3);
    /*try:*/ {

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":943
 * cdef inline int import_array() except -1:
 *     try:
 *         __pyx_import_array()             # <<<<<<<<<<<<<<
 *     except Exception:
 *         raise ImportError("numpy.core.multiarray failed to import")
 */
      __pyx_t_4 = _import_array(); if (unlikely(__pyx_t_4 == ((int)-1))) __PYX_ERR(1, 943, __pyx_L3_error)

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":942
 * # Cython code.
 * cdef inline int import_array() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         __pyx_import_array()
 *     except Exception:
 */
    }
    __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
    __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_XDECREF(__pyx_t_3); __pyx_t_3 = 0;
    goto __pyx_L8_try_end;
    __pyx_L3_error:;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":944
 *     try:
 *         __pyx_import_array()
 *     except Exception:             # <<<<<<<<<<<<<<
 *         raise ImportError("numpy.core.multiarray failed to import")
 * 
 */
    __pyx_t_4 = __Pyx_PyErr_ExceptionMatches(((PyObject *)(&((PyTypeObject*)PyExc_Exception)[0])));
    if (__pyx_t_4) {
      __Pyx_AddTraceback("numpy.import_array", __pyx_clineno, __pyx_lineno, __pyx_filename);
      if (__Pyx_GetException(&__pyx_t_5, &__pyx_t_6, &__pyx_t_7) < 0) __PYX_ERR(1, 944, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GOTREF(__pyx_t_6);
      __Pyx_GOTREF(__pyx_t_7);

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":945
 *         __pyx_import_array()
 *     except Exception:
 *         raise ImportError("numpy.core.multiarray failed to import")             # <<<<<<<<<<<<<<
 * 
 * cdef inline int import_umath() except -1:
 */
      __pyx_t_8 = __Pyx_PyObject_Call(__pyx_builtin_ImportError, __pyx_tuple__2, NULL); if (unlikely(!__pyx_t_8)) __PYX_ERR(1, 945, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_8);
      __Pyx_Raise(__pyx_t_8, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
      __PYX_ERR(1, 945, __pyx_L5_except_error)
    }
    goto __pyx_L5_except_error;
    __pyx_L5_except_error:;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":942
 * # Cython code.
 * cdef inline int import_array() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         __pyx_import_array()
 *     except Exception:
 */
    __Pyx_XGIVEREF(__pyx_t_1);
    __Pyx_XGIVEREF(__pyx_t_2);
    __Pyx_XGIVEREF(__pyx_t_3);
    __Pyx_ExceptionReset(__pyx_t_1, __pyx_t_2, __pyx_t_3);
    goto __pyx_L1_error;
    __pyx_L8_try_end:;
  }

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":941
 * # Versions of the import_* functions which are more suitable for
 * # Cython code.
 * cdef inline int import_array() except -1:             # <<<<<<<<<<<<<<
 *     try:
 *         __pyx_import_array()
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_8);
  __Pyx_AddTraceback("numpy.import_array", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":947
 *         raise ImportError("numpy.core.multiarray failed to import")
 * 
 * cdef inline int import_umath() except -1:             # <<<<<<<<<<<<<<
 *     try:
 *         _import_umath()
 */

static CYTHON_INLINE int __pyx_f_5numpy_import_umath(void) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_t_4;
  PyObject *__pyx_t_5 = NULL;
  PyObject *__pyx_t_6 = NULL;
  PyObject *__pyx_t_7 = NULL;
  PyObject *__pyx_t_8 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("import_umath", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":948
 * 
 * cdef inline int import_umath() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         _import_umath()
 *     except Exception:
 */
  {
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ExceptionSave(&__pyx_t_1, &__pyx_t_2, &__pyx_t_3);
    __Pyx_XGOTREF(__pyx_t_1);
    __Pyx_XGOTREF(__pyx_t_2);
    __Pyx_XGOTREF(__pyx_t_3);
    /*try:*/ {

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":949
 * cdef inline int import_umath() except -1:
 *     try:
 *         _import_umath()             # <<<<<<<<<<<<<<
 *     except Exception:
 *         raise ImportError("numpy.core.umath failed to import")
 */
      __pyx_t_4 = _import_umath(); if (unlikely(__pyx_t_4 == ((int)-1))) __PYX_ERR(1, 949, __pyx_L3_error)

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":948
 * 
 * cdef inline int import_umath() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         _import_umath()
 *     except Exception:
 */
    }
    __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
    __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_XDECREF(__pyx_t_3); __pyx_t_3 = 0;
    goto __pyx_L8_try_end;
    __pyx_L3_error:;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":950
 *     try:
 *         _import_umath()
 *     except Exception:             # <<<<<<<<<<<<<<
 *         raise ImportError("numpy.core.umath failed to import")
 * 
 */
    __pyx_t_4 = __Pyx_PyErr_ExceptionMatches(((PyObject *)(&((PyTypeObject*)PyExc_Exception)[0])));
    if (__pyx_t_4) {
      __Pyx_AddTraceback("numpy.import_umath", __pyx_clineno, __pyx_lineno, __pyx_filename);
      if (__Pyx_GetException(&__pyx_t_5, &__pyx_t_6, &__pyx_t_7) < 0) __PYX_ERR(1, 950, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GOTREF(__pyx_t_6);
      __Pyx_GOTREF(__pyx_t_7);

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":951
 *         _import_umath()
 *     except Exception:
 *         raise ImportError("numpy.core.umath failed to import")             # <<<<<<<<<<<<<<
 * 
 * cdef inline int import_ufunc() except -1:
 */
      __pyx_t_8 = __Pyx_PyObject_Call(__pyx_builtin_ImportError, __pyx_tuple__3, NULL); if (unlikely(!__pyx_t_8)) __PYX_ERR(1, 951, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_8);
      __Pyx_Raise(__pyx_t_8, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
      __PYX_ERR(1, 951, __pyx_L5_except_error)
    }
    goto __pyx_L5_except_error;
    __pyx_L5_except_error:;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":948
 * 
 * cdef inline int import_umath() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         _import_umath()
 *     except Exception:
 */
    __Pyx_XGIVEREF(__pyx_t_1);
    __Pyx_XGIVEREF(__pyx_t_2);
    __Pyx_XGIVEREF(__pyx_t_3);
    __Pyx_ExceptionReset(__pyx_t_1, __pyx_t_2, __pyx_t_3);
    goto __pyx_L1_error;
    __pyx_L8_try_end:;
  }

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":947
 *         raise ImportError("numpy.core.multiarray failed to import")
 * 
 * cdef inline int import_umath() except -1:             # <<<<<<<<<<<<<<
 *     try:
 *         _import_umath()
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_8);
  __Pyx_AddTraceback("numpy.import_umath", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":953
 *         raise ImportError("numpy.core.umath failed to import")
 * 
 * cdef inline int import_ufunc() except -1:             # <<<<<<<<<<<<<<
 *     try:
 *         _import_umath()
 */

static CYTHON_INLINE int __pyx_f_5numpy_import_ufunc(void) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_t_4;
  PyObject *__pyx_t_5 = NULL;
  PyObject *__pyx_t_6 = NULL;
  PyObject *__pyx_t_7 = NULL;
  PyObject *__pyx_t_8 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("import_ufunc", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":954
 * 
 * cdef inline int import_ufunc() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         _import_umath()
 *     except Exception:
 */
  {
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ExceptionSave(&__pyx_t_1, &__pyx_t_2, &__pyx_t_3);
    __Pyx_XGOTREF(__pyx_t_1);
    __Pyx_XGOTREF(__pyx_t_2);
    __Pyx_XGOTREF(__pyx_t_3);
    /*try:*/ {

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":955
 * cdef inline int import_ufunc() except -1:
 *     try:
 *         _import_umath()             # <<<<<<<<<<<<<<
 *     except Exception:
 *         raise ImportError("numpy.core.umath failed to import")
 */
      __pyx_t_4 = _import_umath(); if (unlikely(__pyx_t_4 == ((int)-1))) __PYX_ERR(1, 955, __pyx_L3_error)

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":954
 * 
 * cdef inline int import_ufunc() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         _import_umath()
 *     except Exception:
 */
    }
    __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
    __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_XDECREF(__pyx_t_3); __pyx_t_3 = 0;
    goto __pyx_L8_try_end;
    __pyx_L3_error:;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":956
 *     try:
 *         _import_umath()
 *     except Exception:             # <<<<<<<<<<<<<<
 *         raise ImportError("numpy.core.umath failed to import")
 * 
 */
    __pyx_t_4 = __Pyx_PyErr_ExceptionMatches(((PyObject *)(&((PyTypeObject*)PyExc_Exception)[0])));
    if (__pyx_t_4) {
      __Pyx_AddTraceback("numpy.import_ufunc", __pyx_clineno, __pyx_lineno, __pyx_filename);
      if (__Pyx_GetException(&__pyx_t_5, &__pyx_t_6, &__pyx_t_7) < 0) __PYX_ERR(1, 956, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GOTREF(__pyx_t_6);
      __Pyx_GOTREF(__pyx_t_7);

      /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":957
 *         _import_umath()
 *     except Exception:
 *         raise ImportError("numpy.core.umath failed to import")             # <<<<<<<<<<<<<<
 * 
 * cdef extern from *:
 */
      __pyx_t_8 = __Pyx_PyObject_Call(__pyx_builtin_ImportError, __pyx_tuple__3, NULL); if (unlikely(!__pyx_t_8)) __PYX_ERR(1, 957, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_8);
      __Pyx_Raise(__pyx_t_8, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
      __PYX_ERR(1, 957, __pyx_L5_except_error)
    }
    goto __pyx_L5_except_error;
    __pyx_L5_except_error:;

    /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":954
 * 
 * cdef inline int import_ufunc() except -1:
 *     try:             # <<<<<<<<<<<<<<
 *         _import_umath()
 *     except Exception:
 */
    __Pyx_XGIVEREF(__pyx_t_1);
    __Pyx_XGIVEREF(__pyx_t_2);
    __Pyx_XGIVEREF(__pyx_t_3);
    __Pyx_ExceptionReset(__pyx_t_1, __pyx_t_2, __pyx_t_3);
    goto __pyx_L1_error;
    __pyx_L8_try_end:;
  }

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":953
 *         raise ImportError("numpy.core.umath failed to import")
 * 
 * cdef inline int import_ufunc() except -1:             # <<<<<<<<<<<<<<
 *     try:
 *         _import_umath()
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_8);
  __Pyx_AddTraceback("numpy.import_ufunc", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":967
 * 
 * 
 * cdef inline bint is_timedelta64_object(object obj):             # <<<<<<<<<<<<<<
 *     """
 *     Cython equivalent of `isinstance(obj, np.timedelta64)`
 */

static CYTHON_INLINE int __pyx_f_5numpy_is_timedelta64_object(PyObject *__pyx_v_obj) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("is_timedelta64_object", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":979
 *     bool
 *     """
 *     return PyObject_TypeCheck(obj, &PyTimedeltaArrType_Type)             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = PyObject_TypeCheck(__pyx_v_obj, (&PyTimedeltaArrType_Type));
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":967
 * 
 * 
 * cdef inline bint is_timedelta64_object(object obj):             # <<<<<<<<<<<<<<
 *     """
 *     Cython equivalent of `isinstance(obj, np.timedelta64)`
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":982
 * 
 * 
 * cdef inline bint is_datetime64_object(object obj):             # <<<<<<<<<<<<<<
 *     """
 *     Cython equivalent of `isinstance(obj, np.datetime64)`
 */

static CYTHON_INLINE int __pyx_f_5numpy_is_datetime64_object(PyObject *__pyx_v_obj) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("is_datetime64_object", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":994
 *     bool
 *     """
 *     return PyObject_TypeCheck(obj, &PyDatetimeArrType_Type)             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = PyObject_TypeCheck(__pyx_v_obj, (&PyDatetimeArrType_Type));
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":982
 * 
 * 
 * cdef inline bint is_datetime64_object(object obj):             # <<<<<<<<<<<<<<
 *     """
 *     Cython equivalent of `isinstance(obj, np.datetime64)`
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":997
 * 
 * 
 * cdef inline npy_datetime get_datetime64_value(object obj) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     returns the int64 value underlying scalar numpy datetime64 object
 */

static CYTHON_INLINE npy_datetime __pyx_f_5numpy_get_datetime64_value(PyObject *__pyx_v_obj) {
  npy_datetime __pyx_r;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":1004
 *     also needed.  That can be found using `get_datetime64_unit`.
 *     """
 *     return (<PyDatetimeScalarObject*>obj).obval             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = ((PyDatetimeScalarObject *)__pyx_v_obj)->obval;
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":997
 * 
 * 
 * cdef inline npy_datetime get_datetime64_value(object obj) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     returns the int64 value underlying scalar numpy datetime64 object
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":1007
 * 
 * 
 * cdef inline npy_timedelta get_timedelta64_value(object obj) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     returns the int64 value underlying scalar numpy timedelta64 object
 */

static CYTHON_INLINE npy_timedelta __pyx_f_5numpy_get_timedelta64_value(PyObject *__pyx_v_obj) {
  npy_timedelta __pyx_r;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":1011
 *     returns the int64 value underlying scalar numpy timedelta64 object
 *     """
 *     return (<PyTimedeltaScalarObject*>obj).obval             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = ((PyTimedeltaScalarObject *)__pyx_v_obj)->obval;
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":1007
 * 
 * 
 * cdef inline npy_timedelta get_timedelta64_value(object obj) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     returns the int64 value underlying scalar numpy timedelta64 object
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":1014
 * 
 * 
 * cdef inline NPY_DATETIMEUNIT get_datetime64_unit(object obj) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     returns the unit part of the dtype for a numpy datetime64 object.
 */

static CYTHON_INLINE NPY_DATETIMEUNIT __pyx_f_5numpy_get_datetime64_unit(PyObject *__pyx_v_obj) {
  NPY_DATETIMEUNIT __pyx_r;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":1018
 *     returns the unit part of the dtype for a numpy datetime64 object.
 *     """
 *     return <NPY_DATETIMEUNIT>(<PyDatetimeScalarObject*>obj).obmeta.base             # <<<<<<<<<<<<<<
 */
  __pyx_r = ((NPY_DATETIMEUNIT)((PyDatetimeScalarObject *)__pyx_v_obj)->obmeta.base);
  goto __pyx_L0;

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":1014
 * 
 * 
 * cdef inline NPY_DATETIMEUNIT get_datetime64_unit(object obj) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     returns the unit part of the dtype for a numpy datetime64 object.
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":122
 *         cdef bint dtype_is_object
 * 
 *     def __cinit__(array self, tuple shape, Py_ssize_t itemsize, format not None,             # <<<<<<<<<<<<<<
 *                   mode="c", bint allocate_buffer=True):
 * 
 */

/* Python wrapper */
static int __pyx_array___cinit__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static int __pyx_array___cinit__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  PyObject *__pyx_v_shape = 0;
  Py_ssize_t __pyx_v_itemsize;
  PyObject *__pyx_v_format = 0;
  PyObject *__pyx_v_mode = 0;
  int __pyx_v_allocate_buffer;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__cinit__ (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_shape,&__pyx_n_s_itemsize,&__pyx_n_s_format,&__pyx_n_s_mode,&__pyx_n_s_allocate_buffer,0};
    PyObject* values[5] = {0,0,0,0,0};
    values[3] = ((PyObject *)__pyx_n_s_c);
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        CYTHON_FALLTHROUGH;
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        CYTHON_FALLTHROUGH;
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_shape)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_itemsize)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("__cinit__", 0, 3, 5, 1); __PYX_ERR(2, 122, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (likely((values[2] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_format)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("__cinit__", 0, 3, 5, 2); __PYX_ERR(2, 122, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  3:
        if (kw_args > 0) {
          PyObject* value = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_mode);
          if (value) { values[3] = value; kw_args--; }
        }
        CYTHON_FALLTHROUGH;
        case  4:
        if (kw_args > 0) {
          PyObject* value = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_allocate_buffer);
          if (value) { values[4] = value; kw_args--; }
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "__cinit__") < 0)) __PYX_ERR(2, 122, __pyx_L3_error)
      }
    } else {
      switch (PyTuple_GET_SIZE(__pyx_args)) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        CYTHON_FALLTHROUGH;
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        CYTHON_FALLTHROUGH;
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        break;
        default: goto __pyx_L5_argtuple_error;
      }
    }
    __pyx_v_shape = ((PyObject*)values[0]);
    __pyx_v_itemsize = __Pyx_PyIndex_AsSsize_t(values[1]); if (unlikely((__pyx_v_itemsize == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 122, __pyx_L3_error)
    __pyx_v_format = values[2];
    __pyx_v_mode = values[3];
    if (values[4]) {
      __pyx_v_allocate_buffer = __Pyx_PyObject_IsTrue(values[4]); if (unlikely((__pyx_v_allocate_buffer == (int)-1) && PyErr_Occurred())) __PYX_ERR(2, 123, __pyx_L3_error)
    } else {

      /* "View.MemoryView":123
 * 
 *     def __cinit__(array self, tuple shape, Py_ssize_t itemsize, format not None,
 *                   mode="c", bint allocate_buffer=True):             # <<<<<<<<<<<<<<
 * 
 *         cdef int idx
 */
      __pyx_v_allocate_buffer = ((int)1);
    }
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("__cinit__", 0, 3, 5, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(2, 122, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("View.MemoryView.array.__cinit__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return -1;
  __pyx_L4_argument_unpacking_done:;
  if (unlikely(!__Pyx_ArgTypeTest(((PyObject *)__pyx_v_shape), (&PyTuple_Type), 1, "shape", 1))) __PYX_ERR(2, 122, __pyx_L1_error)
  if (unlikely(((PyObject *)__pyx_v_format) == Py_None)) {
    PyErr_Format(PyExc_TypeError, "Argument '%.200s' must not be None", "format"); __PYX_ERR(2, 122, __pyx_L1_error)
  }
  __pyx_r = __pyx_array___pyx_pf_15View_dot_MemoryView_5array___cinit__(((struct __pyx_array_obj *)__pyx_v_self), __pyx_v_shape, __pyx_v_itemsize, __pyx_v_format, __pyx_v_mode, __pyx_v_allocate_buffer);

  /* "View.MemoryView":122
 *         cdef bint dtype_is_object
 * 
 *     def __cinit__(array self, tuple shape, Py_ssize_t itemsize, format not None,             # <<<<<<<<<<<<<<
 *                   mode="c", bint allocate_buffer=True):
 * 
 */

  /* function exit code */
  goto __pyx_L0;
  __pyx_L1_error:;
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static int __pyx_array___pyx_pf_15View_dot_MemoryView_5array___cinit__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_shape, Py_ssize_t __pyx_v_itemsize, PyObject *__pyx_v_format, PyObject *__pyx_v_mode, int __pyx_v_allocate_buffer) {
  int __pyx_v_idx;
  Py_ssize_t __pyx_v_i;
  Py_ssize_t __pyx_v_dim;
  PyObject **__pyx_v_p;
  char __pyx_v_order;
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  Py_ssize_t __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_t_4;
  PyObject *__pyx_t_5 = NULL;
  PyObject *__pyx_t_6 = NULL;
  char *__pyx_t_7;
  int __pyx_t_8;
  Py_ssize_t __pyx_t_9;
  PyObject *__pyx_t_10 = NULL;
  Py_ssize_t __pyx_t_11;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__cinit__", 0);
  __Pyx_INCREF(__pyx_v_format);

  /* "View.MemoryView":129
 *         cdef PyObject **p
 * 
 *         self.ndim = <int> len(shape)             # <<<<<<<<<<<<<<
 *         self.itemsize = itemsize
 * 
 */
  if (unlikely(__pyx_v_shape == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "object of type 'NoneType' has no len()");
    __PYX_ERR(2, 129, __pyx_L1_error)
  }
  __pyx_t_1 = PyTuple_GET_SIZE(__pyx_v_shape); if (unlikely(__pyx_t_1 == ((Py_ssize_t)-1))) __PYX_ERR(2, 129, __pyx_L1_error)
  __pyx_v_self->ndim = ((int)__pyx_t_1);

  /* "View.MemoryView":130
 * 
 *         self.ndim = <int> len(shape)
 *         self.itemsize = itemsize             # <<<<<<<<<<<<<<
 * 
 *         if not self.ndim:
 */
  __pyx_v_self->itemsize = __pyx_v_itemsize;

  /* "View.MemoryView":132
 *         self.itemsize = itemsize
 * 
 *         if not self.ndim:             # <<<<<<<<<<<<<<
 *             raise ValueError("Empty shape tuple for cython.array")
 * 
 */
  __pyx_t_2 = ((!(__pyx_v_self->ndim != 0)) != 0);
  if (unlikely(__pyx_t_2)) {

    /* "View.MemoryView":133
 * 
 *         if not self.ndim:
 *             raise ValueError("Empty shape tuple for cython.array")             # <<<<<<<<<<<<<<
 * 
 *         if itemsize <= 0:
 */
    __pyx_t_3 = __Pyx_PyObject_Call(__pyx_builtin_ValueError, __pyx_tuple__4, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 133, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_Raise(__pyx_t_3, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __PYX_ERR(2, 133, __pyx_L1_error)

    /* "View.MemoryView":132
 *         self.itemsize = itemsize
 * 
 *         if not self.ndim:             # <<<<<<<<<<<<<<
 *             raise ValueError("Empty shape tuple for cython.array")
 * 
 */
  }

  /* "View.MemoryView":135
 *             raise ValueError("Empty shape tuple for cython.array")
 * 
 *         if itemsize <= 0:             # <<<<<<<<<<<<<<
 *             raise ValueError("itemsize <= 0 for cython.array")
 * 
 */
  __pyx_t_2 = ((__pyx_v_itemsize <= 0) != 0);
  if (unlikely(__pyx_t_2)) {

    /* "View.MemoryView":136
 * 
 *         if itemsize <= 0:
 *             raise ValueError("itemsize <= 0 for cython.array")             # <<<<<<<<<<<<<<
 * 
 *         if not isinstance(format, bytes):
 */
    __pyx_t_3 = __Pyx_PyObject_Call(__pyx_builtin_ValueError, __pyx_tuple__5, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 136, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_Raise(__pyx_t_3, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __PYX_ERR(2, 136, __pyx_L1_error)

    /* "View.MemoryView":135
 *             raise ValueError("Empty shape tuple for cython.array")
 * 
 *         if itemsize <= 0:             # <<<<<<<<<<<<<<
 *             raise ValueError("itemsize <= 0 for cython.array")
 * 
 */
  }

  /* "View.MemoryView":138
 *             raise ValueError("itemsize <= 0 for cython.array")
 * 
 *         if not isinstance(format, bytes):             # <<<<<<<<<<<<<<
 *             format = format.encode('ASCII')
 *         self._format = format  # keep a reference to the byte string
 */
  __pyx_t_2 = PyBytes_Check(__pyx_v_format); 
  __pyx_t_4 = ((!(__pyx_t_2 != 0)) != 0);
  if (__pyx_t_4) {

    /* "View.MemoryView":139
 * 
 *         if not isinstance(format, bytes):
 *             format = format.encode('ASCII')             # <<<<<<<<<<<<<<
 *         self._format = format  # keep a reference to the byte string
 *         self.format = self._format
 */
    __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_v_format, __pyx_n_s_encode); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 139, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __pyx_t_6 = NULL;
    if (CYTHON_UNPACK_METHODS && likely(PyMethod_Check(__pyx_t_5))) {
      __pyx_t_6 = PyMethod_GET_SELF(__pyx_t_5);
      if (likely(__pyx_t_6)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5);
        __Pyx_INCREF(__pyx_t_6);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_5, function);
      }
    }
    __pyx_t_3 = (__pyx_t_6) ? __Pyx_PyObject_Call2Args(__pyx_t_5, __pyx_t_6, __pyx_n_s_ASCII) : __Pyx_PyObject_CallOneArg(__pyx_t_5, __pyx_n_s_ASCII);
    __Pyx_XDECREF(__pyx_t_6); __pyx_t_6 = 0;
    if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 139, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_DECREF_SET(__pyx_v_format, __pyx_t_3);
    __pyx_t_3 = 0;

    /* "View.MemoryView":138
 *             raise ValueError("itemsize <= 0 for cython.array")
 * 
 *         if not isinstance(format, bytes):             # <<<<<<<<<<<<<<
 *             format = format.encode('ASCII')
 *         self._format = format  # keep a reference to the byte string
 */
  }

  /* "View.MemoryView":140
 *         if not isinstance(format, bytes):
 *             format = format.encode('ASCII')
 *         self._format = format  # keep a reference to the byte string             # <<<<<<<<<<<<<<
 *         self.format = self._format
 * 
 */
  if (!(likely(PyBytes_CheckExact(__pyx_v_format))||((__pyx_v_format) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected %.16s, got %.200s", "bytes", Py_TYPE(__pyx_v_format)->tp_name), 0))) __PYX_ERR(2, 140, __pyx_L1_error)
  __pyx_t_3 = __pyx_v_format;
  __Pyx_INCREF(__pyx_t_3);
  __Pyx_GIVEREF(__pyx_t_3);
  __Pyx_GOTREF(__pyx_v_self->_format);
  __Pyx_DECREF(__pyx_v_self->_format);
  __pyx_v_self->_format = ((PyObject*)__pyx_t_3);
  __pyx_t_3 = 0;

  /* "View.MemoryView":141
 *             format = format.encode('ASCII')
 *         self._format = format  # keep a reference to the byte string
 *         self.format = self._format             # <<<<<<<<<<<<<<
 * 
 * 
 */
  if (unlikely(__pyx_v_self->_format == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "expected bytes, NoneType found");
    __PYX_ERR(2, 141, __pyx_L1_error)
  }
  __pyx_t_7 = __Pyx_PyBytes_AsWritableString(__pyx_v_self->_format); if (unlikely((!__pyx_t_7) && PyErr_Occurred())) __PYX_ERR(2, 141, __pyx_L1_error)
  __pyx_v_self->format = __pyx_t_7;

  /* "View.MemoryView":144
 * 
 * 
 *         self._shape = <Py_ssize_t *> PyObject_Malloc(sizeof(Py_ssize_t)*self.ndim*2)             # <<<<<<<<<<<<<<
 *         self._strides = self._shape + self.ndim
 * 
 */
  __pyx_v_self->_shape = ((Py_ssize_t *)PyObject_Malloc((((sizeof(Py_ssize_t)) * __pyx_v_self->ndim) * 2)));

  /* "View.MemoryView":145
 * 
 *         self._shape = <Py_ssize_t *> PyObject_Malloc(sizeof(Py_ssize_t)*self.ndim*2)
 *         self._strides = self._shape + self.ndim             # <<<<<<<<<<<<<<
 * 
 *         if not self._shape:
 */
  __pyx_v_self->_strides = (__pyx_v_self->_shape + __pyx_v_self->ndim);

  /* "View.MemoryView":147
 *         self._strides = self._shape + self.ndim
 * 
 *         if not self._shape:             # <<<<<<<<<<<<<<
 *             raise MemoryError("unable to allocate shape and strides.")
 * 
 */
  __pyx_t_4 = ((!(__pyx_v_self->_shape != 0)) != 0);
  if (unlikely(__pyx_t_4)) {

    /* "View.MemoryView":148
 * 
 *         if not self._shape:
 *             raise MemoryError("unable to allocate shape and strides.")             # <<<<<<<<<<<<<<
 * 
 * 
 */
    __pyx_t_3 = __Pyx_PyObject_Call(__pyx_builtin_MemoryError, __pyx_tuple__6, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 148, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_Raise(__pyx_t_3, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __PYX_ERR(2, 148, __pyx_L1_error)

    /* "View.MemoryView":147
 *         self._strides = self._shape + self.ndim
 * 
 *         if not self._shape:             # <<<<<<<<<<<<<<
 *             raise MemoryError("unable to allocate shape and strides.")
 * 
 */
  }

  /* "View.MemoryView":151
 * 
 * 
 *         for idx, dim in enumerate(shape):             # <<<<<<<<<<<<<<
 *             if dim <= 0:
 *                 raise ValueError("Invalid shape in axis %d: %d." % (idx, dim))
 */
  __pyx_t_8 = 0;
  __pyx_t_3 = __pyx_v_shape; __Pyx_INCREF(__pyx_t_3); __pyx_t_1 = 0;
  for (;;) {
    if (__pyx_t_1 >= PyTuple_GET_SIZE(__pyx_t_3)) break;
    #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
    __pyx_t_5 = PyTuple_GET_ITEM(__pyx_t_3, __pyx_t_1); __Pyx_INCREF(__pyx_t_5); __pyx_t_1++; if (unlikely(0 < 0)) __PYX_ERR(2, 151, __pyx_L1_error)
    #else
    __pyx_t_5 = PySequence_ITEM(__pyx_t_3, __pyx_t_1); __pyx_t_1++; if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 151, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    #endif
    __pyx_t_9 = __Pyx_PyIndex_AsSsize_t(__pyx_t_5); if (unlikely((__pyx_t_9 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 151, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __pyx_v_dim = __pyx_t_9;
    __pyx_v_idx = __pyx_t_8;
    __pyx_t_8 = (__pyx_t_8 + 1);

    /* "View.MemoryView":152
 * 
 *         for idx, dim in enumerate(shape):
 *             if dim <= 0:             # <<<<<<<<<<<<<<
 *                 raise ValueError("Invalid shape in axis %d: %d." % (idx, dim))
 *             self._shape[idx] = dim
 */
    __pyx_t_4 = ((__pyx_v_dim <= 0) != 0);
    if (unlikely(__pyx_t_4)) {

      /* "View.MemoryView":153
 *         for idx, dim in enumerate(shape):
 *             if dim <= 0:
 *                 raise ValueError("Invalid shape in axis %d: %d." % (idx, dim))             # <<<<<<<<<<<<<<
 *             self._shape[idx] = dim
 * 
 */
      __pyx_t_5 = __Pyx_PyInt_From_int(__pyx_v_idx); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 153, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __pyx_t_6 = PyInt_FromSsize_t(__pyx_v_dim); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 153, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_6);
      __pyx_t_10 = PyTuple_New(2); if (unlikely(!__pyx_t_10)) __PYX_ERR(2, 153, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_10);
      __Pyx_GIVEREF(__pyx_t_5);
      PyTuple_SET_ITEM(__pyx_t_10, 0, __pyx_t_5);
      __Pyx_GIVEREF(__pyx_t_6);
      PyTuple_SET_ITEM(__pyx_t_10, 1, __pyx_t_6);
      __pyx_t_5 = 0;
      __pyx_t_6 = 0;
      __pyx_t_6 = __Pyx_PyString_Format(__pyx_kp_s_Invalid_shape_in_axis_d_d, __pyx_t_10); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 153, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_6);
      __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
      __pyx_t_10 = __Pyx_PyObject_CallOneArg(__pyx_builtin_ValueError, __pyx_t_6); if (unlikely(!__pyx_t_10)) __PYX_ERR(2, 153, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_10);
      __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
      __Pyx_Raise(__pyx_t_10, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
      __PYX_ERR(2, 153, __pyx_L1_error)

      /* "View.MemoryView":152
 * 
 *         for idx, dim in enumerate(shape):
 *             if dim <= 0:             # <<<<<<<<<<<<<<
 *                 raise ValueError("Invalid shape in axis %d: %d." % (idx, dim))
 *             self._shape[idx] = dim
 */
    }

    /* "View.MemoryView":154
 *             if dim <= 0:
 *                 raise ValueError("Invalid shape in axis %d: %d." % (idx, dim))
 *             self._shape[idx] = dim             # <<<<<<<<<<<<<<
 * 
 *         cdef char order
 */
    (__pyx_v_self->_shape[__pyx_v_idx]) = __pyx_v_dim;

    /* "View.MemoryView":151
 * 
 * 
 *         for idx, dim in enumerate(shape):             # <<<<<<<<<<<<<<
 *             if dim <= 0:
 *                 raise ValueError("Invalid shape in axis %d: %d." % (idx, dim))
 */
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

  /* "View.MemoryView":157
 * 
 *         cdef char order
 *         if mode == 'fortran':             # <<<<<<<<<<<<<<
 *             order = b'F'
 *             self.mode = u'fortran'
 */
  __pyx_t_4 = (__Pyx_PyString_Equals(__pyx_v_mode, __pyx_n_s_fortran, Py_EQ)); if (unlikely(__pyx_t_4 < 0)) __PYX_ERR(2, 157, __pyx_L1_error)
  if (__pyx_t_4) {

    /* "View.MemoryView":158
 *         cdef char order
 *         if mode == 'fortran':
 *             order = b'F'             # <<<<<<<<<<<<<<
 *             self.mode = u'fortran'
 *         elif mode == 'c':
 */
    __pyx_v_order = 'F';

    /* "View.MemoryView":159
 *         if mode == 'fortran':
 *             order = b'F'
 *             self.mode = u'fortran'             # <<<<<<<<<<<<<<
 *         elif mode == 'c':
 *             order = b'C'
 */
    __Pyx_INCREF(__pyx_n_u_fortran);
    __Pyx_GIVEREF(__pyx_n_u_fortran);
    __Pyx_GOTREF(__pyx_v_self->mode);
    __Pyx_DECREF(__pyx_v_self->mode);
    __pyx_v_self->mode = __pyx_n_u_fortran;

    /* "View.MemoryView":157
 * 
 *         cdef char order
 *         if mode == 'fortran':             # <<<<<<<<<<<<<<
 *             order = b'F'
 *             self.mode = u'fortran'
 */
    goto __pyx_L10;
  }

  /* "View.MemoryView":160
 *             order = b'F'
 *             self.mode = u'fortran'
 *         elif mode == 'c':             # <<<<<<<<<<<<<<
 *             order = b'C'
 *             self.mode = u'c'
 */
  __pyx_t_4 = (__Pyx_PyString_Equals(__pyx_v_mode, __pyx_n_s_c, Py_EQ)); if (unlikely(__pyx_t_4 < 0)) __PYX_ERR(2, 160, __pyx_L1_error)
  if (likely(__pyx_t_4)) {

    /* "View.MemoryView":161
 *             self.mode = u'fortran'
 *         elif mode == 'c':
 *             order = b'C'             # <<<<<<<<<<<<<<
 *             self.mode = u'c'
 *         else:
 */
    __pyx_v_order = 'C';

    /* "View.MemoryView":162
 *         elif mode == 'c':
 *             order = b'C'
 *             self.mode = u'c'             # <<<<<<<<<<<<<<
 *         else:
 *             raise ValueError("Invalid mode, expected 'c' or 'fortran', got %s" % mode)
 */
    __Pyx_INCREF(__pyx_n_u_c);
    __Pyx_GIVEREF(__pyx_n_u_c);
    __Pyx_GOTREF(__pyx_v_self->mode);
    __Pyx_DECREF(__pyx_v_self->mode);
    __pyx_v_self->mode = __pyx_n_u_c;

    /* "View.MemoryView":160
 *             order = b'F'
 *             self.mode = u'fortran'
 *         elif mode == 'c':             # <<<<<<<<<<<<<<
 *             order = b'C'
 *             self.mode = u'c'
 */
    goto __pyx_L10;
  }

  /* "View.MemoryView":164
 *             self.mode = u'c'
 *         else:
 *             raise ValueError("Invalid mode, expected 'c' or 'fortran', got %s" % mode)             # <<<<<<<<<<<<<<
 * 
 *         self.len = fill_contig_strides_array(self._shape, self._strides,
 */
  /*else*/ {
    __pyx_t_3 = __Pyx_PyString_FormatSafe(__pyx_kp_s_Invalid_mode_expected_c_or_fortr, __pyx_v_mode); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 164, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_10 = __Pyx_PyObject_CallOneArg(__pyx_builtin_ValueError, __pyx_t_3); if (unlikely(!__pyx_t_10)) __PYX_ERR(2, 164, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_10);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __Pyx_Raise(__pyx_t_10, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
    __PYX_ERR(2, 164, __pyx_L1_error)
  }
  __pyx_L10:;

  /* "View.MemoryView":166
 *             raise ValueError("Invalid mode, expected 'c' or 'fortran', got %s" % mode)
 * 
 *         self.len = fill_contig_strides_array(self._shape, self._strides,             # <<<<<<<<<<<<<<
 *                                              itemsize, self.ndim, order)
 * 
 */
  __pyx_v_self->len = __pyx_fill_contig_strides_array(__pyx_v_self->_shape, __pyx_v_self->_strides, __pyx_v_itemsize, __pyx_v_self->ndim, __pyx_v_order);

  /* "View.MemoryView":169
 *                                              itemsize, self.ndim, order)
 * 
 *         self.free_data = allocate_buffer             # <<<<<<<<<<<<<<
 *         self.dtype_is_object = format == b'O'
 *         if allocate_buffer:
 */
  __pyx_v_self->free_data = __pyx_v_allocate_buffer;

  /* "View.MemoryView":170
 * 
 *         self.free_data = allocate_buffer
 *         self.dtype_is_object = format == b'O'             # <<<<<<<<<<<<<<
 *         if allocate_buffer:
 * 
 */
  __pyx_t_10 = PyObject_RichCompare(__pyx_v_format, __pyx_n_b_O, Py_EQ); __Pyx_XGOTREF(__pyx_t_10); if (unlikely(!__pyx_t_10)) __PYX_ERR(2, 170, __pyx_L1_error)
  __pyx_t_4 = __Pyx_PyObject_IsTrue(__pyx_t_10); if (unlikely((__pyx_t_4 == (int)-1) && PyErr_Occurred())) __PYX_ERR(2, 170, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
  __pyx_v_self->dtype_is_object = __pyx_t_4;

  /* "View.MemoryView":171
 *         self.free_data = allocate_buffer
 *         self.dtype_is_object = format == b'O'
 *         if allocate_buffer:             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_4 = (__pyx_v_allocate_buffer != 0);
  if (__pyx_t_4) {

    /* "View.MemoryView":174
 * 
 * 
 *             self.data = <char *>malloc(self.len)             # <<<<<<<<<<<<<<
 *             if not self.data:
 *                 raise MemoryError("unable to allocate array data.")
 */
    __pyx_v_self->data = ((char *)malloc(__pyx_v_self->len));

    /* "View.MemoryView":175
 * 
 *             self.data = <char *>malloc(self.len)
 *             if not self.data:             # <<<<<<<<<<<<<<
 *                 raise MemoryError("unable to allocate array data.")
 * 
 */
    __pyx_t_4 = ((!(__pyx_v_self->data != 0)) != 0);
    if (unlikely(__pyx_t_4)) {

      /* "View.MemoryView":176
 *             self.data = <char *>malloc(self.len)
 *             if not self.data:
 *                 raise MemoryError("unable to allocate array data.")             # <<<<<<<<<<<<<<
 * 
 *             if self.dtype_is_object:
 */
      __pyx_t_10 = __Pyx_PyObject_Call(__pyx_builtin_MemoryError, __pyx_tuple__7, NULL); if (unlikely(!__pyx_t_10)) __PYX_ERR(2, 176, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_10);
      __Pyx_Raise(__pyx_t_10, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;
      __PYX_ERR(2, 176, __pyx_L1_error)

      /* "View.MemoryView":175
 * 
 *             self.data = <char *>malloc(self.len)
 *             if not self.data:             # <<<<<<<<<<<<<<
 *                 raise MemoryError("unable to allocate array data.")
 * 
 */
    }

    /* "View.MemoryView":178
 *                 raise MemoryError("unable to allocate array data.")
 * 
 *             if self.dtype_is_object:             # <<<<<<<<<<<<<<
 *                 p = <PyObject **> self.data
 *                 for i in range(self.len / itemsize):
 */
    __pyx_t_4 = (__pyx_v_self->dtype_is_object != 0);
    if (__pyx_t_4) {

      /* "View.MemoryView":179
 * 
 *             if self.dtype_is_object:
 *                 p = <PyObject **> self.data             # <<<<<<<<<<<<<<
 *                 for i in range(self.len / itemsize):
 *                     p[i] = Py_None
 */
      __pyx_v_p = ((PyObject **)__pyx_v_self->data);

      /* "View.MemoryView":180
 *             if self.dtype_is_object:
 *                 p = <PyObject **> self.data
 *                 for i in range(self.len / itemsize):             # <<<<<<<<<<<<<<
 *                     p[i] = Py_None
 *                     Py_INCREF(Py_None)
 */
      if (unlikely(__pyx_v_itemsize == 0)) {
        PyErr_SetString(PyExc_ZeroDivisionError, "integer division or modulo by zero");
        __PYX_ERR(2, 180, __pyx_L1_error)
      }
      else if (sizeof(Py_ssize_t) == sizeof(long) && (!(((Py_ssize_t)-1) > 0)) && unlikely(__pyx_v_itemsize == (Py_ssize_t)-1)  && unlikely(UNARY_NEG_WOULD_OVERFLOW(__pyx_v_self->len))) {
        PyErr_SetString(PyExc_OverflowError, "value too large to perform division");
        __PYX_ERR(2, 180, __pyx_L1_error)
      }
      __pyx_t_1 = __Pyx_div_Py_ssize_t(__pyx_v_self->len, __pyx_v_itemsize);
      __pyx_t_9 = __pyx_t_1;
      for (__pyx_t_11 = 0; __pyx_t_11 < __pyx_t_9; __pyx_t_11+=1) {
        __pyx_v_i = __pyx_t_11;

        /* "View.MemoryView":181
 *                 p = <PyObject **> self.data
 *                 for i in range(self.len / itemsize):
 *                     p[i] = Py_None             # <<<<<<<<<<<<<<
 *                     Py_INCREF(Py_None)
 * 
 */
        (__pyx_v_p[__pyx_v_i]) = Py_None;

        /* "View.MemoryView":182
 *                 for i in range(self.len / itemsize):
 *                     p[i] = Py_None
 *                     Py_INCREF(Py_None)             # <<<<<<<<<<<<<<
 * 
 *     @cname('getbuffer')
 */
        Py_INCREF(Py_None);
      }

      /* "View.MemoryView":178
 *                 raise MemoryError("unable to allocate array data.")
 * 
 *             if self.dtype_is_object:             # <<<<<<<<<<<<<<
 *                 p = <PyObject **> self.data
 *                 for i in range(self.len / itemsize):
 */
    }

    /* "View.MemoryView":171
 *         self.free_data = allocate_buffer
 *         self.dtype_is_object = format == b'O'
 *         if allocate_buffer:             # <<<<<<<<<<<<<<
 * 
 * 
 */
  }

  /* "View.MemoryView":122
 *         cdef bint dtype_is_object
 * 
 *     def __cinit__(array self, tuple shape, Py_ssize_t itemsize, format not None,             # <<<<<<<<<<<<<<
 *                   mode="c", bint allocate_buffer=True):
 * 
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_10);
  __Pyx_AddTraceback("View.MemoryView.array.__cinit__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_format);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":185
 * 
 *     @cname('getbuffer')
 *     def __getbuffer__(self, Py_buffer *info, int flags):             # <<<<<<<<<<<<<<
 *         cdef int bufmode = -1
 *         if self.mode == u"c":
 */

/* Python wrapper */
static CYTHON_UNUSED int __pyx_array_getbuffer(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/
static CYTHON_UNUSED int __pyx_array_getbuffer(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__getbuffer__ (wrapper)", 0);
  __pyx_r = __pyx_array___pyx_pf_15View_dot_MemoryView_5array_2__getbuffer__(((struct __pyx_array_obj *)__pyx_v_self), ((Py_buffer *)__pyx_v_info), ((int)__pyx_v_flags));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static int __pyx_array___pyx_pf_15View_dot_MemoryView_5array_2__getbuffer__(struct __pyx_array_obj *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) {
  int __pyx_v_bufmode;
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  char *__pyx_t_4;
  Py_ssize_t __pyx_t_5;
  int __pyx_t_6;
  Py_ssize_t *__pyx_t_7;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  if (__pyx_v_info == NULL) {
    PyErr_SetString(PyExc_BufferError, "PyObject_GetBuffer: view==NULL argument is obsolete");
    return -1;
  }
  __Pyx_RefNannySetupContext("__getbuffer__", 0);
  __pyx_v_info->obj = Py_None; __Pyx_INCREF(Py_None);
  __Pyx_GIVEREF(__pyx_v_info->obj);

  /* "View.MemoryView":186
 *     @cname('getbuffer')
 *     def __getbuffer__(self, Py_buffer *info, int flags):
 *         cdef int bufmode = -1             # <<<<<<<<<<<<<<
 *         if self.mode == u"c":
 *             bufmode = PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 */
  __pyx_v_bufmode = -1;

  /* "View.MemoryView":187
 *     def __getbuffer__(self, Py_buffer *info, int flags):
 *         cdef int bufmode = -1
 *         if self.mode == u"c":             # <<<<<<<<<<<<<<
 *             bufmode = PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         elif self.mode == u"fortran":
 */
  __pyx_t_1 = (__Pyx_PyUnicode_Equals(__pyx_v_self->mode, __pyx_n_u_c, Py_EQ)); if (unlikely(__pyx_t_1 < 0)) __PYX_ERR(2, 187, __pyx_L1_error)
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":188
 *         cdef int bufmode = -1
 *         if self.mode == u"c":
 *             bufmode = PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS             # <<<<<<<<<<<<<<
 *         elif self.mode == u"fortran":
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 */
    __pyx_v_bufmode = (PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS);

    /* "View.MemoryView":187
 *     def __getbuffer__(self, Py_buffer *info, int flags):
 *         cdef int bufmode = -1
 *         if self.mode == u"c":             # <<<<<<<<<<<<<<
 *             bufmode = PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         elif self.mode == u"fortran":
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":189
 *         if self.mode == u"c":
 *             bufmode = PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         elif self.mode == u"fortran":             # <<<<<<<<<<<<<<
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         if not (flags & bufmode):
 */
  __pyx_t_2 = (__Pyx_PyUnicode_Equals(__pyx_v_self->mode, __pyx_n_u_fortran, Py_EQ)); if (unlikely(__pyx_t_2 < 0)) __PYX_ERR(2, 189, __pyx_L1_error)
  __pyx_t_1 = (__pyx_t_2 != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":190
 *             bufmode = PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         elif self.mode == u"fortran":
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS             # <<<<<<<<<<<<<<
 *         if not (flags & bufmode):
 *             raise ValueError("Can only create a buffer that is contiguous in memory.")
 */
    __pyx_v_bufmode = (PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS);

    /* "View.MemoryView":189
 *         if self.mode == u"c":
 *             bufmode = PyBUF_C_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         elif self.mode == u"fortran":             # <<<<<<<<<<<<<<
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         if not (flags & bufmode):
 */
  }
  __pyx_L3:;

  /* "View.MemoryView":191
 *         elif self.mode == u"fortran":
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         if not (flags & bufmode):             # <<<<<<<<<<<<<<
 *             raise ValueError("Can only create a buffer that is contiguous in memory.")
 *         info.buf = self.data
 */
  __pyx_t_1 = ((!((__pyx_v_flags & __pyx_v_bufmode) != 0)) != 0);
  if (unlikely(__pyx_t_1)) {

    /* "View.MemoryView":192
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         if not (flags & bufmode):
 *             raise ValueError("Can only create a buffer that is contiguous in memory.")             # <<<<<<<<<<<<<<
 *         info.buf = self.data
 *         info.len = self.len
 */
    __pyx_t_3 = __Pyx_PyObject_Call(__pyx_builtin_ValueError, __pyx_tuple__8, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 192, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_Raise(__pyx_t_3, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __PYX_ERR(2, 192, __pyx_L1_error)

    /* "View.MemoryView":191
 *         elif self.mode == u"fortran":
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         if not (flags & bufmode):             # <<<<<<<<<<<<<<
 *             raise ValueError("Can only create a buffer that is contiguous in memory.")
 *         info.buf = self.data
 */
  }

  /* "View.MemoryView":193
 *         if not (flags & bufmode):
 *             raise ValueError("Can only create a buffer that is contiguous in memory.")
 *         info.buf = self.data             # <<<<<<<<<<<<<<
 *         info.len = self.len
 *         info.ndim = self.ndim
 */
  __pyx_t_4 = __pyx_v_self->data;
  __pyx_v_info->buf = __pyx_t_4;

  /* "View.MemoryView":194
 *             raise ValueError("Can only create a buffer that is contiguous in memory.")
 *         info.buf = self.data
 *         info.len = self.len             # <<<<<<<<<<<<<<
 *         info.ndim = self.ndim
 *         info.shape = self._shape
 */
  __pyx_t_5 = __pyx_v_self->len;
  __pyx_v_info->len = __pyx_t_5;

  /* "View.MemoryView":195
 *         info.buf = self.data
 *         info.len = self.len
 *         info.ndim = self.ndim             # <<<<<<<<<<<<<<
 *         info.shape = self._shape
 *         info.strides = self._strides
 */
  __pyx_t_6 = __pyx_v_self->ndim;
  __pyx_v_info->ndim = __pyx_t_6;

  /* "View.MemoryView":196
 *         info.len = self.len
 *         info.ndim = self.ndim
 *         info.shape = self._shape             # <<<<<<<<<<<<<<
 *         info.strides = self._strides
 *         info.suboffsets = NULL
 */
  __pyx_t_7 = __pyx_v_self->_shape;
  __pyx_v_info->shape = __pyx_t_7;

  /* "View.MemoryView":197
 *         info.ndim = self.ndim
 *         info.shape = self._shape
 *         info.strides = self._strides             # <<<<<<<<<<<<<<
 *         info.suboffsets = NULL
 *         info.itemsize = self.itemsize
 */
  __pyx_t_7 = __pyx_v_self->_strides;
  __pyx_v_info->strides = __pyx_t_7;

  /* "View.MemoryView":198
 *         info.shape = self._shape
 *         info.strides = self._strides
 *         info.suboffsets = NULL             # <<<<<<<<<<<<<<
 *         info.itemsize = self.itemsize
 *         info.readonly = 0
 */
  __pyx_v_info->suboffsets = NULL;

  /* "View.MemoryView":199
 *         info.strides = self._strides
 *         info.suboffsets = NULL
 *         info.itemsize = self.itemsize             # <<<<<<<<<<<<<<
 *         info.readonly = 0
 * 
 */
  __pyx_t_5 = __pyx_v_self->itemsize;
  __pyx_v_info->itemsize = __pyx_t_5;

  /* "View.MemoryView":200
 *         info.suboffsets = NULL
 *         info.itemsize = self.itemsize
 *         info.readonly = 0             # <<<<<<<<<<<<<<
 * 
 *         if flags & PyBUF_FORMAT:
 */
  __pyx_v_info->readonly = 0;

  /* "View.MemoryView":202
 *         info.readonly = 0
 * 
 *         if flags & PyBUF_FORMAT:             # <<<<<<<<<<<<<<
 *             info.format = self.format
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_flags & PyBUF_FORMAT) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":203
 * 
 *         if flags & PyBUF_FORMAT:
 *             info.format = self.format             # <<<<<<<<<<<<<<
 *         else:
 *             info.format = NULL
 */
    __pyx_t_4 = __pyx_v_self->format;
    __pyx_v_info->format = __pyx_t_4;

    /* "View.MemoryView":202
 *         info.readonly = 0
 * 
 *         if flags & PyBUF_FORMAT:             # <<<<<<<<<<<<<<
 *             info.format = self.format
 *         else:
 */
    goto __pyx_L5;
  }

  /* "View.MemoryView":205
 *             info.format = self.format
 *         else:
 *             info.format = NULL             # <<<<<<<<<<<<<<
 * 
 *         info.obj = self
 */
  /*else*/ {
    __pyx_v_info->format = NULL;
  }
  __pyx_L5:;

  /* "View.MemoryView":207
 *             info.format = NULL
 * 
 *         info.obj = self             # <<<<<<<<<<<<<<
 * 
 *     __pyx_getbuffer = capsule(<void *> &__pyx_array_getbuffer, "getbuffer(obj, view, flags)")
 */
  __Pyx_INCREF(((PyObject *)__pyx_v_self));
  __Pyx_GIVEREF(((PyObject *)__pyx_v_self));
  __Pyx_GOTREF(__pyx_v_info->obj);
  __Pyx_DECREF(__pyx_v_info->obj);
  __pyx_v_info->obj = ((PyObject *)__pyx_v_self);

  /* "View.MemoryView":185
 * 
 *     @cname('getbuffer')
 *     def __getbuffer__(self, Py_buffer *info, int flags):             # <<<<<<<<<<<<<<
 *         cdef int bufmode = -1
 *         if self.mode == u"c":
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.array.__getbuffer__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  if (__pyx_v_info->obj != NULL) {
    __Pyx_GOTREF(__pyx_v_info->obj);
    __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = 0;
  }
  goto __pyx_L2;
  __pyx_L0:;
  if (__pyx_v_info->obj == Py_None) {
    __Pyx_GOTREF(__pyx_v_info->obj);
    __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = 0;
  }
  __pyx_L2:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":211
 *     __pyx_getbuffer = capsule(<void *> &__pyx_array_getbuffer, "getbuffer(obj, view, flags)")
 * 
 *     def __dealloc__(array self):             # <<<<<<<<<<<<<<
 *         if self.callback_free_data != NULL:
 *             self.callback_free_data(self.data)
 */

/* Python wrapper */
static void __pyx_array___dealloc__(PyObject *__pyx_v_self); /*proto*/
static void __pyx_array___dealloc__(PyObject *__pyx_v_self) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__dealloc__ (wrapper)", 0);
  __pyx_array___pyx_pf_15View_dot_MemoryView_5array_4__dealloc__(((struct __pyx_array_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

static void __pyx_array___pyx_pf_15View_dot_MemoryView_5array_4__dealloc__(struct __pyx_array_obj *__pyx_v_self) {
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  __Pyx_RefNannySetupContext("__dealloc__", 0);

  /* "View.MemoryView":212
 * 
 *     def __dealloc__(array self):
 *         if self.callback_free_data != NULL:             # <<<<<<<<<<<<<<
 *             self.callback_free_data(self.data)
 *         elif self.free_data:
 */
  __pyx_t_1 = ((__pyx_v_self->callback_free_data != NULL) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":213
 *     def __dealloc__(array self):
 *         if self.callback_free_data != NULL:
 *             self.callback_free_data(self.data)             # <<<<<<<<<<<<<<
 *         elif self.free_data:
 *             if self.dtype_is_object:
 */
    __pyx_v_self->callback_free_data(__pyx_v_self->data);

    /* "View.MemoryView":212
 * 
 *     def __dealloc__(array self):
 *         if self.callback_free_data != NULL:             # <<<<<<<<<<<<<<
 *             self.callback_free_data(self.data)
 *         elif self.free_data:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":214
 *         if self.callback_free_data != NULL:
 *             self.callback_free_data(self.data)
 *         elif self.free_data:             # <<<<<<<<<<<<<<
 *             if self.dtype_is_object:
 *                 refcount_objects_in_slice(self.data, self._shape,
 */
  __pyx_t_1 = (__pyx_v_self->free_data != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":215
 *             self.callback_free_data(self.data)
 *         elif self.free_data:
 *             if self.dtype_is_object:             # <<<<<<<<<<<<<<
 *                 refcount_objects_in_slice(self.data, self._shape,
 *                                           self._strides, self.ndim, False)
 */
    __pyx_t_1 = (__pyx_v_self->dtype_is_object != 0);
    if (__pyx_t_1) {

      /* "View.MemoryView":216
 *         elif self.free_data:
 *             if self.dtype_is_object:
 *                 refcount_objects_in_slice(self.data, self._shape,             # <<<<<<<<<<<<<<
 *                                           self._strides, self.ndim, False)
 *             free(self.data)
 */
      __pyx_memoryview_refcount_objects_in_slice(__pyx_v_self->data, __pyx_v_self->_shape, __pyx_v_self->_strides, __pyx_v_self->ndim, 0);

      /* "View.MemoryView":215
 *             self.callback_free_data(self.data)
 *         elif self.free_data:
 *             if self.dtype_is_object:             # <<<<<<<<<<<<<<
 *                 refcount_objects_in_slice(self.data, self._shape,
 *                                           self._strides, self.ndim, False)
 */
    }

    /* "View.MemoryView":218
 *                 refcount_objects_in_slice(self.data, self._shape,
 *                                           self._strides, self.ndim, False)
 *             free(self.data)             # <<<<<<<<<<<<<<
 *         PyObject_Free(self._shape)
 * 
 */
    free(__pyx_v_self->data);

    /* "View.MemoryView":214
 *         if self.callback_free_data != NULL:
 *             self.callback_free_data(self.data)
 *         elif self.free_data:             # <<<<<<<<<<<<<<
 *             if self.dtype_is_object:
 *                 refcount_objects_in_slice(self.data, self._shape,
 */
  }
  __pyx_L3:;

  /* "View.MemoryView":219
 *                                           self._strides, self.ndim, False)
 *             free(self.data)
 *         PyObject_Free(self._shape)             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  PyObject_Free(__pyx_v_self->_shape);

  /* "View.MemoryView":211
 *     __pyx_getbuffer = capsule(<void *> &__pyx_array_getbuffer, "getbuffer(obj, view, flags)")
 * 
 *     def __dealloc__(array self):             # <<<<<<<<<<<<<<
 *         if self.callback_free_data != NULL:
 *             self.callback_free_data(self.data)
 */

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

/* "View.MemoryView":222
 * 
 *     @property
 *     def memview(self):             # <<<<<<<<<<<<<<
 *         return self.get_memview()
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_5array_7memview_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_5array_7memview_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_5array_7memview___get__(((struct __pyx_array_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_5array_7memview___get__(struct __pyx_array_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":223
 *     @property
 *     def memview(self):
 *         return self.get_memview()             # <<<<<<<<<<<<<<
 * 
 *     @cname('get_memview')
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = ((struct __pyx_vtabstruct_array *)__pyx_v_self->__pyx_vtab)->get_memview(__pyx_v_self); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 223, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":222
 * 
 *     @property
 *     def memview(self):             # <<<<<<<<<<<<<<
 *         return self.get_memview()
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.array.memview.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":226
 * 
 *     @cname('get_memview')
 *     cdef get_memview(self):             # <<<<<<<<<<<<<<
 *         flags =  PyBUF_ANY_CONTIGUOUS|PyBUF_FORMAT|PyBUF_WRITABLE
 *         return  memoryview(self, flags, self.dtype_is_object)
 */

static PyObject *__pyx_array_get_memview(struct __pyx_array_obj *__pyx_v_self) {
  int __pyx_v_flags;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("get_memview", 0);

  /* "View.MemoryView":227
 *     @cname('get_memview')
 *     cdef get_memview(self):
 *         flags =  PyBUF_ANY_CONTIGUOUS|PyBUF_FORMAT|PyBUF_WRITABLE             # <<<<<<<<<<<<<<
 *         return  memoryview(self, flags, self.dtype_is_object)
 * 
 */
  __pyx_v_flags = ((PyBUF_ANY_CONTIGUOUS | PyBUF_FORMAT) | PyBUF_WRITABLE);

  /* "View.MemoryView":228
 *     cdef get_memview(self):
 *         flags =  PyBUF_ANY_CONTIGUOUS|PyBUF_FORMAT|PyBUF_WRITABLE
 *         return  memoryview(self, flags, self.dtype_is_object)             # <<<<<<<<<<<<<<
 * 
 *     def __len__(self):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_flags); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 228, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyBool_FromLong(__pyx_v_self->dtype_is_object); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 228, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyTuple_New(3); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 228, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_INCREF(((PyObject *)__pyx_v_self));
  __Pyx_GIVEREF(((PyObject *)__pyx_v_self));
  PyTuple_SET_ITEM(__pyx_t_3, 0, ((PyObject *)__pyx_v_self));
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_3, 2, __pyx_t_2);
  __pyx_t_1 = 0;
  __pyx_t_2 = 0;
  __pyx_t_2 = __Pyx_PyObject_Call(((PyObject *)__pyx_memoryview_type), __pyx_t_3, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 228, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":226
 * 
 *     @cname('get_memview')
 *     cdef get_memview(self):             # <<<<<<<<<<<<<<
 *         flags =  PyBUF_ANY_CONTIGUOUS|PyBUF_FORMAT|PyBUF_WRITABLE
 *         return  memoryview(self, flags, self.dtype_is_object)
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.array.get_memview", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":230
 *         return  memoryview(self, flags, self.dtype_is_object)
 * 
 *     def __len__(self):             # <<<<<<<<<<<<<<
 *         return self._shape[0]
 * 
 */

/* Python wrapper */
static Py_ssize_t __pyx_array___len__(PyObject *__pyx_v_self); /*proto*/
static Py_ssize_t __pyx_array___len__(PyObject *__pyx_v_self) {
  Py_ssize_t __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__len__ (wrapper)", 0);
  __pyx_r = __pyx_array___pyx_pf_15View_dot_MemoryView_5array_6__len__(((struct __pyx_array_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static Py_ssize_t __pyx_array___pyx_pf_15View_dot_MemoryView_5array_6__len__(struct __pyx_array_obj *__pyx_v_self) {
  Py_ssize_t __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__len__", 0);

  /* "View.MemoryView":231
 * 
 *     def __len__(self):
 *         return self._shape[0]             # <<<<<<<<<<<<<<
 * 
 *     def __getattr__(self, attr):
 */
  __pyx_r = (__pyx_v_self->_shape[0]);
  goto __pyx_L0;

  /* "View.MemoryView":230
 *         return  memoryview(self, flags, self.dtype_is_object)
 * 
 *     def __len__(self):             # <<<<<<<<<<<<<<
 *         return self._shape[0]
 * 
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":233
 *         return self._shape[0]
 * 
 *     def __getattr__(self, attr):             # <<<<<<<<<<<<<<
 *         return getattr(self.memview, attr)
 * 
 */

/* Python wrapper */
static PyObject *__pyx_array___getattr__(PyObject *__pyx_v_self, PyObject *__pyx_v_attr); /*proto*/
static PyObject *__pyx_array___getattr__(PyObject *__pyx_v_self, PyObject *__pyx_v_attr) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__getattr__ (wrapper)", 0);
  __pyx_r = __pyx_array___pyx_pf_15View_dot_MemoryView_5array_8__getattr__(((struct __pyx_array_obj *)__pyx_v_self), ((PyObject *)__pyx_v_attr));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_array___pyx_pf_15View_dot_MemoryView_5array_8__getattr__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_attr) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__getattr__", 0);

  /* "View.MemoryView":234
 * 
 *     def __getattr__(self, attr):
 *         return getattr(self.memview, attr)             # <<<<<<<<<<<<<<
 * 
 *     def __getitem__(self, item):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_memview); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 234, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_GetAttr(__pyx_t_1, __pyx_v_attr); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 234, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":233
 *         return self._shape[0]
 * 
 *     def __getattr__(self, attr):             # <<<<<<<<<<<<<<
 *         return getattr(self.memview, attr)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.array.__getattr__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":236
 *         return getattr(self.memview, attr)
 * 
 *     def __getitem__(self, item):             # <<<<<<<<<<<<<<
 *         return self.memview[item]
 * 
 */

/* Python wrapper */
static PyObject *__pyx_array___getitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_item); /*proto*/
static PyObject *__pyx_array___getitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_item) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__getitem__ (wrapper)", 0);
  __pyx_r = __pyx_array___pyx_pf_15View_dot_MemoryView_5array_10__getitem__(((struct __pyx_array_obj *)__pyx_v_self), ((PyObject *)__pyx_v_item));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_array___pyx_pf_15View_dot_MemoryView_5array_10__getitem__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_item) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__getitem__", 0);

  /* "View.MemoryView":237
 * 
 *     def __getitem__(self, item):
 *         return self.memview[item]             # <<<<<<<<<<<<<<
 * 
 *     def __setitem__(self, item, value):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_memview); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 237, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetItem(__pyx_t_1, __pyx_v_item); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 237, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":236
 *         return getattr(self.memview, attr)
 * 
 *     def __getitem__(self, item):             # <<<<<<<<<<<<<<
 *         return self.memview[item]
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.array.__getitem__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":239
 *         return self.memview[item]
 * 
 *     def __setitem__(self, item, value):             # <<<<<<<<<<<<<<
 *         self.memview[item] = value
 * 
 */

/* Python wrapper */
static int __pyx_array___setitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_item, PyObject *__pyx_v_value); /*proto*/
static int __pyx_array___setitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_item, PyObject *__pyx_v_value) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__setitem__ (wrapper)", 0);
  __pyx_r = __pyx_array___pyx_pf_15View_dot_MemoryView_5array_12__setitem__(((struct __pyx_array_obj *)__pyx_v_self), ((PyObject *)__pyx_v_item), ((PyObject *)__pyx_v_value));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static int __pyx_array___pyx_pf_15View_dot_MemoryView_5array_12__setitem__(struct __pyx_array_obj *__pyx_v_self, PyObject *__pyx_v_item, PyObject *__pyx_v_value) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__setitem__", 0);

  /* "View.MemoryView":240
 * 
 *     def __setitem__(self, item, value):
 *         self.memview[item] = value             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_memview); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 240, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (unlikely(PyObject_SetItem(__pyx_t_1, __pyx_v_item, __pyx_v_value) < 0)) __PYX_ERR(2, 240, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

  /* "View.MemoryView":239
 *         return self.memview[item]
 * 
 *     def __setitem__(self, item, value):             # <<<<<<<<<<<<<<
 *         self.memview[item] = value
 * 
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.array.__setitem__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_array_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_pw___pyx_array_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__reduce_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_array___reduce_cython__(((struct __pyx_array_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_array___reduce_cython__(CYTHON_UNUSED struct __pyx_array_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__reduce_cython__", 0);

  /* "(tree fragment)":2
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_TypeError, __pyx_tuple__9, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_Raise(__pyx_t_1, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __PYX_ERR(2, 2, __pyx_L1_error)

  /* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.array.__reduce_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":3
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_array_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state); /*proto*/
static PyObject *__pyx_pw___pyx_array_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__setstate_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_array_2__setstate_cython__(((struct __pyx_array_obj *)__pyx_v_self), ((PyObject *)__pyx_v___pyx_state));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_array_2__setstate_cython__(CYTHON_UNUSED struct __pyx_array_obj *__pyx_v_self, CYTHON_UNUSED PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__setstate_cython__", 0);

  /* "(tree fragment)":4
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 */
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_TypeError, __pyx_tuple__10, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_Raise(__pyx_t_1, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __PYX_ERR(2, 4, __pyx_L1_error)

  /* "(tree fragment)":3
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.array.__setstate_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":244
 * 
 * @cname("__pyx_array_new")
 * cdef array array_cwrapper(tuple shape, Py_ssize_t itemsize, char *format,             # <<<<<<<<<<<<<<
 *                           char *mode, char *buf):
 *     cdef array result
 */

static struct __pyx_array_obj *__pyx_array_new(PyObject *__pyx_v_shape, Py_ssize_t __pyx_v_itemsize, char *__pyx_v_format, char *__pyx_v_mode, char *__pyx_v_buf) {
  struct __pyx_array_obj *__pyx_v_result = 0;
  struct __pyx_array_obj *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("array_cwrapper", 0);

  /* "View.MemoryView":248
 *     cdef array result
 * 
 *     if buf == NULL:             # <<<<<<<<<<<<<<
 *         result = array(shape, itemsize, format, mode.decode('ASCII'))
 *     else:
 */
  __pyx_t_1 = ((__pyx_v_buf == NULL) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":249
 * 
 *     if buf == NULL:
 *         result = array(shape, itemsize, format, mode.decode('ASCII'))             # <<<<<<<<<<<<<<
 *     else:
 *         result = array(shape, itemsize, format, mode.decode('ASCII'),
 */
    __pyx_t_2 = PyInt_FromSsize_t(__pyx_v_itemsize); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 249, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_t_3 = __Pyx_PyBytes_FromString(__pyx_v_format); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 249, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = __Pyx_decode_c_string(__pyx_v_mode, 0, strlen(__pyx_v_mode), NULL, NULL, PyUnicode_DecodeASCII); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 249, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_5 = PyTuple_New(4); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 249, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __Pyx_INCREF(__pyx_v_shape);
    __Pyx_GIVEREF(__pyx_v_shape);
    PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_v_shape);
    __Pyx_GIVEREF(__pyx_t_2);
    PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_2);
    __Pyx_GIVEREF(__pyx_t_3);
    PyTuple_SET_ITEM(__pyx_t_5, 2, __pyx_t_3);
    __Pyx_GIVEREF(__pyx_t_4);
    PyTuple_SET_ITEM(__pyx_t_5, 3, __pyx_t_4);
    __pyx_t_2 = 0;
    __pyx_t_3 = 0;
    __pyx_t_4 = 0;
    __pyx_t_4 = __Pyx_PyObject_Call(((PyObject *)__pyx_array_type), __pyx_t_5, NULL); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 249, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __pyx_v_result = ((struct __pyx_array_obj *)__pyx_t_4);
    __pyx_t_4 = 0;

    /* "View.MemoryView":248
 *     cdef array result
 * 
 *     if buf == NULL:             # <<<<<<<<<<<<<<
 *         result = array(shape, itemsize, format, mode.decode('ASCII'))
 *     else:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":251
 *         result = array(shape, itemsize, format, mode.decode('ASCII'))
 *     else:
 *         result = array(shape, itemsize, format, mode.decode('ASCII'),             # <<<<<<<<<<<<<<
 *                        allocate_buffer=False)
 *         result.data = buf
 */
  /*else*/ {
    __pyx_t_4 = PyInt_FromSsize_t(__pyx_v_itemsize); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 251, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_5 = __Pyx_PyBytes_FromString(__pyx_v_format); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 251, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __pyx_t_3 = __Pyx_decode_c_string(__pyx_v_mode, 0, strlen(__pyx_v_mode), NULL, NULL, PyUnicode_DecodeASCII); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 251, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_2 = PyTuple_New(4); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 251, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_INCREF(__pyx_v_shape);
    __Pyx_GIVEREF(__pyx_v_shape);
    PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_v_shape);
    __Pyx_GIVEREF(__pyx_t_4);
    PyTuple_SET_ITEM(__pyx_t_2, 1, __pyx_t_4);
    __Pyx_GIVEREF(__pyx_t_5);
    PyTuple_SET_ITEM(__pyx_t_2, 2, __pyx_t_5);
    __Pyx_GIVEREF(__pyx_t_3);
    PyTuple_SET_ITEM(__pyx_t_2, 3, __pyx_t_3);
    __pyx_t_4 = 0;
    __pyx_t_5 = 0;
    __pyx_t_3 = 0;

    /* "View.MemoryView":252
 *     else:
 *         result = array(shape, itemsize, format, mode.decode('ASCII'),
 *                        allocate_buffer=False)             # <<<<<<<<<<<<<<
 *         result.data = buf
 * 
 */
    __pyx_t_3 = __Pyx_PyDict_NewPresized(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 252, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    if (PyDict_SetItem(__pyx_t_3, __pyx_n_s_allocate_buffer, Py_False) < 0) __PYX_ERR(2, 252, __pyx_L1_error)

    /* "View.MemoryView":251
 *         result = array(shape, itemsize, format, mode.decode('ASCII'))
 *     else:
 *         result = array(shape, itemsize, format, mode.decode('ASCII'),             # <<<<<<<<<<<<<<
 *                        allocate_buffer=False)
 *         result.data = buf
 */
    __pyx_t_5 = __Pyx_PyObject_Call(((PyObject *)__pyx_array_type), __pyx_t_2, __pyx_t_3); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 251, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __pyx_v_result = ((struct __pyx_array_obj *)__pyx_t_5);
    __pyx_t_5 = 0;

    /* "View.MemoryView":253
 *         result = array(shape, itemsize, format, mode.decode('ASCII'),
 *                        allocate_buffer=False)
 *         result.data = buf             # <<<<<<<<<<<<<<
 * 
 *     return result
 */
    __pyx_v_result->data = __pyx_v_buf;
  }
  __pyx_L3:;

  /* "View.MemoryView":255
 *         result.data = buf
 * 
 *     return result             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __Pyx_XDECREF(((PyObject *)__pyx_r));
  __Pyx_INCREF(((PyObject *)__pyx_v_result));
  __pyx_r = __pyx_v_result;
  goto __pyx_L0;

  /* "View.MemoryView":244
 * 
 * @cname("__pyx_array_new")
 * cdef array array_cwrapper(tuple shape, Py_ssize_t itemsize, char *format,             # <<<<<<<<<<<<<<
 *                           char *mode, char *buf):
 *     cdef array result
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.array_cwrapper", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF((PyObject *)__pyx_v_result);
  __Pyx_XGIVEREF((PyObject *)__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":281
 * cdef class Enum(object):
 *     cdef object name
 *     def __init__(self, name):             # <<<<<<<<<<<<<<
 *         self.name = name
 *     def __repr__(self):
 */

/* Python wrapper */
static int __pyx_MemviewEnum___init__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static int __pyx_MemviewEnum___init__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  PyObject *__pyx_v_name = 0;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__init__ (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_name,0};
    PyObject* values[1] = {0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_name)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "__init__") < 0)) __PYX_ERR(2, 281, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 1) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
    }
    __pyx_v_name = values[0];
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("__init__", 1, 1, 1, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(2, 281, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("View.MemoryView.Enum.__init__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return -1;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_MemviewEnum___pyx_pf_15View_dot_MemoryView_4Enum___init__(((struct __pyx_MemviewEnum_obj *)__pyx_v_self), __pyx_v_name);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static int __pyx_MemviewEnum___pyx_pf_15View_dot_MemoryView_4Enum___init__(struct __pyx_MemviewEnum_obj *__pyx_v_self, PyObject *__pyx_v_name) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__init__", 0);

  /* "View.MemoryView":282
 *     cdef object name
 *     def __init__(self, name):
 *         self.name = name             # <<<<<<<<<<<<<<
 *     def __repr__(self):
 *         return self.name
 */
  __Pyx_INCREF(__pyx_v_name);
  __Pyx_GIVEREF(__pyx_v_name);
  __Pyx_GOTREF(__pyx_v_self->name);
  __Pyx_DECREF(__pyx_v_self->name);
  __pyx_v_self->name = __pyx_v_name;

  /* "View.MemoryView":281
 * cdef class Enum(object):
 *     cdef object name
 *     def __init__(self, name):             # <<<<<<<<<<<<<<
 *         self.name = name
 *     def __repr__(self):
 */

  /* function exit code */
  __pyx_r = 0;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":283
 *     def __init__(self, name):
 *         self.name = name
 *     def __repr__(self):             # <<<<<<<<<<<<<<
 *         return self.name
 * 
 */

/* Python wrapper */
static PyObject *__pyx_MemviewEnum___repr__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_MemviewEnum___repr__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__repr__ (wrapper)", 0);
  __pyx_r = __pyx_MemviewEnum___pyx_pf_15View_dot_MemoryView_4Enum_2__repr__(((struct __pyx_MemviewEnum_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_MemviewEnum___pyx_pf_15View_dot_MemoryView_4Enum_2__repr__(struct __pyx_MemviewEnum_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__repr__", 0);

  /* "View.MemoryView":284
 *         self.name = name
 *     def __repr__(self):
 *         return self.name             # <<<<<<<<<<<<<<
 * 
 * cdef generic = Enum("<strided and direct or indirect>")
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_self->name);
  __pyx_r = __pyx_v_self->name;
  goto __pyx_L0;

  /* "View.MemoryView":283
 *     def __init__(self, name):
 *         self.name = name
 *     def __repr__(self):             # <<<<<<<<<<<<<<
 *         return self.name
 * 
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     cdef tuple state
 *     cdef object _dict
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_MemviewEnum_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_pw___pyx_MemviewEnum_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__reduce_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_MemviewEnum___reduce_cython__(((struct __pyx_MemviewEnum_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_MemviewEnum___reduce_cython__(struct __pyx_MemviewEnum_obj *__pyx_v_self) {
  PyObject *__pyx_v_state = 0;
  PyObject *__pyx_v__dict = 0;
  int __pyx_v_use_setstate;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_t_2;
  int __pyx_t_3;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__reduce_cython__", 0);

  /* "(tree fragment)":5
 *     cdef object _dict
 *     cdef bint use_setstate
 *     state = (self.name,)             # <<<<<<<<<<<<<<
 *     _dict = getattr(self, '__dict__', None)
 *     if _dict is not None:
 */
  __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 5, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_INCREF(__pyx_v_self->name);
  __Pyx_GIVEREF(__pyx_v_self->name);
  PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v_self->name);
  __pyx_v_state = ((PyObject*)__pyx_t_1);
  __pyx_t_1 = 0;

  /* "(tree fragment)":6
 *     cdef bint use_setstate
 *     state = (self.name,)
 *     _dict = getattr(self, '__dict__', None)             # <<<<<<<<<<<<<<
 *     if _dict is not None:
 *         state += (_dict,)
 */
  __pyx_t_1 = __Pyx_GetAttr3(((PyObject *)__pyx_v_self), __pyx_n_s_dict, Py_None); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 6, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_v__dict = __pyx_t_1;
  __pyx_t_1 = 0;

  /* "(tree fragment)":7
 *     state = (self.name,)
 *     _dict = getattr(self, '__dict__', None)
 *     if _dict is not None:             # <<<<<<<<<<<<<<
 *         state += (_dict,)
 *         use_setstate = True
 */
  __pyx_t_2 = (__pyx_v__dict != Py_None);
  __pyx_t_3 = (__pyx_t_2 != 0);
  if (__pyx_t_3) {

    /* "(tree fragment)":8
 *     _dict = getattr(self, '__dict__', None)
 *     if _dict is not None:
 *         state += (_dict,)             # <<<<<<<<<<<<<<
 *         use_setstate = True
 *     else:
 */
    __pyx_t_1 = PyTuple_New(1); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 8, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_INCREF(__pyx_v__dict);
    __Pyx_GIVEREF(__pyx_v__dict);
    PyTuple_SET_ITEM(__pyx_t_1, 0, __pyx_v__dict);
    __pyx_t_4 = PyNumber_InPlaceAdd(__pyx_v_state, __pyx_t_1); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 8, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    __Pyx_DECREF_SET(__pyx_v_state, ((PyObject*)__pyx_t_4));
    __pyx_t_4 = 0;

    /* "(tree fragment)":9
 *     if _dict is not None:
 *         state += (_dict,)
 *         use_setstate = True             # <<<<<<<<<<<<<<
 *     else:
 *         use_setstate = self.name is not None
 */
    __pyx_v_use_setstate = 1;

    /* "(tree fragment)":7
 *     state = (self.name,)
 *     _dict = getattr(self, '__dict__', None)
 *     if _dict is not None:             # <<<<<<<<<<<<<<
 *         state += (_dict,)
 *         use_setstate = True
 */
    goto __pyx_L3;
  }

  /* "(tree fragment)":11
 *         use_setstate = True
 *     else:
 *         use_setstate = self.name is not None             # <<<<<<<<<<<<<<
 *     if use_setstate:
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, None), state
 */
  /*else*/ {
    __pyx_t_3 = (__pyx_v_self->name != Py_None);
    __pyx_v_use_setstate = __pyx_t_3;
  }
  __pyx_L3:;

  /* "(tree fragment)":12
 *     else:
 *         use_setstate = self.name is not None
 *     if use_setstate:             # <<<<<<<<<<<<<<
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, None), state
 *     else:
 */
  __pyx_t_3 = (__pyx_v_use_setstate != 0);
  if (__pyx_t_3) {

    /* "(tree fragment)":13
 *         use_setstate = self.name is not None
 *     if use_setstate:
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, None), state             # <<<<<<<<<<<<<<
 *     else:
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, state)
 */
    __Pyx_XDECREF(__pyx_r);
    __Pyx_GetModuleGlobalName(__pyx_t_4, __pyx_n_s_pyx_unpickle_Enum); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 13, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_1 = PyTuple_New(3); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 13, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_INCREF(((PyObject *)Py_TYPE(((PyObject *)__pyx_v_self))));
    __Pyx_GIVEREF(((PyObject *)Py_TYPE(((PyObject *)__pyx_v_self))));
    PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)Py_TYPE(((PyObject *)__pyx_v_self))));
    __Pyx_INCREF(__pyx_int_184977713);
    __Pyx_GIVEREF(__pyx_int_184977713);
    PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_int_184977713);
    __Pyx_INCREF(Py_None);
    __Pyx_GIVEREF(Py_None);
    PyTuple_SET_ITEM(__pyx_t_1, 2, Py_None);
    __pyx_t_5 = PyTuple_New(3); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 13, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __Pyx_GIVEREF(__pyx_t_4);
    PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_4);
    __Pyx_GIVEREF(__pyx_t_1);
    PyTuple_SET_ITEM(__pyx_t_5, 1, __pyx_t_1);
    __Pyx_INCREF(__pyx_v_state);
    __Pyx_GIVEREF(__pyx_v_state);
    PyTuple_SET_ITEM(__pyx_t_5, 2, __pyx_v_state);
    __pyx_t_4 = 0;
    __pyx_t_1 = 0;
    __pyx_r = __pyx_t_5;
    __pyx_t_5 = 0;
    goto __pyx_L0;

    /* "(tree fragment)":12
 *     else:
 *         use_setstate = self.name is not None
 *     if use_setstate:             # <<<<<<<<<<<<<<
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, None), state
 *     else:
 */
  }

  /* "(tree fragment)":15
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, None), state
 *     else:
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, state)             # <<<<<<<<<<<<<<
 * def __setstate_cython__(self, __pyx_state):
 *     __pyx_unpickle_Enum__set_state(self, __pyx_state)
 */
  /*else*/ {
    __Pyx_XDECREF(__pyx_r);
    __Pyx_GetModuleGlobalName(__pyx_t_5, __pyx_n_s_pyx_unpickle_Enum); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 15, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __pyx_t_1 = PyTuple_New(3); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 15, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_INCREF(((PyObject *)Py_TYPE(((PyObject *)__pyx_v_self))));
    __Pyx_GIVEREF(((PyObject *)Py_TYPE(((PyObject *)__pyx_v_self))));
    PyTuple_SET_ITEM(__pyx_t_1, 0, ((PyObject *)Py_TYPE(((PyObject *)__pyx_v_self))));
    __Pyx_INCREF(__pyx_int_184977713);
    __Pyx_GIVEREF(__pyx_int_184977713);
    PyTuple_SET_ITEM(__pyx_t_1, 1, __pyx_int_184977713);
    __Pyx_INCREF(__pyx_v_state);
    __Pyx_GIVEREF(__pyx_v_state);
    PyTuple_SET_ITEM(__pyx_t_1, 2, __pyx_v_state);
    __pyx_t_4 = PyTuple_New(2); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 15, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_GIVEREF(__pyx_t_5);
    PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_5);
    __Pyx_GIVEREF(__pyx_t_1);
    PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_1);
    __pyx_t_5 = 0;
    __pyx_t_1 = 0;
    __pyx_r = __pyx_t_4;
    __pyx_t_4 = 0;
    goto __pyx_L0;
  }

  /* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     cdef tuple state
 *     cdef object _dict
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.Enum.__reduce_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_state);
  __Pyx_XDECREF(__pyx_v__dict);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":16
 *     else:
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, state)
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     __pyx_unpickle_Enum__set_state(self, __pyx_state)
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_MemviewEnum_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state); /*proto*/
static PyObject *__pyx_pw___pyx_MemviewEnum_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__setstate_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_MemviewEnum_2__setstate_cython__(((struct __pyx_MemviewEnum_obj *)__pyx_v_self), ((PyObject *)__pyx_v___pyx_state));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_MemviewEnum_2__setstate_cython__(struct __pyx_MemviewEnum_obj *__pyx_v_self, PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__setstate_cython__", 0);

  /* "(tree fragment)":17
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, state)
 * def __setstate_cython__(self, __pyx_state):
 *     __pyx_unpickle_Enum__set_state(self, __pyx_state)             # <<<<<<<<<<<<<<
 */
  if (!(likely(PyTuple_CheckExact(__pyx_v___pyx_state))||((__pyx_v___pyx_state) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected %.16s, got %.200s", "tuple", Py_TYPE(__pyx_v___pyx_state)->tp_name), 0))) __PYX_ERR(2, 17, __pyx_L1_error)
  __pyx_t_1 = __pyx_unpickle_Enum__set_state(__pyx_v_self, ((PyObject*)__pyx_v___pyx_state)); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 17, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

  /* "(tree fragment)":16
 *     else:
 *         return __pyx_unpickle_Enum, (type(self), 0xb068931, state)
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     __pyx_unpickle_Enum__set_state(self, __pyx_state)
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.Enum.__setstate_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":298
 * 
 * @cname('__pyx_align_pointer')
 * cdef void *align_pointer(void *memory, size_t alignment) nogil:             # <<<<<<<<<<<<<<
 *     "Align pointer memory on a given boundary"
 *     cdef Py_intptr_t aligned_p = <Py_intptr_t> memory
 */

static void *__pyx_align_pointer(void *__pyx_v_memory, size_t __pyx_v_alignment) {
  Py_intptr_t __pyx_v_aligned_p;
  size_t __pyx_v_offset;
  void *__pyx_r;
  int __pyx_t_1;

  /* "View.MemoryView":300
 * cdef void *align_pointer(void *memory, size_t alignment) nogil:
 *     "Align pointer memory on a given boundary"
 *     cdef Py_intptr_t aligned_p = <Py_intptr_t> memory             # <<<<<<<<<<<<<<
 *     cdef size_t offset
 * 
 */
  __pyx_v_aligned_p = ((Py_intptr_t)__pyx_v_memory);

  /* "View.MemoryView":304
 * 
 *     with cython.cdivision(True):
 *         offset = aligned_p % alignment             # <<<<<<<<<<<<<<
 * 
 *     if offset > 0:
 */
  __pyx_v_offset = (__pyx_v_aligned_p % __pyx_v_alignment);

  /* "View.MemoryView":306
 *         offset = aligned_p % alignment
 * 
 *     if offset > 0:             # <<<<<<<<<<<<<<
 *         aligned_p += alignment - offset
 * 
 */
  __pyx_t_1 = ((__pyx_v_offset > 0) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":307
 * 
 *     if offset > 0:
 *         aligned_p += alignment - offset             # <<<<<<<<<<<<<<
 * 
 *     return <void *> aligned_p
 */
    __pyx_v_aligned_p = (__pyx_v_aligned_p + (__pyx_v_alignment - __pyx_v_offset));

    /* "View.MemoryView":306
 *         offset = aligned_p % alignment
 * 
 *     if offset > 0:             # <<<<<<<<<<<<<<
 *         aligned_p += alignment - offset
 * 
 */
  }

  /* "View.MemoryView":309
 *         aligned_p += alignment - offset
 * 
 *     return <void *> aligned_p             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = ((void *)__pyx_v_aligned_p);
  goto __pyx_L0;

  /* "View.MemoryView":298
 * 
 * @cname('__pyx_align_pointer')
 * cdef void *align_pointer(void *memory, size_t alignment) nogil:             # <<<<<<<<<<<<<<
 *     "Align pointer memory on a given boundary"
 *     cdef Py_intptr_t aligned_p = <Py_intptr_t> memory
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":345
 *     cdef __Pyx_TypeInfo *typeinfo
 * 
 *     def __cinit__(memoryview self, object obj, int flags, bint dtype_is_object=False):             # <<<<<<<<<<<<<<
 *         self.obj = obj
 *         self.flags = flags
 */

/* Python wrapper */
static int __pyx_memoryview___cinit__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static int __pyx_memoryview___cinit__(PyObject *__pyx_v_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  PyObject *__pyx_v_obj = 0;
  int __pyx_v_flags;
  int __pyx_v_dtype_is_object;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__cinit__ (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_obj,&__pyx_n_s_flags,&__pyx_n_s_dtype_is_object,0};
    PyObject* values[3] = {0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_obj)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_flags)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("__cinit__", 0, 2, 3, 1); __PYX_ERR(2, 345, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (kw_args > 0) {
          PyObject* value = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_dtype_is_object);
          if (value) { values[2] = value; kw_args--; }
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "__cinit__") < 0)) __PYX_ERR(2, 345, __pyx_L3_error)
      }
    } else {
      switch (PyTuple_GET_SIZE(__pyx_args)) {
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        break;
        default: goto __pyx_L5_argtuple_error;
      }
    }
    __pyx_v_obj = values[0];
    __pyx_v_flags = __Pyx_PyInt_As_int(values[1]); if (unlikely((__pyx_v_flags == (int)-1) && PyErr_Occurred())) __PYX_ERR(2, 345, __pyx_L3_error)
    if (values[2]) {
      __pyx_v_dtype_is_object = __Pyx_PyObject_IsTrue(values[2]); if (unlikely((__pyx_v_dtype_is_object == (int)-1) && PyErr_Occurred())) __PYX_ERR(2, 345, __pyx_L3_error)
    } else {
      __pyx_v_dtype_is_object = ((int)0);
    }
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("__cinit__", 0, 2, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(2, 345, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("View.MemoryView.memoryview.__cinit__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return -1;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview___cinit__(((struct __pyx_memoryview_obj *)__pyx_v_self), __pyx_v_obj, __pyx_v_flags, __pyx_v_dtype_is_object);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static int __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview___cinit__(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_obj, int __pyx_v_flags, int __pyx_v_dtype_is_object) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  int __pyx_t_4;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__cinit__", 0);

  /* "View.MemoryView":346
 * 
 *     def __cinit__(memoryview self, object obj, int flags, bint dtype_is_object=False):
 *         self.obj = obj             # <<<<<<<<<<<<<<
 *         self.flags = flags
 *         if type(self) is memoryview or obj is not None:
 */
  __Pyx_INCREF(__pyx_v_obj);
  __Pyx_GIVEREF(__pyx_v_obj);
  __Pyx_GOTREF(__pyx_v_self->obj);
  __Pyx_DECREF(__pyx_v_self->obj);
  __pyx_v_self->obj = __pyx_v_obj;

  /* "View.MemoryView":347
 *     def __cinit__(memoryview self, object obj, int flags, bint dtype_is_object=False):
 *         self.obj = obj
 *         self.flags = flags             # <<<<<<<<<<<<<<
 *         if type(self) is memoryview or obj is not None:
 *             __Pyx_GetBuffer(obj, &self.view, flags)
 */
  __pyx_v_self->flags = __pyx_v_flags;

  /* "View.MemoryView":348
 *         self.obj = obj
 *         self.flags = flags
 *         if type(self) is memoryview or obj is not None:             # <<<<<<<<<<<<<<
 *             __Pyx_GetBuffer(obj, &self.view, flags)
 *             if <PyObject *> self.view.obj == NULL:
 */
  __pyx_t_2 = (((PyObject *)Py_TYPE(((PyObject *)__pyx_v_self))) == ((PyObject *)__pyx_memoryview_type));
  __pyx_t_3 = (__pyx_t_2 != 0);
  if (!__pyx_t_3) {
  } else {
    __pyx_t_1 = __pyx_t_3;
    goto __pyx_L4_bool_binop_done;
  }
  __pyx_t_3 = (__pyx_v_obj != Py_None);
  __pyx_t_2 = (__pyx_t_3 != 0);
  __pyx_t_1 = __pyx_t_2;
  __pyx_L4_bool_binop_done:;
  if (__pyx_t_1) {

    /* "View.MemoryView":349
 *         self.flags = flags
 *         if type(self) is memoryview or obj is not None:
 *             __Pyx_GetBuffer(obj, &self.view, flags)             # <<<<<<<<<<<<<<
 *             if <PyObject *> self.view.obj == NULL:
 *                 (<__pyx_buffer *> &self.view).obj = Py_None
 */
    __pyx_t_4 = __Pyx_GetBuffer(__pyx_v_obj, (&__pyx_v_self->view), __pyx_v_flags); if (unlikely(__pyx_t_4 == ((int)-1))) __PYX_ERR(2, 349, __pyx_L1_error)

    /* "View.MemoryView":350
 *         if type(self) is memoryview or obj is not None:
 *             __Pyx_GetBuffer(obj, &self.view, flags)
 *             if <PyObject *> self.view.obj == NULL:             # <<<<<<<<<<<<<<
 *                 (<__pyx_buffer *> &self.view).obj = Py_None
 *                 Py_INCREF(Py_None)
 */
    __pyx_t_1 = ((((PyObject *)__pyx_v_self->view.obj) == NULL) != 0);
    if (__pyx_t_1) {

      /* "View.MemoryView":351
 *             __Pyx_GetBuffer(obj, &self.view, flags)
 *             if <PyObject *> self.view.obj == NULL:
 *                 (<__pyx_buffer *> &self.view).obj = Py_None             # <<<<<<<<<<<<<<
 *                 Py_INCREF(Py_None)
 * 
 */
      ((Py_buffer *)(&__pyx_v_self->view))->obj = Py_None;

      /* "View.MemoryView":352
 *             if <PyObject *> self.view.obj == NULL:
 *                 (<__pyx_buffer *> &self.view).obj = Py_None
 *                 Py_INCREF(Py_None)             # <<<<<<<<<<<<<<
 * 
 *         global __pyx_memoryview_thread_locks_used
 */
      Py_INCREF(Py_None);

      /* "View.MemoryView":350
 *         if type(self) is memoryview or obj is not None:
 *             __Pyx_GetBuffer(obj, &self.view, flags)
 *             if <PyObject *> self.view.obj == NULL:             # <<<<<<<<<<<<<<
 *                 (<__pyx_buffer *> &self.view).obj = Py_None
 *                 Py_INCREF(Py_None)
 */
    }

    /* "View.MemoryView":348
 *         self.obj = obj
 *         self.flags = flags
 *         if type(self) is memoryview or obj is not None:             # <<<<<<<<<<<<<<
 *             __Pyx_GetBuffer(obj, &self.view, flags)
 *             if <PyObject *> self.view.obj == NULL:
 */
  }

  /* "View.MemoryView":355
 * 
 *         global __pyx_memoryview_thread_locks_used
 *         if __pyx_memoryview_thread_locks_used < THREAD_LOCKS_PREALLOCATED:             # <<<<<<<<<<<<<<
 *             self.lock = __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]
 *             __pyx_memoryview_thread_locks_used += 1
 */
  __pyx_t_1 = ((__pyx_memoryview_thread_locks_used < 8) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":356
 *         global __pyx_memoryview_thread_locks_used
 *         if __pyx_memoryview_thread_locks_used < THREAD_LOCKS_PREALLOCATED:
 *             self.lock = __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]             # <<<<<<<<<<<<<<
 *             __pyx_memoryview_thread_locks_used += 1
 *         if self.lock is NULL:
 */
    __pyx_v_self->lock = (__pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]);

    /* "View.MemoryView":357
 *         if __pyx_memoryview_thread_locks_used < THREAD_LOCKS_PREALLOCATED:
 *             self.lock = __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]
 *             __pyx_memoryview_thread_locks_used += 1             # <<<<<<<<<<<<<<
 *         if self.lock is NULL:
 *             self.lock = PyThread_allocate_lock()
 */
    __pyx_memoryview_thread_locks_used = (__pyx_memoryview_thread_locks_used + 1);

    /* "View.MemoryView":355
 * 
 *         global __pyx_memoryview_thread_locks_used
 *         if __pyx_memoryview_thread_locks_used < THREAD_LOCKS_PREALLOCATED:             # <<<<<<<<<<<<<<
 *             self.lock = __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]
 *             __pyx_memoryview_thread_locks_used += 1
 */
  }

  /* "View.MemoryView":358
 *             self.lock = __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]
 *             __pyx_memoryview_thread_locks_used += 1
 *         if self.lock is NULL:             # <<<<<<<<<<<<<<
 *             self.lock = PyThread_allocate_lock()
 *             if self.lock is NULL:
 */
  __pyx_t_1 = ((__pyx_v_self->lock == NULL) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":359
 *             __pyx_memoryview_thread_locks_used += 1
 *         if self.lock is NULL:
 *             self.lock = PyThread_allocate_lock()             # <<<<<<<<<<<<<<
 *             if self.lock is NULL:
 *                 raise MemoryError
 */
    __pyx_v_self->lock = PyThread_allocate_lock();

    /* "View.MemoryView":360
 *         if self.lock is NULL:
 *             self.lock = PyThread_allocate_lock()
 *             if self.lock is NULL:             # <<<<<<<<<<<<<<
 *                 raise MemoryError
 * 
 */
    __pyx_t_1 = ((__pyx_v_self->lock == NULL) != 0);
    if (unlikely(__pyx_t_1)) {

      /* "View.MemoryView":361
 *             self.lock = PyThread_allocate_lock()
 *             if self.lock is NULL:
 *                 raise MemoryError             # <<<<<<<<<<<<<<
 * 
 *         if flags & PyBUF_FORMAT:
 */
      PyErr_NoMemory(); __PYX_ERR(2, 361, __pyx_L1_error)

      /* "View.MemoryView":360
 *         if self.lock is NULL:
 *             self.lock = PyThread_allocate_lock()
 *             if self.lock is NULL:             # <<<<<<<<<<<<<<
 *                 raise MemoryError
 * 
 */
    }

    /* "View.MemoryView":358
 *             self.lock = __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]
 *             __pyx_memoryview_thread_locks_used += 1
 *         if self.lock is NULL:             # <<<<<<<<<<<<<<
 *             self.lock = PyThread_allocate_lock()
 *             if self.lock is NULL:
 */
  }

  /* "View.MemoryView":363
 *                 raise MemoryError
 * 
 *         if flags & PyBUF_FORMAT:             # <<<<<<<<<<<<<<
 *             self.dtype_is_object = (self.view.format[0] == b'O' and self.view.format[1] == b'\0')
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_flags & PyBUF_FORMAT) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":364
 * 
 *         if flags & PyBUF_FORMAT:
 *             self.dtype_is_object = (self.view.format[0] == b'O' and self.view.format[1] == b'\0')             # <<<<<<<<<<<<<<
 *         else:
 *             self.dtype_is_object = dtype_is_object
 */
    __pyx_t_2 = (((__pyx_v_self->view.format[0]) == 'O') != 0);
    if (__pyx_t_2) {
    } else {
      __pyx_t_1 = __pyx_t_2;
      goto __pyx_L11_bool_binop_done;
    }
    __pyx_t_2 = (((__pyx_v_self->view.format[1]) == '\x00') != 0);
    __pyx_t_1 = __pyx_t_2;
    __pyx_L11_bool_binop_done:;
    __pyx_v_self->dtype_is_object = __pyx_t_1;

    /* "View.MemoryView":363
 *                 raise MemoryError
 * 
 *         if flags & PyBUF_FORMAT:             # <<<<<<<<<<<<<<
 *             self.dtype_is_object = (self.view.format[0] == b'O' and self.view.format[1] == b'\0')
 *         else:
 */
    goto __pyx_L10;
  }

  /* "View.MemoryView":366
 *             self.dtype_is_object = (self.view.format[0] == b'O' and self.view.format[1] == b'\0')
 *         else:
 *             self.dtype_is_object = dtype_is_object             # <<<<<<<<<<<<<<
 * 
 *         self.acquisition_count_aligned_p = <__pyx_atomic_int *> align_pointer(
 */
  /*else*/ {
    __pyx_v_self->dtype_is_object = __pyx_v_dtype_is_object;
  }
  __pyx_L10:;

  /* "View.MemoryView":368
 *             self.dtype_is_object = dtype_is_object
 * 
 *         self.acquisition_count_aligned_p = <__pyx_atomic_int *> align_pointer(             # <<<<<<<<<<<<<<
 *                   <void *> &self.acquisition_count[0], sizeof(__pyx_atomic_int))
 *         self.typeinfo = NULL
 */
  __pyx_v_self->acquisition_count_aligned_p = ((__pyx_atomic_int *)__pyx_align_pointer(((void *)(&(__pyx_v_self->acquisition_count[0]))), (sizeof(__pyx_atomic_int))));

  /* "View.MemoryView":370
 *         self.acquisition_count_aligned_p = <__pyx_atomic_int *> align_pointer(
 *                   <void *> &self.acquisition_count[0], sizeof(__pyx_atomic_int))
 *         self.typeinfo = NULL             # <<<<<<<<<<<<<<
 * 
 *     def __dealloc__(memoryview self):
 */
  __pyx_v_self->typeinfo = NULL;

  /* "View.MemoryView":345
 *     cdef __Pyx_TypeInfo *typeinfo
 * 
 *     def __cinit__(memoryview self, object obj, int flags, bint dtype_is_object=False):             # <<<<<<<<<<<<<<
 *         self.obj = obj
 *         self.flags = flags
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_AddTraceback("View.MemoryView.memoryview.__cinit__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":372
 *         self.typeinfo = NULL
 * 
 *     def __dealloc__(memoryview self):             # <<<<<<<<<<<<<<
 *         if self.obj is not None:
 *             __Pyx_ReleaseBuffer(&self.view)
 */

/* Python wrapper */
static void __pyx_memoryview___dealloc__(PyObject *__pyx_v_self); /*proto*/
static void __pyx_memoryview___dealloc__(PyObject *__pyx_v_self) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__dealloc__ (wrapper)", 0);
  __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_2__dealloc__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

static void __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_2__dealloc__(struct __pyx_memoryview_obj *__pyx_v_self) {
  int __pyx_v_i;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  int __pyx_t_4;
  int __pyx_t_5;
  PyThread_type_lock __pyx_t_6;
  PyThread_type_lock __pyx_t_7;
  __Pyx_RefNannySetupContext("__dealloc__", 0);

  /* "View.MemoryView":373
 * 
 *     def __dealloc__(memoryview self):
 *         if self.obj is not None:             # <<<<<<<<<<<<<<
 *             __Pyx_ReleaseBuffer(&self.view)
 *         elif (<__pyx_buffer *> &self.view).obj == Py_None:
 */
  __pyx_t_1 = (__pyx_v_self->obj != Py_None);
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":374
 *     def __dealloc__(memoryview self):
 *         if self.obj is not None:
 *             __Pyx_ReleaseBuffer(&self.view)             # <<<<<<<<<<<<<<
 *         elif (<__pyx_buffer *> &self.view).obj == Py_None:
 * 
 */
    __Pyx_ReleaseBuffer((&__pyx_v_self->view));

    /* "View.MemoryView":373
 * 
 *     def __dealloc__(memoryview self):
 *         if self.obj is not None:             # <<<<<<<<<<<<<<
 *             __Pyx_ReleaseBuffer(&self.view)
 *         elif (<__pyx_buffer *> &self.view).obj == Py_None:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":375
 *         if self.obj is not None:
 *             __Pyx_ReleaseBuffer(&self.view)
 *         elif (<__pyx_buffer *> &self.view).obj == Py_None:             # <<<<<<<<<<<<<<
 * 
 *             (<__pyx_buffer *> &self.view).obj = NULL
 */
  __pyx_t_2 = ((((Py_buffer *)(&__pyx_v_self->view))->obj == Py_None) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":377
 *         elif (<__pyx_buffer *> &self.view).obj == Py_None:
 * 
 *             (<__pyx_buffer *> &self.view).obj = NULL             # <<<<<<<<<<<<<<
 *             Py_DECREF(Py_None)
 * 
 */
    ((Py_buffer *)(&__pyx_v_self->view))->obj = NULL;

    /* "View.MemoryView":378
 * 
 *             (<__pyx_buffer *> &self.view).obj = NULL
 *             Py_DECREF(Py_None)             # <<<<<<<<<<<<<<
 * 
 *         cdef int i
 */
    Py_DECREF(Py_None);

    /* "View.MemoryView":375
 *         if self.obj is not None:
 *             __Pyx_ReleaseBuffer(&self.view)
 *         elif (<__pyx_buffer *> &self.view).obj == Py_None:             # <<<<<<<<<<<<<<
 * 
 *             (<__pyx_buffer *> &self.view).obj = NULL
 */
  }
  __pyx_L3:;

  /* "View.MemoryView":382
 *         cdef int i
 *         global __pyx_memoryview_thread_locks_used
 *         if self.lock != NULL:             # <<<<<<<<<<<<<<
 *             for i in range(__pyx_memoryview_thread_locks_used):
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:
 */
  __pyx_t_2 = ((__pyx_v_self->lock != NULL) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":383
 *         global __pyx_memoryview_thread_locks_used
 *         if self.lock != NULL:
 *             for i in range(__pyx_memoryview_thread_locks_used):             # <<<<<<<<<<<<<<
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:
 *                     __pyx_memoryview_thread_locks_used -= 1
 */
    __pyx_t_3 = __pyx_memoryview_thread_locks_used;
    __pyx_t_4 = __pyx_t_3;
    for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_4; __pyx_t_5+=1) {
      __pyx_v_i = __pyx_t_5;

      /* "View.MemoryView":384
 *         if self.lock != NULL:
 *             for i in range(__pyx_memoryview_thread_locks_used):
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:             # <<<<<<<<<<<<<<
 *                     __pyx_memoryview_thread_locks_used -= 1
 *                     if i != __pyx_memoryview_thread_locks_used:
 */
      __pyx_t_2 = (((__pyx_memoryview_thread_locks[__pyx_v_i]) == __pyx_v_self->lock) != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":385
 *             for i in range(__pyx_memoryview_thread_locks_used):
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:
 *                     __pyx_memoryview_thread_locks_used -= 1             # <<<<<<<<<<<<<<
 *                     if i != __pyx_memoryview_thread_locks_used:
 *                         __pyx_memoryview_thread_locks[i], __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used] = (
 */
        __pyx_memoryview_thread_locks_used = (__pyx_memoryview_thread_locks_used - 1);

        /* "View.MemoryView":386
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:
 *                     __pyx_memoryview_thread_locks_used -= 1
 *                     if i != __pyx_memoryview_thread_locks_used:             # <<<<<<<<<<<<<<
 *                         __pyx_memoryview_thread_locks[i], __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used] = (
 *                             __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used], __pyx_memoryview_thread_locks[i])
 */
        __pyx_t_2 = ((__pyx_v_i != __pyx_memoryview_thread_locks_used) != 0);
        if (__pyx_t_2) {

          /* "View.MemoryView":388
 *                     if i != __pyx_memoryview_thread_locks_used:
 *                         __pyx_memoryview_thread_locks[i], __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used] = (
 *                             __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used], __pyx_memoryview_thread_locks[i])             # <<<<<<<<<<<<<<
 *                     break
 *             else:
 */
          __pyx_t_6 = (__pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]);
          __pyx_t_7 = (__pyx_memoryview_thread_locks[__pyx_v_i]);

          /* "View.MemoryView":387
 *                     __pyx_memoryview_thread_locks_used -= 1
 *                     if i != __pyx_memoryview_thread_locks_used:
 *                         __pyx_memoryview_thread_locks[i], __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used] = (             # <<<<<<<<<<<<<<
 *                             __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used], __pyx_memoryview_thread_locks[i])
 *                     break
 */
          (__pyx_memoryview_thread_locks[__pyx_v_i]) = __pyx_t_6;
          (__pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used]) = __pyx_t_7;

          /* "View.MemoryView":386
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:
 *                     __pyx_memoryview_thread_locks_used -= 1
 *                     if i != __pyx_memoryview_thread_locks_used:             # <<<<<<<<<<<<<<
 *                         __pyx_memoryview_thread_locks[i], __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used] = (
 *                             __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used], __pyx_memoryview_thread_locks[i])
 */
        }

        /* "View.MemoryView":389
 *                         __pyx_memoryview_thread_locks[i], __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used] = (
 *                             __pyx_memoryview_thread_locks[__pyx_memoryview_thread_locks_used], __pyx_memoryview_thread_locks[i])
 *                     break             # <<<<<<<<<<<<<<
 *             else:
 *                 PyThread_free_lock(self.lock)
 */
        goto __pyx_L6_break;

        /* "View.MemoryView":384
 *         if self.lock != NULL:
 *             for i in range(__pyx_memoryview_thread_locks_used):
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:             # <<<<<<<<<<<<<<
 *                     __pyx_memoryview_thread_locks_used -= 1
 *                     if i != __pyx_memoryview_thread_locks_used:
 */
      }
    }
    /*else*/ {

      /* "View.MemoryView":391
 *                     break
 *             else:
 *                 PyThread_free_lock(self.lock)             # <<<<<<<<<<<<<<
 * 
 *     cdef char *get_item_pointer(memoryview self, object index) except NULL:
 */
      PyThread_free_lock(__pyx_v_self->lock);
    }
    __pyx_L6_break:;

    /* "View.MemoryView":382
 *         cdef int i
 *         global __pyx_memoryview_thread_locks_used
 *         if self.lock != NULL:             # <<<<<<<<<<<<<<
 *             for i in range(__pyx_memoryview_thread_locks_used):
 *                 if __pyx_memoryview_thread_locks[i] is self.lock:
 */
  }

  /* "View.MemoryView":372
 *         self.typeinfo = NULL
 * 
 *     def __dealloc__(memoryview self):             # <<<<<<<<<<<<<<
 *         if self.obj is not None:
 *             __Pyx_ReleaseBuffer(&self.view)
 */

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

/* "View.MemoryView":393
 *                 PyThread_free_lock(self.lock)
 * 
 *     cdef char *get_item_pointer(memoryview self, object index) except NULL:             # <<<<<<<<<<<<<<
 *         cdef Py_ssize_t dim
 *         cdef char *itemp = <char *> self.view.buf
 */

static char *__pyx_memoryview_get_item_pointer(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index) {
  Py_ssize_t __pyx_v_dim;
  char *__pyx_v_itemp;
  PyObject *__pyx_v_idx = NULL;
  char *__pyx_r;
  __Pyx_RefNannyDeclarations
  Py_ssize_t __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  Py_ssize_t __pyx_t_3;
  PyObject *(*__pyx_t_4)(PyObject *);
  PyObject *__pyx_t_5 = NULL;
  Py_ssize_t __pyx_t_6;
  char *__pyx_t_7;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("get_item_pointer", 0);

  /* "View.MemoryView":395
 *     cdef char *get_item_pointer(memoryview self, object index) except NULL:
 *         cdef Py_ssize_t dim
 *         cdef char *itemp = <char *> self.view.buf             # <<<<<<<<<<<<<<
 * 
 *         for dim, idx in enumerate(index):
 */
  __pyx_v_itemp = ((char *)__pyx_v_self->view.buf);

  /* "View.MemoryView":397
 *         cdef char *itemp = <char *> self.view.buf
 * 
 *         for dim, idx in enumerate(index):             # <<<<<<<<<<<<<<
 *             itemp = pybuffer_index(&self.view, itemp, idx, dim)
 * 
 */
  __pyx_t_1 = 0;
  if (likely(PyList_CheckExact(__pyx_v_index)) || PyTuple_CheckExact(__pyx_v_index)) {
    __pyx_t_2 = __pyx_v_index; __Pyx_INCREF(__pyx_t_2); __pyx_t_3 = 0;
    __pyx_t_4 = NULL;
  } else {
    __pyx_t_3 = -1; __pyx_t_2 = PyObject_GetIter(__pyx_v_index); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 397, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_t_4 = Py_TYPE(__pyx_t_2)->tp_iternext; if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 397, __pyx_L1_error)
  }
  for (;;) {
    if (likely(!__pyx_t_4)) {
      if (likely(PyList_CheckExact(__pyx_t_2))) {
        if (__pyx_t_3 >= PyList_GET_SIZE(__pyx_t_2)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_5 = PyList_GET_ITEM(__pyx_t_2, __pyx_t_3); __Pyx_INCREF(__pyx_t_5); __pyx_t_3++; if (unlikely(0 < 0)) __PYX_ERR(2, 397, __pyx_L1_error)
        #else
        __pyx_t_5 = PySequence_ITEM(__pyx_t_2, __pyx_t_3); __pyx_t_3++; if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 397, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_5);
        #endif
      } else {
        if (__pyx_t_3 >= PyTuple_GET_SIZE(__pyx_t_2)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_5 = PyTuple_GET_ITEM(__pyx_t_2, __pyx_t_3); __Pyx_INCREF(__pyx_t_5); __pyx_t_3++; if (unlikely(0 < 0)) __PYX_ERR(2, 397, __pyx_L1_error)
        #else
        __pyx_t_5 = PySequence_ITEM(__pyx_t_2, __pyx_t_3); __pyx_t_3++; if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 397, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_5);
        #endif
      }
    } else {
      __pyx_t_5 = __pyx_t_4(__pyx_t_2);
      if (unlikely(!__pyx_t_5)) {
        PyObject* exc_type = PyErr_Occurred();
        if (exc_type) {
          if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();
          else __PYX_ERR(2, 397, __pyx_L1_error)
        }
        break;
      }
      __Pyx_GOTREF(__pyx_t_5);
    }
    __Pyx_XDECREF_SET(__pyx_v_idx, __pyx_t_5);
    __pyx_t_5 = 0;
    __pyx_v_dim = __pyx_t_1;
    __pyx_t_1 = (__pyx_t_1 + 1);

    /* "View.MemoryView":398
 * 
 *         for dim, idx in enumerate(index):
 *             itemp = pybuffer_index(&self.view, itemp, idx, dim)             # <<<<<<<<<<<<<<
 * 
 *         return itemp
 */
    __pyx_t_6 = __Pyx_PyIndex_AsSsize_t(__pyx_v_idx); if (unlikely((__pyx_t_6 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 398, __pyx_L1_error)
    __pyx_t_7 = __pyx_pybuffer_index((&__pyx_v_self->view), __pyx_v_itemp, __pyx_t_6, __pyx_v_dim); if (unlikely(__pyx_t_7 == ((char *)NULL))) __PYX_ERR(2, 398, __pyx_L1_error)
    __pyx_v_itemp = __pyx_t_7;

    /* "View.MemoryView":397
 *         cdef char *itemp = <char *> self.view.buf
 * 
 *         for dim, idx in enumerate(index):             # <<<<<<<<<<<<<<
 *             itemp = pybuffer_index(&self.view, itemp, idx, dim)
 * 
 */
  }
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;

  /* "View.MemoryView":400
 *             itemp = pybuffer_index(&self.view, itemp, idx, dim)
 * 
 *         return itemp             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = __pyx_v_itemp;
  goto __pyx_L0;

  /* "View.MemoryView":393
 *                 PyThread_free_lock(self.lock)
 * 
 *     cdef char *get_item_pointer(memoryview self, object index) except NULL:             # <<<<<<<<<<<<<<
 *         cdef Py_ssize_t dim
 *         cdef char *itemp = <char *> self.view.buf
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.memoryview.get_item_pointer", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_idx);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":403
 * 
 * 
 *     def __getitem__(memoryview self, object index):             # <<<<<<<<<<<<<<
 *         if index is Ellipsis:
 *             return self
 */

/* Python wrapper */
static PyObject *__pyx_memoryview___getitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_index); /*proto*/
static PyObject *__pyx_memoryview___getitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_index) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__getitem__ (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_4__getitem__(((struct __pyx_memoryview_obj *)__pyx_v_self), ((PyObject *)__pyx_v_index));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_4__getitem__(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index) {
  PyObject *__pyx_v_have_slices = NULL;
  PyObject *__pyx_v_indices = NULL;
  char *__pyx_v_itemp;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  char *__pyx_t_6;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__getitem__", 0);

  /* "View.MemoryView":404
 * 
 *     def __getitem__(memoryview self, object index):
 *         if index is Ellipsis:             # <<<<<<<<<<<<<<
 *             return self
 * 
 */
  __pyx_t_1 = (__pyx_v_index == __pyx_builtin_Ellipsis);
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":405
 *     def __getitem__(memoryview self, object index):
 *         if index is Ellipsis:
 *             return self             # <<<<<<<<<<<<<<
 * 
 *         have_slices, indices = _unellipsify(index, self.view.ndim)
 */
    __Pyx_XDECREF(__pyx_r);
    __Pyx_INCREF(((PyObject *)__pyx_v_self));
    __pyx_r = ((PyObject *)__pyx_v_self);
    goto __pyx_L0;

    /* "View.MemoryView":404
 * 
 *     def __getitem__(memoryview self, object index):
 *         if index is Ellipsis:             # <<<<<<<<<<<<<<
 *             return self
 * 
 */
  }

  /* "View.MemoryView":407
 *             return self
 * 
 *         have_slices, indices = _unellipsify(index, self.view.ndim)             # <<<<<<<<<<<<<<
 * 
 *         cdef char *itemp
 */
  __pyx_t_3 = _unellipsify(__pyx_v_index, __pyx_v_self->view.ndim); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 407, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  if (likely(__pyx_t_3 != Py_None)) {
    PyObject* sequence = __pyx_t_3;
    Py_ssize_t size = __Pyx_PySequence_SIZE(sequence);
    if (unlikely(size != 2)) {
      if (size > 2) __Pyx_RaiseTooManyValuesError(2);
      else if (size >= 0) __Pyx_RaiseNeedMoreValuesError(size);
      __PYX_ERR(2, 407, __pyx_L1_error)
    }
    #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
    __pyx_t_4 = PyTuple_GET_ITEM(sequence, 0); 
    __pyx_t_5 = PyTuple_GET_ITEM(sequence, 1); 
    __Pyx_INCREF(__pyx_t_4);
    __Pyx_INCREF(__pyx_t_5);
    #else
    __pyx_t_4 = PySequence_ITEM(sequence, 0); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 407, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_5 = PySequence_ITEM(sequence, 1); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 407, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    #endif
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  } else {
    __Pyx_RaiseNoneNotIterableError(); __PYX_ERR(2, 407, __pyx_L1_error)
  }
  __pyx_v_have_slices = __pyx_t_4;
  __pyx_t_4 = 0;
  __pyx_v_indices = __pyx_t_5;
  __pyx_t_5 = 0;

  /* "View.MemoryView":410
 * 
 *         cdef char *itemp
 *         if have_slices:             # <<<<<<<<<<<<<<
 *             return memview_slice(self, indices)
 *         else:
 */
  __pyx_t_2 = __Pyx_PyObject_IsTrue(__pyx_v_have_slices); if (unlikely(__pyx_t_2 < 0)) __PYX_ERR(2, 410, __pyx_L1_error)
  if (__pyx_t_2) {

    /* "View.MemoryView":411
 *         cdef char *itemp
 *         if have_slices:
 *             return memview_slice(self, indices)             # <<<<<<<<<<<<<<
 *         else:
 *             itemp = self.get_item_pointer(indices)
 */
    __Pyx_XDECREF(__pyx_r);
    __pyx_t_3 = ((PyObject *)__pyx_memview_slice(__pyx_v_self, __pyx_v_indices)); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 411, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_r = __pyx_t_3;
    __pyx_t_3 = 0;
    goto __pyx_L0;

    /* "View.MemoryView":410
 * 
 *         cdef char *itemp
 *         if have_slices:             # <<<<<<<<<<<<<<
 *             return memview_slice(self, indices)
 *         else:
 */
  }

  /* "View.MemoryView":413
 *             return memview_slice(self, indices)
 *         else:
 *             itemp = self.get_item_pointer(indices)             # <<<<<<<<<<<<<<
 *             return self.convert_item_to_object(itemp)
 * 
 */
  /*else*/ {
    __pyx_t_6 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->get_item_pointer(__pyx_v_self, __pyx_v_indices); if (unlikely(__pyx_t_6 == ((char *)NULL))) __PYX_ERR(2, 413, __pyx_L1_error)
    __pyx_v_itemp = __pyx_t_6;

    /* "View.MemoryView":414
 *         else:
 *             itemp = self.get_item_pointer(indices)
 *             return self.convert_item_to_object(itemp)             # <<<<<<<<<<<<<<
 * 
 *     def __setitem__(memoryview self, object index, object value):
 */
    __Pyx_XDECREF(__pyx_r);
    __pyx_t_3 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->convert_item_to_object(__pyx_v_self, __pyx_v_itemp); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 414, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_r = __pyx_t_3;
    __pyx_t_3 = 0;
    goto __pyx_L0;
  }

  /* "View.MemoryView":403
 * 
 * 
 *     def __getitem__(memoryview self, object index):             # <<<<<<<<<<<<<<
 *         if index is Ellipsis:
 *             return self
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.memoryview.__getitem__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_have_slices);
  __Pyx_XDECREF(__pyx_v_indices);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":416
 *             return self.convert_item_to_object(itemp)
 * 
 *     def __setitem__(memoryview self, object index, object value):             # <<<<<<<<<<<<<<
 *         if self.view.readonly:
 *             raise TypeError("Cannot assign to read-only memoryview")
 */

/* Python wrapper */
static int __pyx_memoryview___setitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_index, PyObject *__pyx_v_value); /*proto*/
static int __pyx_memoryview___setitem__(PyObject *__pyx_v_self, PyObject *__pyx_v_index, PyObject *__pyx_v_value) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__setitem__ (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_6__setitem__(((struct __pyx_memoryview_obj *)__pyx_v_self), ((PyObject *)__pyx_v_index), ((PyObject *)__pyx_v_value));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static int __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_6__setitem__(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index, PyObject *__pyx_v_value) {
  PyObject *__pyx_v_have_slices = NULL;
  PyObject *__pyx_v_obj = NULL;
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__setitem__", 0);
  __Pyx_INCREF(__pyx_v_index);

  /* "View.MemoryView":417
 * 
 *     def __setitem__(memoryview self, object index, object value):
 *         if self.view.readonly:             # <<<<<<<<<<<<<<
 *             raise TypeError("Cannot assign to read-only memoryview")
 * 
 */
  __pyx_t_1 = (__pyx_v_self->view.readonly != 0);
  if (unlikely(__pyx_t_1)) {

    /* "View.MemoryView":418
 *     def __setitem__(memoryview self, object index, object value):
 *         if self.view.readonly:
 *             raise TypeError("Cannot assign to read-only memoryview")             # <<<<<<<<<<<<<<
 * 
 *         have_slices, index = _unellipsify(index, self.view.ndim)
 */
    __pyx_t_2 = __Pyx_PyObject_Call(__pyx_builtin_TypeError, __pyx_tuple__11, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 418, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_Raise(__pyx_t_2, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __PYX_ERR(2, 418, __pyx_L1_error)

    /* "View.MemoryView":417
 * 
 *     def __setitem__(memoryview self, object index, object value):
 *         if self.view.readonly:             # <<<<<<<<<<<<<<
 *             raise TypeError("Cannot assign to read-only memoryview")
 * 
 */
  }

  /* "View.MemoryView":420
 *             raise TypeError("Cannot assign to read-only memoryview")
 * 
 *         have_slices, index = _unellipsify(index, self.view.ndim)             # <<<<<<<<<<<<<<
 * 
 *         if have_slices:
 */
  __pyx_t_2 = _unellipsify(__pyx_v_index, __pyx_v_self->view.ndim); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 420, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  if (likely(__pyx_t_2 != Py_None)) {
    PyObject* sequence = __pyx_t_2;
    Py_ssize_t size = __Pyx_PySequence_SIZE(sequence);
    if (unlikely(size != 2)) {
      if (size > 2) __Pyx_RaiseTooManyValuesError(2);
      else if (size >= 0) __Pyx_RaiseNeedMoreValuesError(size);
      __PYX_ERR(2, 420, __pyx_L1_error)
    }
    #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
    __pyx_t_3 = PyTuple_GET_ITEM(sequence, 0); 
    __pyx_t_4 = PyTuple_GET_ITEM(sequence, 1); 
    __Pyx_INCREF(__pyx_t_3);
    __Pyx_INCREF(__pyx_t_4);
    #else
    __pyx_t_3 = PySequence_ITEM(sequence, 0); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 420, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = PySequence_ITEM(sequence, 1); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 420, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    #endif
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  } else {
    __Pyx_RaiseNoneNotIterableError(); __PYX_ERR(2, 420, __pyx_L1_error)
  }
  __pyx_v_have_slices = __pyx_t_3;
  __pyx_t_3 = 0;
  __Pyx_DECREF_SET(__pyx_v_index, __pyx_t_4);
  __pyx_t_4 = 0;

  /* "View.MemoryView":422
 *         have_slices, index = _unellipsify(index, self.view.ndim)
 * 
 *         if have_slices:             # <<<<<<<<<<<<<<
 *             obj = self.is_slice(value)
 *             if obj:
 */
  __pyx_t_1 = __Pyx_PyObject_IsTrue(__pyx_v_have_slices); if (unlikely(__pyx_t_1 < 0)) __PYX_ERR(2, 422, __pyx_L1_error)
  if (__pyx_t_1) {

    /* "View.MemoryView":423
 * 
 *         if have_slices:
 *             obj = self.is_slice(value)             # <<<<<<<<<<<<<<
 *             if obj:
 *                 self.setitem_slice_assignment(self[index], obj)
 */
    __pyx_t_2 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->is_slice(__pyx_v_self, __pyx_v_value); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 423, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_v_obj = __pyx_t_2;
    __pyx_t_2 = 0;

    /* "View.MemoryView":424
 *         if have_slices:
 *             obj = self.is_slice(value)
 *             if obj:             # <<<<<<<<<<<<<<
 *                 self.setitem_slice_assignment(self[index], obj)
 *             else:
 */
    __pyx_t_1 = __Pyx_PyObject_IsTrue(__pyx_v_obj); if (unlikely(__pyx_t_1 < 0)) __PYX_ERR(2, 424, __pyx_L1_error)
    if (__pyx_t_1) {

      /* "View.MemoryView":425
 *             obj = self.is_slice(value)
 *             if obj:
 *                 self.setitem_slice_assignment(self[index], obj)             # <<<<<<<<<<<<<<
 *             else:
 *                 self.setitem_slice_assign_scalar(self[index], value)
 */
      __pyx_t_2 = __Pyx_PyObject_GetItem(((PyObject *)__pyx_v_self), __pyx_v_index); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 425, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __pyx_t_4 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->setitem_slice_assignment(__pyx_v_self, __pyx_t_2, __pyx_v_obj); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 425, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;

      /* "View.MemoryView":424
 *         if have_slices:
 *             obj = self.is_slice(value)
 *             if obj:             # <<<<<<<<<<<<<<
 *                 self.setitem_slice_assignment(self[index], obj)
 *             else:
 */
      goto __pyx_L5;
    }

    /* "View.MemoryView":427
 *                 self.setitem_slice_assignment(self[index], obj)
 *             else:
 *                 self.setitem_slice_assign_scalar(self[index], value)             # <<<<<<<<<<<<<<
 *         else:
 *             self.setitem_indexed(index, value)
 */
    /*else*/ {
      __pyx_t_4 = __Pyx_PyObject_GetItem(((PyObject *)__pyx_v_self), __pyx_v_index); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 427, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      if (!(likely(((__pyx_t_4) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_4, __pyx_memoryview_type))))) __PYX_ERR(2, 427, __pyx_L1_error)
      __pyx_t_2 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->setitem_slice_assign_scalar(__pyx_v_self, ((struct __pyx_memoryview_obj *)__pyx_t_4), __pyx_v_value); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 427, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_2);
      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    }
    __pyx_L5:;

    /* "View.MemoryView":422
 *         have_slices, index = _unellipsify(index, self.view.ndim)
 * 
 *         if have_slices:             # <<<<<<<<<<<<<<
 *             obj = self.is_slice(value)
 *             if obj:
 */
    goto __pyx_L4;
  }

  /* "View.MemoryView":429
 *                 self.setitem_slice_assign_scalar(self[index], value)
 *         else:
 *             self.setitem_indexed(index, value)             # <<<<<<<<<<<<<<
 * 
 *     cdef is_slice(self, obj):
 */
  /*else*/ {
    __pyx_t_2 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->setitem_indexed(__pyx_v_self, __pyx_v_index, __pyx_v_value); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 429, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  }
  __pyx_L4:;

  /* "View.MemoryView":416
 *             return self.convert_item_to_object(itemp)
 * 
 *     def __setitem__(memoryview self, object index, object value):             # <<<<<<<<<<<<<<
 *         if self.view.readonly:
 *             raise TypeError("Cannot assign to read-only memoryview")
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("View.MemoryView.memoryview.__setitem__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_have_slices);
  __Pyx_XDECREF(__pyx_v_obj);
  __Pyx_XDECREF(__pyx_v_index);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":431
 *             self.setitem_indexed(index, value)
 * 
 *     cdef is_slice(self, obj):             # <<<<<<<<<<<<<<
 *         if not isinstance(obj, memoryview):
 *             try:
 */

static PyObject *__pyx_memoryview_is_slice(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_obj) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  PyObject *__pyx_t_6 = NULL;
  PyObject *__pyx_t_7 = NULL;
  PyObject *__pyx_t_8 = NULL;
  int __pyx_t_9;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("is_slice", 0);
  __Pyx_INCREF(__pyx_v_obj);

  /* "View.MemoryView":432
 * 
 *     cdef is_slice(self, obj):
 *         if not isinstance(obj, memoryview):             # <<<<<<<<<<<<<<
 *             try:
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,
 */
  __pyx_t_1 = __Pyx_TypeCheck(__pyx_v_obj, __pyx_memoryview_type); 
  __pyx_t_2 = ((!(__pyx_t_1 != 0)) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":433
 *     cdef is_slice(self, obj):
 *         if not isinstance(obj, memoryview):
 *             try:             # <<<<<<<<<<<<<<
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,
 *                                  self.dtype_is_object)
 */
    {
      __Pyx_PyThreadState_declare
      __Pyx_PyThreadState_assign
      __Pyx_ExceptionSave(&__pyx_t_3, &__pyx_t_4, &__pyx_t_5);
      __Pyx_XGOTREF(__pyx_t_3);
      __Pyx_XGOTREF(__pyx_t_4);
      __Pyx_XGOTREF(__pyx_t_5);
      /*try:*/ {

        /* "View.MemoryView":434
 *         if not isinstance(obj, memoryview):
 *             try:
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,             # <<<<<<<<<<<<<<
 *                                  self.dtype_is_object)
 *             except TypeError:
 */
        __pyx_t_6 = __Pyx_PyInt_From_int(((__pyx_v_self->flags & (~PyBUF_WRITABLE)) | PyBUF_ANY_CONTIGUOUS)); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 434, __pyx_L4_error)
        __Pyx_GOTREF(__pyx_t_6);

        /* "View.MemoryView":435
 *             try:
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,
 *                                  self.dtype_is_object)             # <<<<<<<<<<<<<<
 *             except TypeError:
 *                 return None
 */
        __pyx_t_7 = __Pyx_PyBool_FromLong(__pyx_v_self->dtype_is_object); if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 435, __pyx_L4_error)
        __Pyx_GOTREF(__pyx_t_7);

        /* "View.MemoryView":434
 *         if not isinstance(obj, memoryview):
 *             try:
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,             # <<<<<<<<<<<<<<
 *                                  self.dtype_is_object)
 *             except TypeError:
 */
        __pyx_t_8 = PyTuple_New(3); if (unlikely(!__pyx_t_8)) __PYX_ERR(2, 434, __pyx_L4_error)
        __Pyx_GOTREF(__pyx_t_8);
        __Pyx_INCREF(__pyx_v_obj);
        __Pyx_GIVEREF(__pyx_v_obj);
        PyTuple_SET_ITEM(__pyx_t_8, 0, __pyx_v_obj);
        __Pyx_GIVEREF(__pyx_t_6);
        PyTuple_SET_ITEM(__pyx_t_8, 1, __pyx_t_6);
        __Pyx_GIVEREF(__pyx_t_7);
        PyTuple_SET_ITEM(__pyx_t_8, 2, __pyx_t_7);
        __pyx_t_6 = 0;
        __pyx_t_7 = 0;
        __pyx_t_7 = __Pyx_PyObject_Call(((PyObject *)__pyx_memoryview_type), __pyx_t_8, NULL); if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 434, __pyx_L4_error)
        __Pyx_GOTREF(__pyx_t_7);
        __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
        __Pyx_DECREF_SET(__pyx_v_obj, __pyx_t_7);
        __pyx_t_7 = 0;

        /* "View.MemoryView":433
 *     cdef is_slice(self, obj):
 *         if not isinstance(obj, memoryview):
 *             try:             # <<<<<<<<<<<<<<
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,
 *                                  self.dtype_is_object)
 */
      }
      __Pyx_XDECREF(__pyx_t_3); __pyx_t_3 = 0;
      __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
      goto __pyx_L9_try_end;
      __pyx_L4_error:;
      __Pyx_XDECREF(__pyx_t_6); __pyx_t_6 = 0;
      __Pyx_XDECREF(__pyx_t_7); __pyx_t_7 = 0;
      __Pyx_XDECREF(__pyx_t_8); __pyx_t_8 = 0;

      /* "View.MemoryView":436
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,
 *                                  self.dtype_is_object)
 *             except TypeError:             # <<<<<<<<<<<<<<
 *                 return None
 * 
 */
      __pyx_t_9 = __Pyx_PyErr_ExceptionMatches(__pyx_builtin_TypeError);
      if (__pyx_t_9) {
        __Pyx_AddTraceback("View.MemoryView.memoryview.is_slice", __pyx_clineno, __pyx_lineno, __pyx_filename);
        if (__Pyx_GetException(&__pyx_t_7, &__pyx_t_8, &__pyx_t_6) < 0) __PYX_ERR(2, 436, __pyx_L6_except_error)
        __Pyx_GOTREF(__pyx_t_7);
        __Pyx_GOTREF(__pyx_t_8);
        __Pyx_GOTREF(__pyx_t_6);

        /* "View.MemoryView":437
 *                                  self.dtype_is_object)
 *             except TypeError:
 *                 return None             # <<<<<<<<<<<<<<
 * 
 *         return obj
 */
        __Pyx_XDECREF(__pyx_r);
        __pyx_r = Py_None; __Pyx_INCREF(Py_None);
        __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
        __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
        __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
        goto __pyx_L7_except_return;
      }
      goto __pyx_L6_except_error;
      __pyx_L6_except_error:;

      /* "View.MemoryView":433
 *     cdef is_slice(self, obj):
 *         if not isinstance(obj, memoryview):
 *             try:             # <<<<<<<<<<<<<<
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,
 *                                  self.dtype_is_object)
 */
      __Pyx_XGIVEREF(__pyx_t_3);
      __Pyx_XGIVEREF(__pyx_t_4);
      __Pyx_XGIVEREF(__pyx_t_5);
      __Pyx_ExceptionReset(__pyx_t_3, __pyx_t_4, __pyx_t_5);
      goto __pyx_L1_error;
      __pyx_L7_except_return:;
      __Pyx_XGIVEREF(__pyx_t_3);
      __Pyx_XGIVEREF(__pyx_t_4);
      __Pyx_XGIVEREF(__pyx_t_5);
      __Pyx_ExceptionReset(__pyx_t_3, __pyx_t_4, __pyx_t_5);
      goto __pyx_L0;
      __pyx_L9_try_end:;
    }

    /* "View.MemoryView":432
 * 
 *     cdef is_slice(self, obj):
 *         if not isinstance(obj, memoryview):             # <<<<<<<<<<<<<<
 *             try:
 *                 obj = memoryview(obj, self.flags & ~PyBUF_WRITABLE | PyBUF_ANY_CONTIGUOUS,
 */
  }

  /* "View.MemoryView":439
 *                 return None
 * 
 *         return obj             # <<<<<<<<<<<<<<
 * 
 *     cdef setitem_slice_assignment(self, dst, src):
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_obj);
  __pyx_r = __pyx_v_obj;
  goto __pyx_L0;

  /* "View.MemoryView":431
 *             self.setitem_indexed(index, value)
 * 
 *     cdef is_slice(self, obj):             # <<<<<<<<<<<<<<
 *         if not isinstance(obj, memoryview):
 *             try:
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_8);
  __Pyx_AddTraceback("View.MemoryView.memoryview.is_slice", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_obj);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":441
 *         return obj
 * 
 *     cdef setitem_slice_assignment(self, dst, src):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice dst_slice
 *         cdef __Pyx_memviewslice src_slice
 */

static PyObject *__pyx_memoryview_setitem_slice_assignment(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_dst, PyObject *__pyx_v_src) {
  __Pyx_memviewslice __pyx_v_dst_slice;
  __Pyx_memviewslice __pyx_v_src_slice;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_memviewslice *__pyx_t_1;
  __Pyx_memviewslice *__pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_t_4;
  int __pyx_t_5;
  int __pyx_t_6;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("setitem_slice_assignment", 0);

  /* "View.MemoryView":445
 *         cdef __Pyx_memviewslice src_slice
 * 
 *         memoryview_copy_contents(get_slice_from_memview(src, &src_slice)[0],             # <<<<<<<<<<<<<<
 *                                  get_slice_from_memview(dst, &dst_slice)[0],
 *                                  src.ndim, dst.ndim, self.dtype_is_object)
 */
  if (!(likely(((__pyx_v_src) == Py_None) || likely(__Pyx_TypeTest(__pyx_v_src, __pyx_memoryview_type))))) __PYX_ERR(2, 445, __pyx_L1_error)
  __pyx_t_1 = __pyx_memoryview_get_slice_from_memoryview(((struct __pyx_memoryview_obj *)__pyx_v_src), (&__pyx_v_src_slice)); if (unlikely(__pyx_t_1 == ((__Pyx_memviewslice *)NULL))) __PYX_ERR(2, 445, __pyx_L1_error)

  /* "View.MemoryView":446
 * 
 *         memoryview_copy_contents(get_slice_from_memview(src, &src_slice)[0],
 *                                  get_slice_from_memview(dst, &dst_slice)[0],             # <<<<<<<<<<<<<<
 *                                  src.ndim, dst.ndim, self.dtype_is_object)
 * 
 */
  if (!(likely(((__pyx_v_dst) == Py_None) || likely(__Pyx_TypeTest(__pyx_v_dst, __pyx_memoryview_type))))) __PYX_ERR(2, 446, __pyx_L1_error)
  __pyx_t_2 = __pyx_memoryview_get_slice_from_memoryview(((struct __pyx_memoryview_obj *)__pyx_v_dst), (&__pyx_v_dst_slice)); if (unlikely(__pyx_t_2 == ((__Pyx_memviewslice *)NULL))) __PYX_ERR(2, 446, __pyx_L1_error)

  /* "View.MemoryView":447
 *         memoryview_copy_contents(get_slice_from_memview(src, &src_slice)[0],
 *                                  get_slice_from_memview(dst, &dst_slice)[0],
 *                                  src.ndim, dst.ndim, self.dtype_is_object)             # <<<<<<<<<<<<<<
 * 
 *     cdef setitem_slice_assign_scalar(self, memoryview dst, value):
 */
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_v_src, __pyx_n_s_ndim); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 447, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = __Pyx_PyInt_As_int(__pyx_t_3); if (unlikely((__pyx_t_4 == (int)-1) && PyErr_Occurred())) __PYX_ERR(2, 447, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_t_3 = __Pyx_PyObject_GetAttrStr(__pyx_v_dst, __pyx_n_s_ndim); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 447, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_5 = __Pyx_PyInt_As_int(__pyx_t_3); if (unlikely((__pyx_t_5 == (int)-1) && PyErr_Occurred())) __PYX_ERR(2, 447, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

  /* "View.MemoryView":445
 *         cdef __Pyx_memviewslice src_slice
 * 
 *         memoryview_copy_contents(get_slice_from_memview(src, &src_slice)[0],             # <<<<<<<<<<<<<<
 *                                  get_slice_from_memview(dst, &dst_slice)[0],
 *                                  src.ndim, dst.ndim, self.dtype_is_object)
 */
  __pyx_t_6 = __pyx_memoryview_copy_contents((__pyx_t_1[0]), (__pyx_t_2[0]), __pyx_t_4, __pyx_t_5, __pyx_v_self->dtype_is_object); if (unlikely(__pyx_t_6 == ((int)-1))) __PYX_ERR(2, 445, __pyx_L1_error)

  /* "View.MemoryView":441
 *         return obj
 * 
 *     cdef setitem_slice_assignment(self, dst, src):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice dst_slice
 *         cdef __Pyx_memviewslice src_slice
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview.setitem_slice_assignment", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":449
 *                                  src.ndim, dst.ndim, self.dtype_is_object)
 * 
 *     cdef setitem_slice_assign_scalar(self, memoryview dst, value):             # <<<<<<<<<<<<<<
 *         cdef int array[128]
 *         cdef void *tmp = NULL
 */

static PyObject *__pyx_memoryview_setitem_slice_assign_scalar(struct __pyx_memoryview_obj *__pyx_v_self, struct __pyx_memoryview_obj *__pyx_v_dst, PyObject *__pyx_v_value) {
  int __pyx_v_array[0x80];
  void *__pyx_v_tmp;
  void *__pyx_v_item;
  __Pyx_memviewslice *__pyx_v_dst_slice;
  __Pyx_memviewslice __pyx_v_tmp_slice;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_memviewslice *__pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_t_4;
  int __pyx_t_5;
  char const *__pyx_t_6;
  PyObject *__pyx_t_7 = NULL;
  PyObject *__pyx_t_8 = NULL;
  PyObject *__pyx_t_9 = NULL;
  PyObject *__pyx_t_10 = NULL;
  PyObject *__pyx_t_11 = NULL;
  PyObject *__pyx_t_12 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("setitem_slice_assign_scalar", 0);

  /* "View.MemoryView":451
 *     cdef setitem_slice_assign_scalar(self, memoryview dst, value):
 *         cdef int array[128]
 *         cdef void *tmp = NULL             # <<<<<<<<<<<<<<
 *         cdef void *item
 * 
 */
  __pyx_v_tmp = NULL;

  /* "View.MemoryView":456
 *         cdef __Pyx_memviewslice *dst_slice
 *         cdef __Pyx_memviewslice tmp_slice
 *         dst_slice = get_slice_from_memview(dst, &tmp_slice)             # <<<<<<<<<<<<<<
 * 
 *         if <size_t>self.view.itemsize > sizeof(array):
 */
  __pyx_t_1 = __pyx_memoryview_get_slice_from_memoryview(__pyx_v_dst, (&__pyx_v_tmp_slice)); if (unlikely(__pyx_t_1 == ((__Pyx_memviewslice *)NULL))) __PYX_ERR(2, 456, __pyx_L1_error)
  __pyx_v_dst_slice = __pyx_t_1;

  /* "View.MemoryView":458
 *         dst_slice = get_slice_from_memview(dst, &tmp_slice)
 * 
 *         if <size_t>self.view.itemsize > sizeof(array):             # <<<<<<<<<<<<<<
 *             tmp = PyMem_Malloc(self.view.itemsize)
 *             if tmp == NULL:
 */
  __pyx_t_2 = ((((size_t)__pyx_v_self->view.itemsize) > (sizeof(__pyx_v_array))) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":459
 * 
 *         if <size_t>self.view.itemsize > sizeof(array):
 *             tmp = PyMem_Malloc(self.view.itemsize)             # <<<<<<<<<<<<<<
 *             if tmp == NULL:
 *                 raise MemoryError
 */
    __pyx_v_tmp = PyMem_Malloc(__pyx_v_self->view.itemsize);

    /* "View.MemoryView":460
 *         if <size_t>self.view.itemsize > sizeof(array):
 *             tmp = PyMem_Malloc(self.view.itemsize)
 *             if tmp == NULL:             # <<<<<<<<<<<<<<
 *                 raise MemoryError
 *             item = tmp
 */
    __pyx_t_2 = ((__pyx_v_tmp == NULL) != 0);
    if (unlikely(__pyx_t_2)) {

      /* "View.MemoryView":461
 *             tmp = PyMem_Malloc(self.view.itemsize)
 *             if tmp == NULL:
 *                 raise MemoryError             # <<<<<<<<<<<<<<
 *             item = tmp
 *         else:
 */
      PyErr_NoMemory(); __PYX_ERR(2, 461, __pyx_L1_error)

      /* "View.MemoryView":460
 *         if <size_t>self.view.itemsize > sizeof(array):
 *             tmp = PyMem_Malloc(self.view.itemsize)
 *             if tmp == NULL:             # <<<<<<<<<<<<<<
 *                 raise MemoryError
 *             item = tmp
 */
    }

    /* "View.MemoryView":462
 *             if tmp == NULL:
 *                 raise MemoryError
 *             item = tmp             # <<<<<<<<<<<<<<
 *         else:
 *             item = <void *> array
 */
    __pyx_v_item = __pyx_v_tmp;

    /* "View.MemoryView":458
 *         dst_slice = get_slice_from_memview(dst, &tmp_slice)
 * 
 *         if <size_t>self.view.itemsize > sizeof(array):             # <<<<<<<<<<<<<<
 *             tmp = PyMem_Malloc(self.view.itemsize)
 *             if tmp == NULL:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":464
 *             item = tmp
 *         else:
 *             item = <void *> array             # <<<<<<<<<<<<<<
 * 
 *         try:
 */
  /*else*/ {
    __pyx_v_item = ((void *)__pyx_v_array);
  }
  __pyx_L3:;

  /* "View.MemoryView":466
 *             item = <void *> array
 * 
 *         try:             # <<<<<<<<<<<<<<
 *             if self.dtype_is_object:
 *                 (<PyObject **> item)[0] = <PyObject *> value
 */
  /*try:*/ {

    /* "View.MemoryView":467
 * 
 *         try:
 *             if self.dtype_is_object:             # <<<<<<<<<<<<<<
 *                 (<PyObject **> item)[0] = <PyObject *> value
 *             else:
 */
    __pyx_t_2 = (__pyx_v_self->dtype_is_object != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":468
 *         try:
 *             if self.dtype_is_object:
 *                 (<PyObject **> item)[0] = <PyObject *> value             # <<<<<<<<<<<<<<
 *             else:
 *                 self.assign_item_from_object(<char *> item, value)
 */
      (((PyObject **)__pyx_v_item)[0]) = ((PyObject *)__pyx_v_value);

      /* "View.MemoryView":467
 * 
 *         try:
 *             if self.dtype_is_object:             # <<<<<<<<<<<<<<
 *                 (<PyObject **> item)[0] = <PyObject *> value
 *             else:
 */
      goto __pyx_L8;
    }

    /* "View.MemoryView":470
 *                 (<PyObject **> item)[0] = <PyObject *> value
 *             else:
 *                 self.assign_item_from_object(<char *> item, value)             # <<<<<<<<<<<<<<
 * 
 * 
 */
    /*else*/ {
      __pyx_t_3 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->assign_item_from_object(__pyx_v_self, ((char *)__pyx_v_item), __pyx_v_value); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 470, __pyx_L6_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    }
    __pyx_L8:;

    /* "View.MemoryView":474
 * 
 * 
 *             if self.view.suboffsets != NULL:             # <<<<<<<<<<<<<<
 *                 assert_direct_dimensions(self.view.suboffsets, self.view.ndim)
 *             slice_assign_scalar(dst_slice, dst.view.ndim, self.view.itemsize,
 */
    __pyx_t_2 = ((__pyx_v_self->view.suboffsets != NULL) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":475
 * 
 *             if self.view.suboffsets != NULL:
 *                 assert_direct_dimensions(self.view.suboffsets, self.view.ndim)             # <<<<<<<<<<<<<<
 *             slice_assign_scalar(dst_slice, dst.view.ndim, self.view.itemsize,
 *                                 item, self.dtype_is_object)
 */
      __pyx_t_3 = assert_direct_dimensions(__pyx_v_self->view.suboffsets, __pyx_v_self->view.ndim); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 475, __pyx_L6_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

      /* "View.MemoryView":474
 * 
 * 
 *             if self.view.suboffsets != NULL:             # <<<<<<<<<<<<<<
 *                 assert_direct_dimensions(self.view.suboffsets, self.view.ndim)
 *             slice_assign_scalar(dst_slice, dst.view.ndim, self.view.itemsize,
 */
    }

    /* "View.MemoryView":476
 *             if self.view.suboffsets != NULL:
 *                 assert_direct_dimensions(self.view.suboffsets, self.view.ndim)
 *             slice_assign_scalar(dst_slice, dst.view.ndim, self.view.itemsize,             # <<<<<<<<<<<<<<
 *                                 item, self.dtype_is_object)
 *         finally:
 */
    __pyx_memoryview_slice_assign_scalar(__pyx_v_dst_slice, __pyx_v_dst->view.ndim, __pyx_v_self->view.itemsize, __pyx_v_item, __pyx_v_self->dtype_is_object);
  }

  /* "View.MemoryView":479
 *                                 item, self.dtype_is_object)
 *         finally:
 *             PyMem_Free(tmp)             # <<<<<<<<<<<<<<
 * 
 *     cdef setitem_indexed(self, index, value):
 */
  /*finally:*/ {
    /*normal exit:*/{
      PyMem_Free(__pyx_v_tmp);
      goto __pyx_L7;
    }
    __pyx_L6_error:;
    /*exception exit:*/{
      __Pyx_PyThreadState_declare
      __Pyx_PyThreadState_assign
      __pyx_t_7 = 0; __pyx_t_8 = 0; __pyx_t_9 = 0; __pyx_t_10 = 0; __pyx_t_11 = 0; __pyx_t_12 = 0;
      __Pyx_XDECREF(__pyx_t_3); __pyx_t_3 = 0;
      if (PY_MAJOR_VERSION >= 3) __Pyx_ExceptionSwap(&__pyx_t_10, &__pyx_t_11, &__pyx_t_12);
      if ((PY_MAJOR_VERSION < 3) || unlikely(__Pyx_GetException(&__pyx_t_7, &__pyx_t_8, &__pyx_t_9) < 0)) __Pyx_ErrFetch(&__pyx_t_7, &__pyx_t_8, &__pyx_t_9);
      __Pyx_XGOTREF(__pyx_t_7);
      __Pyx_XGOTREF(__pyx_t_8);
      __Pyx_XGOTREF(__pyx_t_9);
      __Pyx_XGOTREF(__pyx_t_10);
      __Pyx_XGOTREF(__pyx_t_11);
      __Pyx_XGOTREF(__pyx_t_12);
      __pyx_t_4 = __pyx_lineno; __pyx_t_5 = __pyx_clineno; __pyx_t_6 = __pyx_filename;
      {
        PyMem_Free(__pyx_v_tmp);
      }
      if (PY_MAJOR_VERSION >= 3) {
        __Pyx_XGIVEREF(__pyx_t_10);
        __Pyx_XGIVEREF(__pyx_t_11);
        __Pyx_XGIVEREF(__pyx_t_12);
        __Pyx_ExceptionReset(__pyx_t_10, __pyx_t_11, __pyx_t_12);
      }
      __Pyx_XGIVEREF(__pyx_t_7);
      __Pyx_XGIVEREF(__pyx_t_8);
      __Pyx_XGIVEREF(__pyx_t_9);
      __Pyx_ErrRestore(__pyx_t_7, __pyx_t_8, __pyx_t_9);
      __pyx_t_7 = 0; __pyx_t_8 = 0; __pyx_t_9 = 0; __pyx_t_10 = 0; __pyx_t_11 = 0; __pyx_t_12 = 0;
      __pyx_lineno = __pyx_t_4; __pyx_clineno = __pyx_t_5; __pyx_filename = __pyx_t_6;
      goto __pyx_L1_error;
    }
    __pyx_L7:;
  }

  /* "View.MemoryView":449
 *                                  src.ndim, dst.ndim, self.dtype_is_object)
 * 
 *     cdef setitem_slice_assign_scalar(self, memoryview dst, value):             # <<<<<<<<<<<<<<
 *         cdef int array[128]
 *         cdef void *tmp = NULL
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview.setitem_slice_assign_scalar", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":481
 *             PyMem_Free(tmp)
 * 
 *     cdef setitem_indexed(self, index, value):             # <<<<<<<<<<<<<<
 *         cdef char *itemp = self.get_item_pointer(index)
 *         self.assign_item_from_object(itemp, value)
 */

static PyObject *__pyx_memoryview_setitem_indexed(struct __pyx_memoryview_obj *__pyx_v_self, PyObject *__pyx_v_index, PyObject *__pyx_v_value) {
  char *__pyx_v_itemp;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  char *__pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("setitem_indexed", 0);

  /* "View.MemoryView":482
 * 
 *     cdef setitem_indexed(self, index, value):
 *         cdef char *itemp = self.get_item_pointer(index)             # <<<<<<<<<<<<<<
 *         self.assign_item_from_object(itemp, value)
 * 
 */
  __pyx_t_1 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->get_item_pointer(__pyx_v_self, __pyx_v_index); if (unlikely(__pyx_t_1 == ((char *)NULL))) __PYX_ERR(2, 482, __pyx_L1_error)
  __pyx_v_itemp = __pyx_t_1;

  /* "View.MemoryView":483
 *     cdef setitem_indexed(self, index, value):
 *         cdef char *itemp = self.get_item_pointer(index)
 *         self.assign_item_from_object(itemp, value)             # <<<<<<<<<<<<<<
 * 
 *     cdef convert_item_to_object(self, char *itemp):
 */
  __pyx_t_2 = ((struct __pyx_vtabstruct_memoryview *)__pyx_v_self->__pyx_vtab)->assign_item_from_object(__pyx_v_self, __pyx_v_itemp, __pyx_v_value); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 483, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;

  /* "View.MemoryView":481
 *             PyMem_Free(tmp)
 * 
 *     cdef setitem_indexed(self, index, value):             # <<<<<<<<<<<<<<
 *         cdef char *itemp = self.get_item_pointer(index)
 *         self.assign_item_from_object(itemp, value)
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.memoryview.setitem_indexed", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":485
 *         self.assign_item_from_object(itemp, value)
 * 
 *     cdef convert_item_to_object(self, char *itemp):             # <<<<<<<<<<<<<<
 *         """Only used if instantiated manually by the user, or if Cython doesn't
 *         know how to convert the type"""
 */

static PyObject *__pyx_memoryview_convert_item_to_object(struct __pyx_memoryview_obj *__pyx_v_self, char *__pyx_v_itemp) {
  PyObject *__pyx_v_struct = NULL;
  PyObject *__pyx_v_bytesitem = 0;
  PyObject *__pyx_v_result = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  PyObject *__pyx_t_6 = NULL;
  PyObject *__pyx_t_7 = NULL;
  int __pyx_t_8;
  PyObject *__pyx_t_9 = NULL;
  size_t __pyx_t_10;
  int __pyx_t_11;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("convert_item_to_object", 0);

  /* "View.MemoryView":488
 *         """Only used if instantiated manually by the user, or if Cython doesn't
 *         know how to convert the type"""
 *         import struct             # <<<<<<<<<<<<<<
 *         cdef bytes bytesitem
 * 
 */
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_struct, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 488, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_v_struct = __pyx_t_1;
  __pyx_t_1 = 0;

  /* "View.MemoryView":491
 *         cdef bytes bytesitem
 * 
 *         bytesitem = itemp[:self.view.itemsize]             # <<<<<<<<<<<<<<
 *         try:
 *             result = struct.unpack(self.view.format, bytesitem)
 */
  __pyx_t_1 = __Pyx_PyBytes_FromStringAndSize(__pyx_v_itemp + 0, __pyx_v_self->view.itemsize - 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 491, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_v_bytesitem = ((PyObject*)__pyx_t_1);
  __pyx_t_1 = 0;

  /* "View.MemoryView":492
 * 
 *         bytesitem = itemp[:self.view.itemsize]
 *         try:             # <<<<<<<<<<<<<<
 *             result = struct.unpack(self.view.format, bytesitem)
 *         except struct.error:
 */
  {
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ExceptionSave(&__pyx_t_2, &__pyx_t_3, &__pyx_t_4);
    __Pyx_XGOTREF(__pyx_t_2);
    __Pyx_XGOTREF(__pyx_t_3);
    __Pyx_XGOTREF(__pyx_t_4);
    /*try:*/ {

      /* "View.MemoryView":493
 *         bytesitem = itemp[:self.view.itemsize]
 *         try:
 *             result = struct.unpack(self.view.format, bytesitem)             # <<<<<<<<<<<<<<
 *         except struct.error:
 *             raise ValueError("Unable to convert item to object")
 */
      __pyx_t_5 = __Pyx_PyObject_GetAttrStr(__pyx_v_struct, __pyx_n_s_unpack); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 493, __pyx_L3_error)
      __Pyx_GOTREF(__pyx_t_5);
      __pyx_t_6 = __Pyx_PyBytes_FromString(__pyx_v_self->view.format); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 493, __pyx_L3_error)
      __Pyx_GOTREF(__pyx_t_6);
      __pyx_t_7 = NULL;
      __pyx_t_8 = 0;
      if (CYTHON_UNPACK_METHODS && likely(PyMethod_Check(__pyx_t_5))) {
        __pyx_t_7 = PyMethod_GET_SELF(__pyx_t_5);
        if (likely(__pyx_t_7)) {
          PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_5);
          __Pyx_INCREF(__pyx_t_7);
          __Pyx_INCREF(function);
          __Pyx_DECREF_SET(__pyx_t_5, function);
          __pyx_t_8 = 1;
        }
      }
      #if CYTHON_FAST_PYCALL
      if (PyFunction_Check(__pyx_t_5)) {
        PyObject *__pyx_temp[3] = {__pyx_t_7, __pyx_t_6, __pyx_v_bytesitem};
        __pyx_t_1 = __Pyx_PyFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 493, __pyx_L3_error)
        __Pyx_XDECREF(__pyx_t_7); __pyx_t_7 = 0;
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
      } else
      #endif
      #if CYTHON_FAST_PYCCALL
      if (__Pyx_PyFastCFunction_Check(__pyx_t_5)) {
        PyObject *__pyx_temp[3] = {__pyx_t_7, __pyx_t_6, __pyx_v_bytesitem};
        __pyx_t_1 = __Pyx_PyCFunction_FastCall(__pyx_t_5, __pyx_temp+1-__pyx_t_8, 2+__pyx_t_8); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 493, __pyx_L3_error)
        __Pyx_XDECREF(__pyx_t_7); __pyx_t_7 = 0;
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
      } else
      #endif
      {
        __pyx_t_9 = PyTuple_New(2+__pyx_t_8); if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 493, __pyx_L3_error)
        __Pyx_GOTREF(__pyx_t_9);
        if (__pyx_t_7) {
          __Pyx_GIVEREF(__pyx_t_7); PyTuple_SET_ITEM(__pyx_t_9, 0, __pyx_t_7); __pyx_t_7 = NULL;
        }
        __Pyx_GIVEREF(__pyx_t_6);
        PyTuple_SET_ITEM(__pyx_t_9, 0+__pyx_t_8, __pyx_t_6);
        __Pyx_INCREF(__pyx_v_bytesitem);
        __Pyx_GIVEREF(__pyx_v_bytesitem);
        PyTuple_SET_ITEM(__pyx_t_9, 1+__pyx_t_8, __pyx_v_bytesitem);
        __pyx_t_6 = 0;
        __pyx_t_1 = __Pyx_PyObject_Call(__pyx_t_5, __pyx_t_9, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 493, __pyx_L3_error)
        __Pyx_GOTREF(__pyx_t_1);
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
      }
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __pyx_v_result = __pyx_t_1;
      __pyx_t_1 = 0;

      /* "View.MemoryView":492
 * 
 *         bytesitem = itemp[:self.view.itemsize]
 *         try:             # <<<<<<<<<<<<<<
 *             result = struct.unpack(self.view.format, bytesitem)
 *         except struct.error:
 */
    }

    /* "View.MemoryView":497
 *             raise ValueError("Unable to convert item to object")
 *         else:
 *             if len(self.view.format) == 1:             # <<<<<<<<<<<<<<
 *                 return result[0]
 *             return result
 */
    /*else:*/ {
      __pyx_t_10 = strlen(__pyx_v_self->view.format); 
      __pyx_t_11 = ((__pyx_t_10 == 1) != 0);
      if (__pyx_t_11) {

        /* "View.MemoryView":498
 *         else:
 *             if len(self.view.format) == 1:
 *                 return result[0]             # <<<<<<<<<<<<<<
 *             return result
 * 
 */
        __Pyx_XDECREF(__pyx_r);
        __pyx_t_1 = __Pyx_GetItemInt(__pyx_v_result, 0, long, 1, __Pyx_PyInt_From_long, 0, 0, 1); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 498, __pyx_L5_except_error)
        __Pyx_GOTREF(__pyx_t_1);
        __pyx_r = __pyx_t_1;
        __pyx_t_1 = 0;
        goto __pyx_L6_except_return;

        /* "View.MemoryView":497
 *             raise ValueError("Unable to convert item to object")
 *         else:
 *             if len(self.view.format) == 1:             # <<<<<<<<<<<<<<
 *                 return result[0]
 *             return result
 */
      }

      /* "View.MemoryView":499
 *             if len(self.view.format) == 1:
 *                 return result[0]
 *             return result             # <<<<<<<<<<<<<<
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):
 */
      __Pyx_XDECREF(__pyx_r);
      __Pyx_INCREF(__pyx_v_result);
      __pyx_r = __pyx_v_result;
      goto __pyx_L6_except_return;
    }
    __pyx_L3_error:;
    __Pyx_XDECREF(__pyx_t_1); __pyx_t_1 = 0;
    __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_XDECREF(__pyx_t_6); __pyx_t_6 = 0;
    __Pyx_XDECREF(__pyx_t_7); __pyx_t_7 = 0;
    __Pyx_XDECREF(__pyx_t_9); __pyx_t_9 = 0;

    /* "View.MemoryView":494
 *         try:
 *             result = struct.unpack(self.view.format, bytesitem)
 *         except struct.error:             # <<<<<<<<<<<<<<
 *             raise ValueError("Unable to convert item to object")
 *         else:
 */
    __Pyx_ErrFetch(&__pyx_t_1, &__pyx_t_5, &__pyx_t_9);
    __pyx_t_6 = __Pyx_PyObject_GetAttrStr(__pyx_v_struct, __pyx_n_s_error); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 494, __pyx_L5_except_error)
    __Pyx_GOTREF(__pyx_t_6);
    __pyx_t_8 = __Pyx_PyErr_GivenExceptionMatches(__pyx_t_1, __pyx_t_6);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    __Pyx_ErrRestore(__pyx_t_1, __pyx_t_5, __pyx_t_9);
    __pyx_t_1 = 0; __pyx_t_5 = 0; __pyx_t_9 = 0;
    if (__pyx_t_8) {
      __Pyx_AddTraceback("View.MemoryView.memoryview.convert_item_to_object", __pyx_clineno, __pyx_lineno, __pyx_filename);
      if (__Pyx_GetException(&__pyx_t_9, &__pyx_t_5, &__pyx_t_1) < 0) __PYX_ERR(2, 494, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_9);
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_GOTREF(__pyx_t_1);

      /* "View.MemoryView":495
 *             result = struct.unpack(self.view.format, bytesitem)
 *         except struct.error:
 *             raise ValueError("Unable to convert item to object")             # <<<<<<<<<<<<<<
 *         else:
 *             if len(self.view.format) == 1:
 */
      __pyx_t_6 = __Pyx_PyObject_Call(__pyx_builtin_ValueError, __pyx_tuple__12, NULL); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 495, __pyx_L5_except_error)
      __Pyx_GOTREF(__pyx_t_6);
      __Pyx_Raise(__pyx_t_6, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
      __PYX_ERR(2, 495, __pyx_L5_except_error)
    }
    goto __pyx_L5_except_error;
    __pyx_L5_except_error:;

    /* "View.MemoryView":492
 * 
 *         bytesitem = itemp[:self.view.itemsize]
 *         try:             # <<<<<<<<<<<<<<
 *             result = struct.unpack(self.view.format, bytesitem)
 *         except struct.error:
 */
    __Pyx_XGIVEREF(__pyx_t_2);
    __Pyx_XGIVEREF(__pyx_t_3);
    __Pyx_XGIVEREF(__pyx_t_4);
    __Pyx_ExceptionReset(__pyx_t_2, __pyx_t_3, __pyx_t_4);
    goto __pyx_L1_error;
    __pyx_L6_except_return:;
    __Pyx_XGIVEREF(__pyx_t_2);
    __Pyx_XGIVEREF(__pyx_t_3);
    __Pyx_XGIVEREF(__pyx_t_4);
    __Pyx_ExceptionReset(__pyx_t_2, __pyx_t_3, __pyx_t_4);
    goto __pyx_L0;
  }

  /* "View.MemoryView":485
 *         self.assign_item_from_object(itemp, value)
 * 
 *     cdef convert_item_to_object(self, char *itemp):             # <<<<<<<<<<<<<<
 *         """Only used if instantiated manually by the user, or if Cython doesn't
 *         know how to convert the type"""
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_9);
  __Pyx_AddTraceback("View.MemoryView.memoryview.convert_item_to_object", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_struct);
  __Pyx_XDECREF(__pyx_v_bytesitem);
  __Pyx_XDECREF(__pyx_v_result);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":501
 *             return result
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):             # <<<<<<<<<<<<<<
 *         """Only used if instantiated manually by the user, or if Cython doesn't
 *         know how to convert the type"""
 */

static PyObject *__pyx_memoryview_assign_item_from_object(struct __pyx_memoryview_obj *__pyx_v_self, char *__pyx_v_itemp, PyObject *__pyx_v_value) {
  PyObject *__pyx_v_struct = NULL;
  char __pyx_v_c;
  PyObject *__pyx_v_bytesvalue = 0;
  Py_ssize_t __pyx_v_i;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_t_2;
  int __pyx_t_3;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  PyObject *__pyx_t_6 = NULL;
  int __pyx_t_7;
  PyObject *__pyx_t_8 = NULL;
  Py_ssize_t __pyx_t_9;
  PyObject *__pyx_t_10 = NULL;
  char *__pyx_t_11;
  char *__pyx_t_12;
  char *__pyx_t_13;
  char *__pyx_t_14;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("assign_item_from_object", 0);

  /* "View.MemoryView":504
 *         """Only used if instantiated manually by the user, or if Cython doesn't
 *         know how to convert the type"""
 *         import struct             # <<<<<<<<<<<<<<
 *         cdef char c
 *         cdef bytes bytesvalue
 */
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_struct, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 504, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_v_struct = __pyx_t_1;
  __pyx_t_1 = 0;

  /* "View.MemoryView":509
 *         cdef Py_ssize_t i
 * 
 *         if isinstance(value, tuple):             # <<<<<<<<<<<<<<
 *             bytesvalue = struct.pack(self.view.format, *value)
 *         else:
 */
  __pyx_t_2 = PyTuple_Check(__pyx_v_value); 
  __pyx_t_3 = (__pyx_t_2 != 0);
  if (__pyx_t_3) {

    /* "View.MemoryView":510
 * 
 *         if isinstance(value, tuple):
 *             bytesvalue = struct.pack(self.view.format, *value)             # <<<<<<<<<<<<<<
 *         else:
 *             bytesvalue = struct.pack(self.view.format, value)
 */
    __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_v_struct, __pyx_n_s_pack); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 510, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __pyx_t_4 = __Pyx_PyBytes_FromString(__pyx_v_self->view.format); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 510, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_5 = PyTuple_New(1); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 510, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    __Pyx_GIVEREF(__pyx_t_4);
    PyTuple_SET_ITEM(__pyx_t_5, 0, __pyx_t_4);
    __pyx_t_4 = 0;
    __pyx_t_4 = __Pyx_PySequence_Tuple(__pyx_v_value); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 510, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_6 = PyNumber_Add(__pyx_t_5, __pyx_t_4); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 510, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    __pyx_t_4 = __Pyx_PyObject_Call(__pyx_t_1, __pyx_t_6, NULL); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 510, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    if (!(likely(PyBytes_CheckExact(__pyx_t_4))||((__pyx_t_4) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected %.16s, got %.200s", "bytes", Py_TYPE(__pyx_t_4)->tp_name), 0))) __PYX_ERR(2, 510, __pyx_L1_error)
    __pyx_v_bytesvalue = ((PyObject*)__pyx_t_4);
    __pyx_t_4 = 0;

    /* "View.MemoryView":509
 *         cdef Py_ssize_t i
 * 
 *         if isinstance(value, tuple):             # <<<<<<<<<<<<<<
 *             bytesvalue = struct.pack(self.view.format, *value)
 *         else:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":512
 *             bytesvalue = struct.pack(self.view.format, *value)
 *         else:
 *             bytesvalue = struct.pack(self.view.format, value)             # <<<<<<<<<<<<<<
 * 
 *         for i, c in enumerate(bytesvalue):
 */
  /*else*/ {
    __pyx_t_6 = __Pyx_PyObject_GetAttrStr(__pyx_v_struct, __pyx_n_s_pack); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 512, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    __pyx_t_1 = __Pyx_PyBytes_FromString(__pyx_v_self->view.format); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 512, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __pyx_t_5 = NULL;
    __pyx_t_7 = 0;
    if (CYTHON_UNPACK_METHODS && likely(PyMethod_Check(__pyx_t_6))) {
      __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_6);
      if (likely(__pyx_t_5)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_6);
        __Pyx_INCREF(__pyx_t_5);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_6, function);
        __pyx_t_7 = 1;
      }
    }
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(__pyx_t_6)) {
      PyObject *__pyx_temp[3] = {__pyx_t_5, __pyx_t_1, __pyx_v_value};
      __pyx_t_4 = __Pyx_PyFunction_FastCall(__pyx_t_6, __pyx_temp+1-__pyx_t_7, 2+__pyx_t_7); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 512, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    } else
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(__pyx_t_6)) {
      PyObject *__pyx_temp[3] = {__pyx_t_5, __pyx_t_1, __pyx_v_value};
      __pyx_t_4 = __Pyx_PyCFunction_FastCall(__pyx_t_6, __pyx_temp+1-__pyx_t_7, 2+__pyx_t_7); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 512, __pyx_L1_error)
      __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
    } else
    #endif
    {
      __pyx_t_8 = PyTuple_New(2+__pyx_t_7); if (unlikely(!__pyx_t_8)) __PYX_ERR(2, 512, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_8);
      if (__pyx_t_5) {
        __Pyx_GIVEREF(__pyx_t_5); PyTuple_SET_ITEM(__pyx_t_8, 0, __pyx_t_5); __pyx_t_5 = NULL;
      }
      __Pyx_GIVEREF(__pyx_t_1);
      PyTuple_SET_ITEM(__pyx_t_8, 0+__pyx_t_7, __pyx_t_1);
      __Pyx_INCREF(__pyx_v_value);
      __Pyx_GIVEREF(__pyx_v_value);
      PyTuple_SET_ITEM(__pyx_t_8, 1+__pyx_t_7, __pyx_v_value);
      __pyx_t_1 = 0;
      __pyx_t_4 = __Pyx_PyObject_Call(__pyx_t_6, __pyx_t_8, NULL); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 512, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_DECREF(__pyx_t_8); __pyx_t_8 = 0;
    }
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    if (!(likely(PyBytes_CheckExact(__pyx_t_4))||((__pyx_t_4) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected %.16s, got %.200s", "bytes", Py_TYPE(__pyx_t_4)->tp_name), 0))) __PYX_ERR(2, 512, __pyx_L1_error)
    __pyx_v_bytesvalue = ((PyObject*)__pyx_t_4);
    __pyx_t_4 = 0;
  }
  __pyx_L3:;

  /* "View.MemoryView":514
 *             bytesvalue = struct.pack(self.view.format, value)
 * 
 *         for i, c in enumerate(bytesvalue):             # <<<<<<<<<<<<<<
 *             itemp[i] = c
 * 
 */
  __pyx_t_9 = 0;
  if (unlikely(__pyx_v_bytesvalue == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "'NoneType' is not iterable");
    __PYX_ERR(2, 514, __pyx_L1_error)
  }
  __Pyx_INCREF(__pyx_v_bytesvalue);
  __pyx_t_10 = __pyx_v_bytesvalue;
  __pyx_t_12 = PyBytes_AS_STRING(__pyx_t_10);
  __pyx_t_13 = (__pyx_t_12 + PyBytes_GET_SIZE(__pyx_t_10));
  for (__pyx_t_14 = __pyx_t_12; __pyx_t_14 < __pyx_t_13; __pyx_t_14++) {
    __pyx_t_11 = __pyx_t_14;
    __pyx_v_c = (__pyx_t_11[0]);

    /* "View.MemoryView":515
 * 
 *         for i, c in enumerate(bytesvalue):
 *             itemp[i] = c             # <<<<<<<<<<<<<<
 * 
 *     @cname('getbuffer')
 */
    __pyx_v_i = __pyx_t_9;

    /* "View.MemoryView":514
 *             bytesvalue = struct.pack(self.view.format, value)
 * 
 *         for i, c in enumerate(bytesvalue):             # <<<<<<<<<<<<<<
 *             itemp[i] = c
 * 
 */
    __pyx_t_9 = (__pyx_t_9 + 1);

    /* "View.MemoryView":515
 * 
 *         for i, c in enumerate(bytesvalue):
 *             itemp[i] = c             # <<<<<<<<<<<<<<
 * 
 *     @cname('getbuffer')
 */
    (__pyx_v_itemp[__pyx_v_i]) = __pyx_v_c;
  }
  __Pyx_DECREF(__pyx_t_10); __pyx_t_10 = 0;

  /* "View.MemoryView":501
 *             return result
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):             # <<<<<<<<<<<<<<
 *         """Only used if instantiated manually by the user, or if Cython doesn't
 *         know how to convert the type"""
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_8);
  __Pyx_XDECREF(__pyx_t_10);
  __Pyx_AddTraceback("View.MemoryView.memoryview.assign_item_from_object", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_struct);
  __Pyx_XDECREF(__pyx_v_bytesvalue);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":518
 * 
 *     @cname('getbuffer')
 *     def __getbuffer__(self, Py_buffer *info, int flags):             # <<<<<<<<<<<<<<
 *         if flags & PyBUF_WRITABLE and self.view.readonly:
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")
 */

/* Python wrapper */
static CYTHON_UNUSED int __pyx_memoryview_getbuffer(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags); /*proto*/
static CYTHON_UNUSED int __pyx_memoryview_getbuffer(PyObject *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__getbuffer__ (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_8__getbuffer__(((struct __pyx_memoryview_obj *)__pyx_v_self), ((Py_buffer *)__pyx_v_info), ((int)__pyx_v_flags));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static int __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_8__getbuffer__(struct __pyx_memoryview_obj *__pyx_v_self, Py_buffer *__pyx_v_info, int __pyx_v_flags) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  Py_ssize_t *__pyx_t_4;
  char *__pyx_t_5;
  void *__pyx_t_6;
  int __pyx_t_7;
  Py_ssize_t __pyx_t_8;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  if (__pyx_v_info == NULL) {
    PyErr_SetString(PyExc_BufferError, "PyObject_GetBuffer: view==NULL argument is obsolete");
    return -1;
  }
  __Pyx_RefNannySetupContext("__getbuffer__", 0);
  __pyx_v_info->obj = Py_None; __Pyx_INCREF(Py_None);
  __Pyx_GIVEREF(__pyx_v_info->obj);

  /* "View.MemoryView":519
 *     @cname('getbuffer')
 *     def __getbuffer__(self, Py_buffer *info, int flags):
 *         if flags & PyBUF_WRITABLE and self.view.readonly:             # <<<<<<<<<<<<<<
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")
 * 
 */
  __pyx_t_2 = ((__pyx_v_flags & PyBUF_WRITABLE) != 0);
  if (__pyx_t_2) {
  } else {
    __pyx_t_1 = __pyx_t_2;
    goto __pyx_L4_bool_binop_done;
  }
  __pyx_t_2 = (__pyx_v_self->view.readonly != 0);
  __pyx_t_1 = __pyx_t_2;
  __pyx_L4_bool_binop_done:;
  if (unlikely(__pyx_t_1)) {

    /* "View.MemoryView":520
 *     def __getbuffer__(self, Py_buffer *info, int flags):
 *         if flags & PyBUF_WRITABLE and self.view.readonly:
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")             # <<<<<<<<<<<<<<
 * 
 *         if flags & PyBUF_ND:
 */
    __pyx_t_3 = __Pyx_PyObject_Call(__pyx_builtin_ValueError, __pyx_tuple__13, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 520, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_Raise(__pyx_t_3, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __PYX_ERR(2, 520, __pyx_L1_error)

    /* "View.MemoryView":519
 *     @cname('getbuffer')
 *     def __getbuffer__(self, Py_buffer *info, int flags):
 *         if flags & PyBUF_WRITABLE and self.view.readonly:             # <<<<<<<<<<<<<<
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")
 * 
 */
  }

  /* "View.MemoryView":522
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")
 * 
 *         if flags & PyBUF_ND:             # <<<<<<<<<<<<<<
 *             info.shape = self.view.shape
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_flags & PyBUF_ND) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":523
 * 
 *         if flags & PyBUF_ND:
 *             info.shape = self.view.shape             # <<<<<<<<<<<<<<
 *         else:
 *             info.shape = NULL
 */
    __pyx_t_4 = __pyx_v_self->view.shape;
    __pyx_v_info->shape = __pyx_t_4;

    /* "View.MemoryView":522
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")
 * 
 *         if flags & PyBUF_ND:             # <<<<<<<<<<<<<<
 *             info.shape = self.view.shape
 *         else:
 */
    goto __pyx_L6;
  }

  /* "View.MemoryView":525
 *             info.shape = self.view.shape
 *         else:
 *             info.shape = NULL             # <<<<<<<<<<<<<<
 * 
 *         if flags & PyBUF_STRIDES:
 */
  /*else*/ {
    __pyx_v_info->shape = NULL;
  }
  __pyx_L6:;

  /* "View.MemoryView":527
 *             info.shape = NULL
 * 
 *         if flags & PyBUF_STRIDES:             # <<<<<<<<<<<<<<
 *             info.strides = self.view.strides
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_flags & PyBUF_STRIDES) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":528
 * 
 *         if flags & PyBUF_STRIDES:
 *             info.strides = self.view.strides             # <<<<<<<<<<<<<<
 *         else:
 *             info.strides = NULL
 */
    __pyx_t_4 = __pyx_v_self->view.strides;
    __pyx_v_info->strides = __pyx_t_4;

    /* "View.MemoryView":527
 *             info.shape = NULL
 * 
 *         if flags & PyBUF_STRIDES:             # <<<<<<<<<<<<<<
 *             info.strides = self.view.strides
 *         else:
 */
    goto __pyx_L7;
  }

  /* "View.MemoryView":530
 *             info.strides = self.view.strides
 *         else:
 *             info.strides = NULL             # <<<<<<<<<<<<<<
 * 
 *         if flags & PyBUF_INDIRECT:
 */
  /*else*/ {
    __pyx_v_info->strides = NULL;
  }
  __pyx_L7:;

  /* "View.MemoryView":532
 *             info.strides = NULL
 * 
 *         if flags & PyBUF_INDIRECT:             # <<<<<<<<<<<<<<
 *             info.suboffsets = self.view.suboffsets
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_flags & PyBUF_INDIRECT) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":533
 * 
 *         if flags & PyBUF_INDIRECT:
 *             info.suboffsets = self.view.suboffsets             # <<<<<<<<<<<<<<
 *         else:
 *             info.suboffsets = NULL
 */
    __pyx_t_4 = __pyx_v_self->view.suboffsets;
    __pyx_v_info->suboffsets = __pyx_t_4;

    /* "View.MemoryView":532
 *             info.strides = NULL
 * 
 *         if flags & PyBUF_INDIRECT:             # <<<<<<<<<<<<<<
 *             info.suboffsets = self.view.suboffsets
 *         else:
 */
    goto __pyx_L8;
  }

  /* "View.MemoryView":535
 *             info.suboffsets = self.view.suboffsets
 *         else:
 *             info.suboffsets = NULL             # <<<<<<<<<<<<<<
 * 
 *         if flags & PyBUF_FORMAT:
 */
  /*else*/ {
    __pyx_v_info->suboffsets = NULL;
  }
  __pyx_L8:;

  /* "View.MemoryView":537
 *             info.suboffsets = NULL
 * 
 *         if flags & PyBUF_FORMAT:             # <<<<<<<<<<<<<<
 *             info.format = self.view.format
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_flags & PyBUF_FORMAT) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":538
 * 
 *         if flags & PyBUF_FORMAT:
 *             info.format = self.view.format             # <<<<<<<<<<<<<<
 *         else:
 *             info.format = NULL
 */
    __pyx_t_5 = __pyx_v_self->view.format;
    __pyx_v_info->format = __pyx_t_5;

    /* "View.MemoryView":537
 *             info.suboffsets = NULL
 * 
 *         if flags & PyBUF_FORMAT:             # <<<<<<<<<<<<<<
 *             info.format = self.view.format
 *         else:
 */
    goto __pyx_L9;
  }

  /* "View.MemoryView":540
 *             info.format = self.view.format
 *         else:
 *             info.format = NULL             # <<<<<<<<<<<<<<
 * 
 *         info.buf = self.view.buf
 */
  /*else*/ {
    __pyx_v_info->format = NULL;
  }
  __pyx_L9:;

  /* "View.MemoryView":542
 *             info.format = NULL
 * 
 *         info.buf = self.view.buf             # <<<<<<<<<<<<<<
 *         info.ndim = self.view.ndim
 *         info.itemsize = self.view.itemsize
 */
  __pyx_t_6 = __pyx_v_self->view.buf;
  __pyx_v_info->buf = __pyx_t_6;

  /* "View.MemoryView":543
 * 
 *         info.buf = self.view.buf
 *         info.ndim = self.view.ndim             # <<<<<<<<<<<<<<
 *         info.itemsize = self.view.itemsize
 *         info.len = self.view.len
 */
  __pyx_t_7 = __pyx_v_self->view.ndim;
  __pyx_v_info->ndim = __pyx_t_7;

  /* "View.MemoryView":544
 *         info.buf = self.view.buf
 *         info.ndim = self.view.ndim
 *         info.itemsize = self.view.itemsize             # <<<<<<<<<<<<<<
 *         info.len = self.view.len
 *         info.readonly = self.view.readonly
 */
  __pyx_t_8 = __pyx_v_self->view.itemsize;
  __pyx_v_info->itemsize = __pyx_t_8;

  /* "View.MemoryView":545
 *         info.ndim = self.view.ndim
 *         info.itemsize = self.view.itemsize
 *         info.len = self.view.len             # <<<<<<<<<<<<<<
 *         info.readonly = self.view.readonly
 *         info.obj = self
 */
  __pyx_t_8 = __pyx_v_self->view.len;
  __pyx_v_info->len = __pyx_t_8;

  /* "View.MemoryView":546
 *         info.itemsize = self.view.itemsize
 *         info.len = self.view.len
 *         info.readonly = self.view.readonly             # <<<<<<<<<<<<<<
 *         info.obj = self
 * 
 */
  __pyx_t_1 = __pyx_v_self->view.readonly;
  __pyx_v_info->readonly = __pyx_t_1;

  /* "View.MemoryView":547
 *         info.len = self.view.len
 *         info.readonly = self.view.readonly
 *         info.obj = self             # <<<<<<<<<<<<<<
 * 
 *     __pyx_getbuffer = capsule(<void *> &__pyx_memoryview_getbuffer, "getbuffer(obj, view, flags)")
 */
  __Pyx_INCREF(((PyObject *)__pyx_v_self));
  __Pyx_GIVEREF(((PyObject *)__pyx_v_self));
  __Pyx_GOTREF(__pyx_v_info->obj);
  __Pyx_DECREF(__pyx_v_info->obj);
  __pyx_v_info->obj = ((PyObject *)__pyx_v_self);

  /* "View.MemoryView":518
 * 
 *     @cname('getbuffer')
 *     def __getbuffer__(self, Py_buffer *info, int flags):             # <<<<<<<<<<<<<<
 *         if flags & PyBUF_WRITABLE and self.view.readonly:
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")
 */

  /* function exit code */
  __pyx_r = 0;
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview.__getbuffer__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  if (__pyx_v_info->obj != NULL) {
    __Pyx_GOTREF(__pyx_v_info->obj);
    __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = 0;
  }
  goto __pyx_L2;
  __pyx_L0:;
  if (__pyx_v_info->obj == Py_None) {
    __Pyx_GOTREF(__pyx_v_info->obj);
    __Pyx_DECREF(__pyx_v_info->obj); __pyx_v_info->obj = 0;
  }
  __pyx_L2:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":553
 * 
 *     @property
 *     def T(self):             # <<<<<<<<<<<<<<
 *         cdef _memoryviewslice result = memoryview_copy(self)
 *         transpose_memslice(&result.from_slice)
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_1T_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_1T_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_1T___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_1T___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  struct __pyx_memoryviewslice_obj *__pyx_v_result = 0;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_t_2;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":554
 *     @property
 *     def T(self):
 *         cdef _memoryviewslice result = memoryview_copy(self)             # <<<<<<<<<<<<<<
 *         transpose_memslice(&result.from_slice)
 *         return result
 */
  __pyx_t_1 = __pyx_memoryview_copy_object(__pyx_v_self); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 554, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (!(likely(((__pyx_t_1) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_1, __pyx_memoryviewslice_type))))) __PYX_ERR(2, 554, __pyx_L1_error)
  __pyx_v_result = ((struct __pyx_memoryviewslice_obj *)__pyx_t_1);
  __pyx_t_1 = 0;

  /* "View.MemoryView":555
 *     def T(self):
 *         cdef _memoryviewslice result = memoryview_copy(self)
 *         transpose_memslice(&result.from_slice)             # <<<<<<<<<<<<<<
 *         return result
 * 
 */
  __pyx_t_2 = __pyx_memslice_transpose((&__pyx_v_result->from_slice)); if (unlikely(__pyx_t_2 == ((int)0))) __PYX_ERR(2, 555, __pyx_L1_error)

  /* "View.MemoryView":556
 *         cdef _memoryviewslice result = memoryview_copy(self)
 *         transpose_memslice(&result.from_slice)
 *         return result             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(((PyObject *)__pyx_v_result));
  __pyx_r = ((PyObject *)__pyx_v_result);
  goto __pyx_L0;

  /* "View.MemoryView":553
 * 
 *     @property
 *     def T(self):             # <<<<<<<<<<<<<<
 *         cdef _memoryviewslice result = memoryview_copy(self)
 *         transpose_memslice(&result.from_slice)
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.memoryview.T.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF((PyObject *)__pyx_v_result);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":559
 * 
 *     @property
 *     def base(self):             # <<<<<<<<<<<<<<
 *         return self.obj
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_4base_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_4base_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_4base___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_4base___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":560
 *     @property
 *     def base(self):
 *         return self.obj             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_self->obj);
  __pyx_r = __pyx_v_self->obj;
  goto __pyx_L0;

  /* "View.MemoryView":559
 * 
 *     @property
 *     def base(self):             # <<<<<<<<<<<<<<
 *         return self.obj
 * 
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":563
 * 
 *     @property
 *     def shape(self):             # <<<<<<<<<<<<<<
 *         return tuple([length for length in self.view.shape[:self.view.ndim]])
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_5shape_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_5shape_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_5shape___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_5shape___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  Py_ssize_t __pyx_v_length;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  Py_ssize_t *__pyx_t_2;
  Py_ssize_t *__pyx_t_3;
  Py_ssize_t *__pyx_t_4;
  PyObject *__pyx_t_5 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":564
 *     @property
 *     def shape(self):
 *         return tuple([length for length in self.view.shape[:self.view.ndim]])             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = PyList_New(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 564, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_3 = (__pyx_v_self->view.shape + __pyx_v_self->view.ndim);
  for (__pyx_t_4 = __pyx_v_self->view.shape; __pyx_t_4 < __pyx_t_3; __pyx_t_4++) {
    __pyx_t_2 = __pyx_t_4;
    __pyx_v_length = (__pyx_t_2[0]);
    __pyx_t_5 = PyInt_FromSsize_t(__pyx_v_length); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 564, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_5);
    if (unlikely(__Pyx_ListComp_Append(__pyx_t_1, (PyObject*)__pyx_t_5))) __PYX_ERR(2, 564, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
  }
  __pyx_t_5 = PyList_AsTuple(((PyObject*)__pyx_t_1)); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 564, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_r = __pyx_t_5;
  __pyx_t_5 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":563
 * 
 *     @property
 *     def shape(self):             # <<<<<<<<<<<<<<
 *         return tuple([length for length in self.view.shape[:self.view.ndim]])
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.memoryview.shape.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":567
 * 
 *     @property
 *     def strides(self):             # <<<<<<<<<<<<<<
 *         if self.view.strides == NULL:
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_7strides_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_7strides_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_7strides___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_7strides___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  Py_ssize_t __pyx_v_stride;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  Py_ssize_t *__pyx_t_3;
  Py_ssize_t *__pyx_t_4;
  Py_ssize_t *__pyx_t_5;
  PyObject *__pyx_t_6 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":568
 *     @property
 *     def strides(self):
 *         if self.view.strides == NULL:             # <<<<<<<<<<<<<<
 * 
 *             raise ValueError("Buffer view does not expose strides")
 */
  __pyx_t_1 = ((__pyx_v_self->view.strides == NULL) != 0);
  if (unlikely(__pyx_t_1)) {

    /* "View.MemoryView":570
 *         if self.view.strides == NULL:
 * 
 *             raise ValueError("Buffer view does not expose strides")             # <<<<<<<<<<<<<<
 * 
 *         return tuple([stride for stride in self.view.strides[:self.view.ndim]])
 */
    __pyx_t_2 = __Pyx_PyObject_Call(__pyx_builtin_ValueError, __pyx_tuple__14, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 570, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_Raise(__pyx_t_2, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __PYX_ERR(2, 570, __pyx_L1_error)

    /* "View.MemoryView":568
 *     @property
 *     def strides(self):
 *         if self.view.strides == NULL:             # <<<<<<<<<<<<<<
 * 
 *             raise ValueError("Buffer view does not expose strides")
 */
  }

  /* "View.MemoryView":572
 *             raise ValueError("Buffer view does not expose strides")
 * 
 *         return tuple([stride for stride in self.view.strides[:self.view.ndim]])             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_2 = PyList_New(0); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 572, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_4 = (__pyx_v_self->view.strides + __pyx_v_self->view.ndim);
  for (__pyx_t_5 = __pyx_v_self->view.strides; __pyx_t_5 < __pyx_t_4; __pyx_t_5++) {
    __pyx_t_3 = __pyx_t_5;
    __pyx_v_stride = (__pyx_t_3[0]);
    __pyx_t_6 = PyInt_FromSsize_t(__pyx_v_stride); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 572, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    if (unlikely(__Pyx_ListComp_Append(__pyx_t_2, (PyObject*)__pyx_t_6))) __PYX_ERR(2, 572, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
  }
  __pyx_t_6 = PyList_AsTuple(((PyObject*)__pyx_t_2)); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 572, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_6);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_r = __pyx_t_6;
  __pyx_t_6 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":567
 * 
 *     @property
 *     def strides(self):             # <<<<<<<<<<<<<<
 *         if self.view.strides == NULL:
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_AddTraceback("View.MemoryView.memoryview.strides.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":575
 * 
 *     @property
 *     def suboffsets(self):             # <<<<<<<<<<<<<<
 *         if self.view.suboffsets == NULL:
 *             return (-1,) * self.view.ndim
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_10suboffsets_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_10suboffsets_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_10suboffsets___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_10suboffsets___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  Py_ssize_t __pyx_v_suboffset;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  Py_ssize_t *__pyx_t_4;
  Py_ssize_t *__pyx_t_5;
  Py_ssize_t *__pyx_t_6;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":576
 *     @property
 *     def suboffsets(self):
 *         if self.view.suboffsets == NULL:             # <<<<<<<<<<<<<<
 *             return (-1,) * self.view.ndim
 * 
 */
  __pyx_t_1 = ((__pyx_v_self->view.suboffsets == NULL) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":577
 *     def suboffsets(self):
 *         if self.view.suboffsets == NULL:
 *             return (-1,) * self.view.ndim             # <<<<<<<<<<<<<<
 * 
 *         return tuple([suboffset for suboffset in self.view.suboffsets[:self.view.ndim]])
 */
    __Pyx_XDECREF(__pyx_r);
    __pyx_t_2 = __Pyx_PyInt_From_int(__pyx_v_self->view.ndim); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 577, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_t_3 = PyNumber_Multiply(__pyx_tuple__15, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 577, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __pyx_r = __pyx_t_3;
    __pyx_t_3 = 0;
    goto __pyx_L0;

    /* "View.MemoryView":576
 *     @property
 *     def suboffsets(self):
 *         if self.view.suboffsets == NULL:             # <<<<<<<<<<<<<<
 *             return (-1,) * self.view.ndim
 * 
 */
  }

  /* "View.MemoryView":579
 *             return (-1,) * self.view.ndim
 * 
 *         return tuple([suboffset for suboffset in self.view.suboffsets[:self.view.ndim]])             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_3 = PyList_New(0); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 579, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_5 = (__pyx_v_self->view.suboffsets + __pyx_v_self->view.ndim);
  for (__pyx_t_6 = __pyx_v_self->view.suboffsets; __pyx_t_6 < __pyx_t_5; __pyx_t_6++) {
    __pyx_t_4 = __pyx_t_6;
    __pyx_v_suboffset = (__pyx_t_4[0]);
    __pyx_t_2 = PyInt_FromSsize_t(__pyx_v_suboffset); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 579, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    if (unlikely(__Pyx_ListComp_Append(__pyx_t_3, (PyObject*)__pyx_t_2))) __PYX_ERR(2, 579, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  }
  __pyx_t_2 = PyList_AsTuple(((PyObject*)__pyx_t_3)); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 579, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":575
 * 
 *     @property
 *     def suboffsets(self):             # <<<<<<<<<<<<<<
 *         if self.view.suboffsets == NULL:
 *             return (-1,) * self.view.ndim
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview.suboffsets.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":582
 * 
 *     @property
 *     def ndim(self):             # <<<<<<<<<<<<<<
 *         return self.view.ndim
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_4ndim_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_4ndim_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_4ndim___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_4ndim___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":583
 *     @property
 *     def ndim(self):
 *         return self.view.ndim             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_self->view.ndim); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 583, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":582
 * 
 *     @property
 *     def ndim(self):             # <<<<<<<<<<<<<<
 *         return self.view.ndim
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.memoryview.ndim.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":586
 * 
 *     @property
 *     def itemsize(self):             # <<<<<<<<<<<<<<
 *         return self.view.itemsize
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_8itemsize_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_8itemsize_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_8itemsize___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_8itemsize___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":587
 *     @property
 *     def itemsize(self):
 *         return self.view.itemsize             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = PyInt_FromSsize_t(__pyx_v_self->view.itemsize); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 587, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":586
 * 
 *     @property
 *     def itemsize(self):             # <<<<<<<<<<<<<<
 *         return self.view.itemsize
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.memoryview.itemsize.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":590
 * 
 *     @property
 *     def nbytes(self):             # <<<<<<<<<<<<<<
 *         return self.size * self.view.itemsize
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_6nbytes_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_6nbytes_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_6nbytes___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_6nbytes___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":591
 *     @property
 *     def nbytes(self):
 *         return self.size * self.view.itemsize             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_size); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 591, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = PyInt_FromSsize_t(__pyx_v_self->view.itemsize); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 591, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyNumber_Multiply(__pyx_t_1, __pyx_t_2); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 591, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_r = __pyx_t_3;
  __pyx_t_3 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":590
 * 
 *     @property
 *     def nbytes(self):             # <<<<<<<<<<<<<<
 *         return self.size * self.view.itemsize
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview.nbytes.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":594
 * 
 *     @property
 *     def size(self):             # <<<<<<<<<<<<<<
 *         if self._size is None:
 *             result = 1
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_4size_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_10memoryview_4size_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_10memoryview_4size___get__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_10memoryview_4size___get__(struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_v_result = NULL;
  PyObject *__pyx_v_length = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  Py_ssize_t *__pyx_t_3;
  Py_ssize_t *__pyx_t_4;
  Py_ssize_t *__pyx_t_5;
  PyObject *__pyx_t_6 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":595
 *     @property
 *     def size(self):
 *         if self._size is None:             # <<<<<<<<<<<<<<
 *             result = 1
 * 
 */
  __pyx_t_1 = (__pyx_v_self->_size == Py_None);
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":596
 *     def size(self):
 *         if self._size is None:
 *             result = 1             # <<<<<<<<<<<<<<
 * 
 *             for length in self.view.shape[:self.view.ndim]:
 */
    __Pyx_INCREF(__pyx_int_1);
    __pyx_v_result = __pyx_int_1;

    /* "View.MemoryView":598
 *             result = 1
 * 
 *             for length in self.view.shape[:self.view.ndim]:             # <<<<<<<<<<<<<<
 *                 result *= length
 * 
 */
    __pyx_t_4 = (__pyx_v_self->view.shape + __pyx_v_self->view.ndim);
    for (__pyx_t_5 = __pyx_v_self->view.shape; __pyx_t_5 < __pyx_t_4; __pyx_t_5++) {
      __pyx_t_3 = __pyx_t_5;
      __pyx_t_6 = PyInt_FromSsize_t((__pyx_t_3[0])); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 598, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_6);
      __Pyx_XDECREF_SET(__pyx_v_length, __pyx_t_6);
      __pyx_t_6 = 0;

      /* "View.MemoryView":599
 * 
 *             for length in self.view.shape[:self.view.ndim]:
 *                 result *= length             # <<<<<<<<<<<<<<
 * 
 *             self._size = result
 */
      __pyx_t_6 = PyNumber_InPlaceMultiply(__pyx_v_result, __pyx_v_length); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 599, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_6);
      __Pyx_DECREF_SET(__pyx_v_result, __pyx_t_6);
      __pyx_t_6 = 0;
    }

    /* "View.MemoryView":601
 *                 result *= length
 * 
 *             self._size = result             # <<<<<<<<<<<<<<
 * 
 *         return self._size
 */
    __Pyx_INCREF(__pyx_v_result);
    __Pyx_GIVEREF(__pyx_v_result);
    __Pyx_GOTREF(__pyx_v_self->_size);
    __Pyx_DECREF(__pyx_v_self->_size);
    __pyx_v_self->_size = __pyx_v_result;

    /* "View.MemoryView":595
 *     @property
 *     def size(self):
 *         if self._size is None:             # <<<<<<<<<<<<<<
 *             result = 1
 * 
 */
  }

  /* "View.MemoryView":603
 *             self._size = result
 * 
 *         return self._size             # <<<<<<<<<<<<<<
 * 
 *     def __len__(self):
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_self->_size);
  __pyx_r = __pyx_v_self->_size;
  goto __pyx_L0;

  /* "View.MemoryView":594
 * 
 *     @property
 *     def size(self):             # <<<<<<<<<<<<<<
 *         if self._size is None:
 *             result = 1
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_AddTraceback("View.MemoryView.memoryview.size.__get__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_result);
  __Pyx_XDECREF(__pyx_v_length);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":605
 *         return self._size
 * 
 *     def __len__(self):             # <<<<<<<<<<<<<<
 *         if self.view.ndim >= 1:
 *             return self.view.shape[0]
 */

/* Python wrapper */
static Py_ssize_t __pyx_memoryview___len__(PyObject *__pyx_v_self); /*proto*/
static Py_ssize_t __pyx_memoryview___len__(PyObject *__pyx_v_self) {
  Py_ssize_t __pyx_r;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__len__ (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_10__len__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static Py_ssize_t __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_10__len__(struct __pyx_memoryview_obj *__pyx_v_self) {
  Py_ssize_t __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  __Pyx_RefNannySetupContext("__len__", 0);

  /* "View.MemoryView":606
 * 
 *     def __len__(self):
 *         if self.view.ndim >= 1:             # <<<<<<<<<<<<<<
 *             return self.view.shape[0]
 * 
 */
  __pyx_t_1 = ((__pyx_v_self->view.ndim >= 1) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":607
 *     def __len__(self):
 *         if self.view.ndim >= 1:
 *             return self.view.shape[0]             # <<<<<<<<<<<<<<
 * 
 *         return 0
 */
    __pyx_r = (__pyx_v_self->view.shape[0]);
    goto __pyx_L0;

    /* "View.MemoryView":606
 * 
 *     def __len__(self):
 *         if self.view.ndim >= 1:             # <<<<<<<<<<<<<<
 *             return self.view.shape[0]
 * 
 */
  }

  /* "View.MemoryView":609
 *             return self.view.shape[0]
 * 
 *         return 0             # <<<<<<<<<<<<<<
 * 
 *     def __repr__(self):
 */
  __pyx_r = 0;
  goto __pyx_L0;

  /* "View.MemoryView":605
 *         return self._size
 * 
 *     def __len__(self):             # <<<<<<<<<<<<<<
 *         if self.view.ndim >= 1:
 *             return self.view.shape[0]
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":611
 *         return 0
 * 
 *     def __repr__(self):             # <<<<<<<<<<<<<<
 *         return "<MemoryView of %r at 0x%x>" % (self.base.__class__.__name__,
 *                                                id(self))
 */

/* Python wrapper */
static PyObject *__pyx_memoryview___repr__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_memoryview___repr__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__repr__ (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_12__repr__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_12__repr__(struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__repr__", 0);

  /* "View.MemoryView":612
 * 
 *     def __repr__(self):
 *         return "<MemoryView of %r at 0x%x>" % (self.base.__class__.__name__,             # <<<<<<<<<<<<<<
 *                                                id(self))
 * 
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_base); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 612, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_class); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 612, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_name_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 612, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;

  /* "View.MemoryView":613
 *     def __repr__(self):
 *         return "<MemoryView of %r at 0x%x>" % (self.base.__class__.__name__,
 *                                                id(self))             # <<<<<<<<<<<<<<
 * 
 *     def __str__(self):
 */
  __pyx_t_2 = __Pyx_PyObject_CallOneArg(__pyx_builtin_id, ((PyObject *)__pyx_v_self)); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 613, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);

  /* "View.MemoryView":612
 * 
 *     def __repr__(self):
 *         return "<MemoryView of %r at 0x%x>" % (self.base.__class__.__name__,             # <<<<<<<<<<<<<<
 *                                                id(self))
 * 
 */
  __pyx_t_3 = PyTuple_New(2); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 612, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_2);
  __pyx_t_1 = 0;
  __pyx_t_2 = 0;
  __pyx_t_2 = __Pyx_PyString_Format(__pyx_kp_s_MemoryView_of_r_at_0x_x, __pyx_t_3); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 612, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":611
 *         return 0
 * 
 *     def __repr__(self):             # <<<<<<<<<<<<<<
 *         return "<MemoryView of %r at 0x%x>" % (self.base.__class__.__name__,
 *                                                id(self))
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview.__repr__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":615
 *                                                id(self))
 * 
 *     def __str__(self):             # <<<<<<<<<<<<<<
 *         return "<MemoryView of %r object>" % (self.base.__class__.__name__,)
 * 
 */

/* Python wrapper */
static PyObject *__pyx_memoryview___str__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_memoryview___str__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__str__ (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_14__str__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_14__str__(struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__str__", 0);

  /* "View.MemoryView":616
 * 
 *     def __str__(self):
 *         return "<MemoryView of %r object>" % (self.base.__class__.__name__,)             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_self), __pyx_n_s_base); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 616, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(__pyx_t_1, __pyx_n_s_class); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 616, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyObject_GetAttrStr(__pyx_t_2, __pyx_n_s_name_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 616, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_t_2 = PyTuple_New(1); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 616, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_2, 0, __pyx_t_1);
  __pyx_t_1 = 0;
  __pyx_t_1 = __Pyx_PyString_Format(__pyx_kp_s_MemoryView_of_r_object, __pyx_t_2); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 616, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":615
 *                                                id(self))
 * 
 *     def __str__(self):             # <<<<<<<<<<<<<<
 *         return "<MemoryView of %r object>" % (self.base.__class__.__name__,)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.memoryview.__str__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":619
 * 
 * 
 *     def is_c_contig(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice *mslice
 *         cdef __Pyx_memviewslice tmp
 */

/* Python wrapper */
static PyObject *__pyx_memoryview_is_c_contig(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_memoryview_is_c_contig(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("is_c_contig (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_16is_c_contig(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_16is_c_contig(struct __pyx_memoryview_obj *__pyx_v_self) {
  __Pyx_memviewslice *__pyx_v_mslice;
  __Pyx_memviewslice __pyx_v_tmp;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_memviewslice *__pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("is_c_contig", 0);

  /* "View.MemoryView":622
 *         cdef __Pyx_memviewslice *mslice
 *         cdef __Pyx_memviewslice tmp
 *         mslice = get_slice_from_memview(self, &tmp)             # <<<<<<<<<<<<<<
 *         return slice_is_contig(mslice[0], 'C', self.view.ndim)
 * 
 */
  __pyx_t_1 = __pyx_memoryview_get_slice_from_memoryview(__pyx_v_self, (&__pyx_v_tmp)); if (unlikely(__pyx_t_1 == ((__Pyx_memviewslice *)NULL))) __PYX_ERR(2, 622, __pyx_L1_error)
  __pyx_v_mslice = __pyx_t_1;

  /* "View.MemoryView":623
 *         cdef __Pyx_memviewslice tmp
 *         mslice = get_slice_from_memview(self, &tmp)
 *         return slice_is_contig(mslice[0], 'C', self.view.ndim)             # <<<<<<<<<<<<<<
 * 
 *     def is_f_contig(self):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_2 = __Pyx_PyBool_FromLong(__pyx_memviewslice_is_contig((__pyx_v_mslice[0]), 'C', __pyx_v_self->view.ndim)); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 623, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":619
 * 
 * 
 *     def is_c_contig(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice *mslice
 *         cdef __Pyx_memviewslice tmp
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.memoryview.is_c_contig", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":625
 *         return slice_is_contig(mslice[0], 'C', self.view.ndim)
 * 
 *     def is_f_contig(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice *mslice
 *         cdef __Pyx_memviewslice tmp
 */

/* Python wrapper */
static PyObject *__pyx_memoryview_is_f_contig(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_memoryview_is_f_contig(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("is_f_contig (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_18is_f_contig(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_18is_f_contig(struct __pyx_memoryview_obj *__pyx_v_self) {
  __Pyx_memviewslice *__pyx_v_mslice;
  __Pyx_memviewslice __pyx_v_tmp;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_memviewslice *__pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("is_f_contig", 0);

  /* "View.MemoryView":628
 *         cdef __Pyx_memviewslice *mslice
 *         cdef __Pyx_memviewslice tmp
 *         mslice = get_slice_from_memview(self, &tmp)             # <<<<<<<<<<<<<<
 *         return slice_is_contig(mslice[0], 'F', self.view.ndim)
 * 
 */
  __pyx_t_1 = __pyx_memoryview_get_slice_from_memoryview(__pyx_v_self, (&__pyx_v_tmp)); if (unlikely(__pyx_t_1 == ((__Pyx_memviewslice *)NULL))) __PYX_ERR(2, 628, __pyx_L1_error)
  __pyx_v_mslice = __pyx_t_1;

  /* "View.MemoryView":629
 *         cdef __Pyx_memviewslice tmp
 *         mslice = get_slice_from_memview(self, &tmp)
 *         return slice_is_contig(mslice[0], 'F', self.view.ndim)             # <<<<<<<<<<<<<<
 * 
 *     def copy(self):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_2 = __Pyx_PyBool_FromLong(__pyx_memviewslice_is_contig((__pyx_v_mslice[0]), 'F', __pyx_v_self->view.ndim)); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 629, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":625
 *         return slice_is_contig(mslice[0], 'C', self.view.ndim)
 * 
 *     def is_f_contig(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice *mslice
 *         cdef __Pyx_memviewslice tmp
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.memoryview.is_f_contig", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":631
 *         return slice_is_contig(mslice[0], 'F', self.view.ndim)
 * 
 *     def copy(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice mslice
 *         cdef int flags = self.flags & ~PyBUF_F_CONTIGUOUS
 */

/* Python wrapper */
static PyObject *__pyx_memoryview_copy(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_memoryview_copy(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("copy (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_20copy(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_20copy(struct __pyx_memoryview_obj *__pyx_v_self) {
  __Pyx_memviewslice __pyx_v_mslice;
  int __pyx_v_flags;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_memviewslice __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("copy", 0);

  /* "View.MemoryView":633
 *     def copy(self):
 *         cdef __Pyx_memviewslice mslice
 *         cdef int flags = self.flags & ~PyBUF_F_CONTIGUOUS             # <<<<<<<<<<<<<<
 * 
 *         slice_copy(self, &mslice)
 */
  __pyx_v_flags = (__pyx_v_self->flags & (~PyBUF_F_CONTIGUOUS));

  /* "View.MemoryView":635
 *         cdef int flags = self.flags & ~PyBUF_F_CONTIGUOUS
 * 
 *         slice_copy(self, &mslice)             # <<<<<<<<<<<<<<
 *         mslice = slice_copy_contig(&mslice, "c", self.view.ndim,
 *                                    self.view.itemsize,
 */
  __pyx_memoryview_slice_copy(__pyx_v_self, (&__pyx_v_mslice));

  /* "View.MemoryView":636
 * 
 *         slice_copy(self, &mslice)
 *         mslice = slice_copy_contig(&mslice, "c", self.view.ndim,             # <<<<<<<<<<<<<<
 *                                    self.view.itemsize,
 *                                    flags|PyBUF_C_CONTIGUOUS,
 */
  __pyx_t_1 = __pyx_memoryview_copy_new_contig((&__pyx_v_mslice), ((char *)"c"), __pyx_v_self->view.ndim, __pyx_v_self->view.itemsize, (__pyx_v_flags | PyBUF_C_CONTIGUOUS), __pyx_v_self->dtype_is_object); if (unlikely(PyErr_Occurred())) __PYX_ERR(2, 636, __pyx_L1_error)
  __pyx_v_mslice = __pyx_t_1;

  /* "View.MemoryView":641
 *                                    self.dtype_is_object)
 * 
 *         return memoryview_copy_from_slice(self, &mslice)             # <<<<<<<<<<<<<<
 * 
 *     def copy_fortran(self):
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_2 = __pyx_memoryview_copy_object_from_slice(__pyx_v_self, (&__pyx_v_mslice)); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 641, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":631
 *         return slice_is_contig(mslice[0], 'F', self.view.ndim)
 * 
 *     def copy(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice mslice
 *         cdef int flags = self.flags & ~PyBUF_F_CONTIGUOUS
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.memoryview.copy", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":643
 *         return memoryview_copy_from_slice(self, &mslice)
 * 
 *     def copy_fortran(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice src, dst
 *         cdef int flags = self.flags & ~PyBUF_C_CONTIGUOUS
 */

/* Python wrapper */
static PyObject *__pyx_memoryview_copy_fortran(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_memoryview_copy_fortran(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("copy_fortran (wrapper)", 0);
  __pyx_r = __pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_22copy_fortran(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_memoryview___pyx_pf_15View_dot_MemoryView_10memoryview_22copy_fortran(struct __pyx_memoryview_obj *__pyx_v_self) {
  __Pyx_memviewslice __pyx_v_src;
  __Pyx_memviewslice __pyx_v_dst;
  int __pyx_v_flags;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_memviewslice __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("copy_fortran", 0);

  /* "View.MemoryView":645
 *     def copy_fortran(self):
 *         cdef __Pyx_memviewslice src, dst
 *         cdef int flags = self.flags & ~PyBUF_C_CONTIGUOUS             # <<<<<<<<<<<<<<
 * 
 *         slice_copy(self, &src)
 */
  __pyx_v_flags = (__pyx_v_self->flags & (~PyBUF_C_CONTIGUOUS));

  /* "View.MemoryView":647
 *         cdef int flags = self.flags & ~PyBUF_C_CONTIGUOUS
 * 
 *         slice_copy(self, &src)             # <<<<<<<<<<<<<<
 *         dst = slice_copy_contig(&src, "fortran", self.view.ndim,
 *                                 self.view.itemsize,
 */
  __pyx_memoryview_slice_copy(__pyx_v_self, (&__pyx_v_src));

  /* "View.MemoryView":648
 * 
 *         slice_copy(self, &src)
 *         dst = slice_copy_contig(&src, "fortran", self.view.ndim,             # <<<<<<<<<<<<<<
 *                                 self.view.itemsize,
 *                                 flags|PyBUF_F_CONTIGUOUS,
 */
  __pyx_t_1 = __pyx_memoryview_copy_new_contig((&__pyx_v_src), ((char *)"fortran"), __pyx_v_self->view.ndim, __pyx_v_self->view.itemsize, (__pyx_v_flags | PyBUF_F_CONTIGUOUS), __pyx_v_self->dtype_is_object); if (unlikely(PyErr_Occurred())) __PYX_ERR(2, 648, __pyx_L1_error)
  __pyx_v_dst = __pyx_t_1;

  /* "View.MemoryView":653
 *                                 self.dtype_is_object)
 * 
 *         return memoryview_copy_from_slice(self, &dst)             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_2 = __pyx_memoryview_copy_object_from_slice(__pyx_v_self, (&__pyx_v_dst)); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 653, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_r = __pyx_t_2;
  __pyx_t_2 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":643
 *         return memoryview_copy_from_slice(self, &mslice)
 * 
 *     def copy_fortran(self):             # <<<<<<<<<<<<<<
 *         cdef __Pyx_memviewslice src, dst
 *         cdef int flags = self.flags & ~PyBUF_C_CONTIGUOUS
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView.memoryview.copy_fortran", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_memoryview_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_pw___pyx_memoryview_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__reduce_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_memoryview___reduce_cython__(((struct __pyx_memoryview_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_memoryview___reduce_cython__(CYTHON_UNUSED struct __pyx_memoryview_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__reduce_cython__", 0);

  /* "(tree fragment)":2
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_TypeError, __pyx_tuple__16, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_Raise(__pyx_t_1, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __PYX_ERR(2, 2, __pyx_L1_error)

  /* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.memoryview.__reduce_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":3
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_memoryview_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state); /*proto*/
static PyObject *__pyx_pw___pyx_memoryview_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__setstate_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_memoryview_2__setstate_cython__(((struct __pyx_memoryview_obj *)__pyx_v_self), ((PyObject *)__pyx_v___pyx_state));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_memoryview_2__setstate_cython__(CYTHON_UNUSED struct __pyx_memoryview_obj *__pyx_v_self, CYTHON_UNUSED PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__setstate_cython__", 0);

  /* "(tree fragment)":4
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 */
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_TypeError, __pyx_tuple__17, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_Raise(__pyx_t_1, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __PYX_ERR(2, 4, __pyx_L1_error)

  /* "(tree fragment)":3
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.memoryview.__setstate_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":657
 * 
 * @cname('__pyx_memoryview_new')
 * cdef memoryview_cwrapper(object o, int flags, bint dtype_is_object, __Pyx_TypeInfo *typeinfo):             # <<<<<<<<<<<<<<
 *     cdef memoryview result = memoryview(o, flags, dtype_is_object)
 *     result.typeinfo = typeinfo
 */

static PyObject *__pyx_memoryview_new(PyObject *__pyx_v_o, int __pyx_v_flags, int __pyx_v_dtype_is_object, __Pyx_TypeInfo *__pyx_v_typeinfo) {
  struct __pyx_memoryview_obj *__pyx_v_result = 0;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("memoryview_cwrapper", 0);

  /* "View.MemoryView":658
 * @cname('__pyx_memoryview_new')
 * cdef memoryview_cwrapper(object o, int flags, bint dtype_is_object, __Pyx_TypeInfo *typeinfo):
 *     cdef memoryview result = memoryview(o, flags, dtype_is_object)             # <<<<<<<<<<<<<<
 *     result.typeinfo = typeinfo
 *     return result
 */
  __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_flags); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 658, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = __Pyx_PyBool_FromLong(__pyx_v_dtype_is_object); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 658, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyTuple_New(3); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 658, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_INCREF(__pyx_v_o);
  __Pyx_GIVEREF(__pyx_v_o);
  PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_v_o);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_3, 2, __pyx_t_2);
  __pyx_t_1 = 0;
  __pyx_t_2 = 0;
  __pyx_t_2 = __Pyx_PyObject_Call(((PyObject *)__pyx_memoryview_type), __pyx_t_3, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 658, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_v_result = ((struct __pyx_memoryview_obj *)__pyx_t_2);
  __pyx_t_2 = 0;

  /* "View.MemoryView":659
 * cdef memoryview_cwrapper(object o, int flags, bint dtype_is_object, __Pyx_TypeInfo *typeinfo):
 *     cdef memoryview result = memoryview(o, flags, dtype_is_object)
 *     result.typeinfo = typeinfo             # <<<<<<<<<<<<<<
 *     return result
 * 
 */
  __pyx_v_result->typeinfo = __pyx_v_typeinfo;

  /* "View.MemoryView":660
 *     cdef memoryview result = memoryview(o, flags, dtype_is_object)
 *     result.typeinfo = typeinfo
 *     return result             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_check')
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(((PyObject *)__pyx_v_result));
  __pyx_r = ((PyObject *)__pyx_v_result);
  goto __pyx_L0;

  /* "View.MemoryView":657
 * 
 * @cname('__pyx_memoryview_new')
 * cdef memoryview_cwrapper(object o, int flags, bint dtype_is_object, __Pyx_TypeInfo *typeinfo):             # <<<<<<<<<<<<<<
 *     cdef memoryview result = memoryview(o, flags, dtype_is_object)
 *     result.typeinfo = typeinfo
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview_cwrapper", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF((PyObject *)__pyx_v_result);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":663
 * 
 * @cname('__pyx_memoryview_check')
 * cdef inline bint memoryview_check(object o):             # <<<<<<<<<<<<<<
 *     return isinstance(o, memoryview)
 * 
 */

static CYTHON_INLINE int __pyx_memoryview_check(PyObject *__pyx_v_o) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  __Pyx_RefNannySetupContext("memoryview_check", 0);

  /* "View.MemoryView":664
 * @cname('__pyx_memoryview_check')
 * cdef inline bint memoryview_check(object o):
 *     return isinstance(o, memoryview)             # <<<<<<<<<<<<<<
 * 
 * cdef tuple _unellipsify(object index, int ndim):
 */
  __pyx_t_1 = __Pyx_TypeCheck(__pyx_v_o, __pyx_memoryview_type); 
  __pyx_r = __pyx_t_1;
  goto __pyx_L0;

  /* "View.MemoryView":663
 * 
 * @cname('__pyx_memoryview_check')
 * cdef inline bint memoryview_check(object o):             # <<<<<<<<<<<<<<
 *     return isinstance(o, memoryview)
 * 
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":666
 *     return isinstance(o, memoryview)
 * 
 * cdef tuple _unellipsify(object index, int ndim):             # <<<<<<<<<<<<<<
 *     """
 *     Replace all ellipses with full slices and fill incomplete indices with
 */

static PyObject *_unellipsify(PyObject *__pyx_v_index, int __pyx_v_ndim) {
  PyObject *__pyx_v_tup = NULL;
  PyObject *__pyx_v_result = NULL;
  int __pyx_v_have_slices;
  int __pyx_v_seen_ellipsis;
  CYTHON_UNUSED PyObject *__pyx_v_idx = NULL;
  PyObject *__pyx_v_item = NULL;
  Py_ssize_t __pyx_v_nslices;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  Py_ssize_t __pyx_t_5;
  PyObject *(*__pyx_t_6)(PyObject *);
  PyObject *__pyx_t_7 = NULL;
  Py_ssize_t __pyx_t_8;
  int __pyx_t_9;
  int __pyx_t_10;
  PyObject *__pyx_t_11 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("_unellipsify", 0);

  /* "View.MemoryView":671
 *     full slices.
 *     """
 *     if not isinstance(index, tuple):             # <<<<<<<<<<<<<<
 *         tup = (index,)
 *     else:
 */
  __pyx_t_1 = PyTuple_Check(__pyx_v_index); 
  __pyx_t_2 = ((!(__pyx_t_1 != 0)) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":672
 *     """
 *     if not isinstance(index, tuple):
 *         tup = (index,)             # <<<<<<<<<<<<<<
 *     else:
 *         tup = index
 */
    __pyx_t_3 = PyTuple_New(1); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 672, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_INCREF(__pyx_v_index);
    __Pyx_GIVEREF(__pyx_v_index);
    PyTuple_SET_ITEM(__pyx_t_3, 0, __pyx_v_index);
    __pyx_v_tup = __pyx_t_3;
    __pyx_t_3 = 0;

    /* "View.MemoryView":671
 *     full slices.
 *     """
 *     if not isinstance(index, tuple):             # <<<<<<<<<<<<<<
 *         tup = (index,)
 *     else:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":674
 *         tup = (index,)
 *     else:
 *         tup = index             # <<<<<<<<<<<<<<
 * 
 *     result = []
 */
  /*else*/ {
    __Pyx_INCREF(__pyx_v_index);
    __pyx_v_tup = __pyx_v_index;
  }
  __pyx_L3:;

  /* "View.MemoryView":676
 *         tup = index
 * 
 *     result = []             # <<<<<<<<<<<<<<
 *     have_slices = False
 *     seen_ellipsis = False
 */
  __pyx_t_3 = PyList_New(0); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 676, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_v_result = ((PyObject*)__pyx_t_3);
  __pyx_t_3 = 0;

  /* "View.MemoryView":677
 * 
 *     result = []
 *     have_slices = False             # <<<<<<<<<<<<<<
 *     seen_ellipsis = False
 *     for idx, item in enumerate(tup):
 */
  __pyx_v_have_slices = 0;

  /* "View.MemoryView":678
 *     result = []
 *     have_slices = False
 *     seen_ellipsis = False             # <<<<<<<<<<<<<<
 *     for idx, item in enumerate(tup):
 *         if item is Ellipsis:
 */
  __pyx_v_seen_ellipsis = 0;

  /* "View.MemoryView":679
 *     have_slices = False
 *     seen_ellipsis = False
 *     for idx, item in enumerate(tup):             # <<<<<<<<<<<<<<
 *         if item is Ellipsis:
 *             if not seen_ellipsis:
 */
  __Pyx_INCREF(__pyx_int_0);
  __pyx_t_3 = __pyx_int_0;
  if (likely(PyList_CheckExact(__pyx_v_tup)) || PyTuple_CheckExact(__pyx_v_tup)) {
    __pyx_t_4 = __pyx_v_tup; __Pyx_INCREF(__pyx_t_4); __pyx_t_5 = 0;
    __pyx_t_6 = NULL;
  } else {
    __pyx_t_5 = -1; __pyx_t_4 = PyObject_GetIter(__pyx_v_tup); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 679, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_6 = Py_TYPE(__pyx_t_4)->tp_iternext; if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 679, __pyx_L1_error)
  }
  for (;;) {
    if (likely(!__pyx_t_6)) {
      if (likely(PyList_CheckExact(__pyx_t_4))) {
        if (__pyx_t_5 >= PyList_GET_SIZE(__pyx_t_4)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_7 = PyList_GET_ITEM(__pyx_t_4, __pyx_t_5); __Pyx_INCREF(__pyx_t_7); __pyx_t_5++; if (unlikely(0 < 0)) __PYX_ERR(2, 679, __pyx_L1_error)
        #else
        __pyx_t_7 = PySequence_ITEM(__pyx_t_4, __pyx_t_5); __pyx_t_5++; if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 679, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_7);
        #endif
      } else {
        if (__pyx_t_5 >= PyTuple_GET_SIZE(__pyx_t_4)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_7 = PyTuple_GET_ITEM(__pyx_t_4, __pyx_t_5); __Pyx_INCREF(__pyx_t_7); __pyx_t_5++; if (unlikely(0 < 0)) __PYX_ERR(2, 679, __pyx_L1_error)
        #else
        __pyx_t_7 = PySequence_ITEM(__pyx_t_4, __pyx_t_5); __pyx_t_5++; if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 679, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_7);
        #endif
      }
    } else {
      __pyx_t_7 = __pyx_t_6(__pyx_t_4);
      if (unlikely(!__pyx_t_7)) {
        PyObject* exc_type = PyErr_Occurred();
        if (exc_type) {
          if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();
          else __PYX_ERR(2, 679, __pyx_L1_error)
        }
        break;
      }
      __Pyx_GOTREF(__pyx_t_7);
    }
    __Pyx_XDECREF_SET(__pyx_v_item, __pyx_t_7);
    __pyx_t_7 = 0;
    __Pyx_INCREF(__pyx_t_3);
    __Pyx_XDECREF_SET(__pyx_v_idx, __pyx_t_3);
    __pyx_t_7 = __Pyx_PyInt_AddObjC(__pyx_t_3, __pyx_int_1, 1, 0, 0); if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 679, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    __Pyx_DECREF(__pyx_t_3);
    __pyx_t_3 = __pyx_t_7;
    __pyx_t_7 = 0;

    /* "View.MemoryView":680
 *     seen_ellipsis = False
 *     for idx, item in enumerate(tup):
 *         if item is Ellipsis:             # <<<<<<<<<<<<<<
 *             if not seen_ellipsis:
 *                 result.extend([slice(None)] * (ndim - len(tup) + 1))
 */
    __pyx_t_2 = (__pyx_v_item == __pyx_builtin_Ellipsis);
    __pyx_t_1 = (__pyx_t_2 != 0);
    if (__pyx_t_1) {

      /* "View.MemoryView":681
 *     for idx, item in enumerate(tup):
 *         if item is Ellipsis:
 *             if not seen_ellipsis:             # <<<<<<<<<<<<<<
 *                 result.extend([slice(None)] * (ndim - len(tup) + 1))
 *                 seen_ellipsis = True
 */
      __pyx_t_1 = ((!(__pyx_v_seen_ellipsis != 0)) != 0);
      if (__pyx_t_1) {

        /* "View.MemoryView":682
 *         if item is Ellipsis:
 *             if not seen_ellipsis:
 *                 result.extend([slice(None)] * (ndim - len(tup) + 1))             # <<<<<<<<<<<<<<
 *                 seen_ellipsis = True
 *             else:
 */
        __pyx_t_8 = PyObject_Length(__pyx_v_tup); if (unlikely(__pyx_t_8 == ((Py_ssize_t)-1))) __PYX_ERR(2, 682, __pyx_L1_error)
        __pyx_t_7 = PyList_New(1 * ((((__pyx_v_ndim - __pyx_t_8) + 1)<0) ? 0:((__pyx_v_ndim - __pyx_t_8) + 1))); if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 682, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_7);
        { Py_ssize_t __pyx_temp;
          for (__pyx_temp=0; __pyx_temp < ((__pyx_v_ndim - __pyx_t_8) + 1); __pyx_temp++) {
            __Pyx_INCREF(__pyx_slice__18);
            __Pyx_GIVEREF(__pyx_slice__18);
            PyList_SET_ITEM(__pyx_t_7, __pyx_temp, __pyx_slice__18);
          }
        }
        __pyx_t_9 = __Pyx_PyList_Extend(__pyx_v_result, __pyx_t_7); if (unlikely(__pyx_t_9 == ((int)-1))) __PYX_ERR(2, 682, __pyx_L1_error)
        __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;

        /* "View.MemoryView":683
 *             if not seen_ellipsis:
 *                 result.extend([slice(None)] * (ndim - len(tup) + 1))
 *                 seen_ellipsis = True             # <<<<<<<<<<<<<<
 *             else:
 *                 result.append(slice(None))
 */
        __pyx_v_seen_ellipsis = 1;

        /* "View.MemoryView":681
 *     for idx, item in enumerate(tup):
 *         if item is Ellipsis:
 *             if not seen_ellipsis:             # <<<<<<<<<<<<<<
 *                 result.extend([slice(None)] * (ndim - len(tup) + 1))
 *                 seen_ellipsis = True
 */
        goto __pyx_L7;
      }

      /* "View.MemoryView":685
 *                 seen_ellipsis = True
 *             else:
 *                 result.append(slice(None))             # <<<<<<<<<<<<<<
 *             have_slices = True
 *         else:
 */
      /*else*/ {
        __pyx_t_9 = __Pyx_PyList_Append(__pyx_v_result, __pyx_slice__18); if (unlikely(__pyx_t_9 == ((int)-1))) __PYX_ERR(2, 685, __pyx_L1_error)
      }
      __pyx_L7:;

      /* "View.MemoryView":686
 *             else:
 *                 result.append(slice(None))
 *             have_slices = True             # <<<<<<<<<<<<<<
 *         else:
 *             if not isinstance(item, slice) and not PyIndex_Check(item):
 */
      __pyx_v_have_slices = 1;

      /* "View.MemoryView":680
 *     seen_ellipsis = False
 *     for idx, item in enumerate(tup):
 *         if item is Ellipsis:             # <<<<<<<<<<<<<<
 *             if not seen_ellipsis:
 *                 result.extend([slice(None)] * (ndim - len(tup) + 1))
 */
      goto __pyx_L6;
    }

    /* "View.MemoryView":688
 *             have_slices = True
 *         else:
 *             if not isinstance(item, slice) and not PyIndex_Check(item):             # <<<<<<<<<<<<<<
 *                 raise TypeError("Cannot index with type '%s'" % type(item))
 * 
 */
    /*else*/ {
      __pyx_t_2 = PySlice_Check(__pyx_v_item); 
      __pyx_t_10 = ((!(__pyx_t_2 != 0)) != 0);
      if (__pyx_t_10) {
      } else {
        __pyx_t_1 = __pyx_t_10;
        goto __pyx_L9_bool_binop_done;
      }
      __pyx_t_10 = ((!(PyIndex_Check(__pyx_v_item) != 0)) != 0);
      __pyx_t_1 = __pyx_t_10;
      __pyx_L9_bool_binop_done:;
      if (unlikely(__pyx_t_1)) {

        /* "View.MemoryView":689
 *         else:
 *             if not isinstance(item, slice) and not PyIndex_Check(item):
 *                 raise TypeError("Cannot index with type '%s'" % type(item))             # <<<<<<<<<<<<<<
 * 
 *             have_slices = have_slices or isinstance(item, slice)
 */
        __pyx_t_7 = __Pyx_PyString_FormatSafe(__pyx_kp_s_Cannot_index_with_type_s, ((PyObject *)Py_TYPE(__pyx_v_item))); if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 689, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_7);
        __pyx_t_11 = __Pyx_PyObject_CallOneArg(__pyx_builtin_TypeError, __pyx_t_7); if (unlikely(!__pyx_t_11)) __PYX_ERR(2, 689, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_11);
        __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
        __Pyx_Raise(__pyx_t_11, 0, 0, 0);
        __Pyx_DECREF(__pyx_t_11); __pyx_t_11 = 0;
        __PYX_ERR(2, 689, __pyx_L1_error)

        /* "View.MemoryView":688
 *             have_slices = True
 *         else:
 *             if not isinstance(item, slice) and not PyIndex_Check(item):             # <<<<<<<<<<<<<<
 *                 raise TypeError("Cannot index with type '%s'" % type(item))
 * 
 */
      }

      /* "View.MemoryView":691
 *                 raise TypeError("Cannot index with type '%s'" % type(item))
 * 
 *             have_slices = have_slices or isinstance(item, slice)             # <<<<<<<<<<<<<<
 *             result.append(item)
 * 
 */
      __pyx_t_10 = (__pyx_v_have_slices != 0);
      if (!__pyx_t_10) {
      } else {
        __pyx_t_1 = __pyx_t_10;
        goto __pyx_L11_bool_binop_done;
      }
      __pyx_t_10 = PySlice_Check(__pyx_v_item); 
      __pyx_t_2 = (__pyx_t_10 != 0);
      __pyx_t_1 = __pyx_t_2;
      __pyx_L11_bool_binop_done:;
      __pyx_v_have_slices = __pyx_t_1;

      /* "View.MemoryView":692
 * 
 *             have_slices = have_slices or isinstance(item, slice)
 *             result.append(item)             # <<<<<<<<<<<<<<
 * 
 *     nslices = ndim - len(result)
 */
      __pyx_t_9 = __Pyx_PyList_Append(__pyx_v_result, __pyx_v_item); if (unlikely(__pyx_t_9 == ((int)-1))) __PYX_ERR(2, 692, __pyx_L1_error)
    }
    __pyx_L6:;

    /* "View.MemoryView":679
 *     have_slices = False
 *     seen_ellipsis = False
 *     for idx, item in enumerate(tup):             # <<<<<<<<<<<<<<
 *         if item is Ellipsis:
 *             if not seen_ellipsis:
 */
  }
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

  /* "View.MemoryView":694
 *             result.append(item)
 * 
 *     nslices = ndim - len(result)             # <<<<<<<<<<<<<<
 *     if nslices:
 *         result.extend([slice(None)] * nslices)
 */
  __pyx_t_5 = PyList_GET_SIZE(__pyx_v_result); if (unlikely(__pyx_t_5 == ((Py_ssize_t)-1))) __PYX_ERR(2, 694, __pyx_L1_error)
  __pyx_v_nslices = (__pyx_v_ndim - __pyx_t_5);

  /* "View.MemoryView":695
 * 
 *     nslices = ndim - len(result)
 *     if nslices:             # <<<<<<<<<<<<<<
 *         result.extend([slice(None)] * nslices)
 * 
 */
  __pyx_t_1 = (__pyx_v_nslices != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":696
 *     nslices = ndim - len(result)
 *     if nslices:
 *         result.extend([slice(None)] * nslices)             # <<<<<<<<<<<<<<
 * 
 *     return have_slices or nslices, tuple(result)
 */
    __pyx_t_3 = PyList_New(1 * ((__pyx_v_nslices<0) ? 0:__pyx_v_nslices)); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 696, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    { Py_ssize_t __pyx_temp;
      for (__pyx_temp=0; __pyx_temp < __pyx_v_nslices; __pyx_temp++) {
        __Pyx_INCREF(__pyx_slice__18);
        __Pyx_GIVEREF(__pyx_slice__18);
        PyList_SET_ITEM(__pyx_t_3, __pyx_temp, __pyx_slice__18);
      }
    }
    __pyx_t_9 = __Pyx_PyList_Extend(__pyx_v_result, __pyx_t_3); if (unlikely(__pyx_t_9 == ((int)-1))) __PYX_ERR(2, 696, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

    /* "View.MemoryView":695
 * 
 *     nslices = ndim - len(result)
 *     if nslices:             # <<<<<<<<<<<<<<
 *         result.extend([slice(None)] * nslices)
 * 
 */
  }

  /* "View.MemoryView":698
 *         result.extend([slice(None)] * nslices)
 * 
 *     return have_slices or nslices, tuple(result)             # <<<<<<<<<<<<<<
 * 
 * cdef assert_direct_dimensions(Py_ssize_t *suboffsets, int ndim):
 */
  __Pyx_XDECREF(__pyx_r);
  if (!__pyx_v_have_slices) {
  } else {
    __pyx_t_4 = __Pyx_PyBool_FromLong(__pyx_v_have_slices); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 698, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __pyx_t_3 = __pyx_t_4;
    __pyx_t_4 = 0;
    goto __pyx_L14_bool_binop_done;
  }
  __pyx_t_4 = PyInt_FromSsize_t(__pyx_v_nslices); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 698, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_3 = __pyx_t_4;
  __pyx_t_4 = 0;
  __pyx_L14_bool_binop_done:;
  __pyx_t_4 = PyList_AsTuple(__pyx_v_result); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 698, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __pyx_t_11 = PyTuple_New(2); if (unlikely(!__pyx_t_11)) __PYX_ERR(2, 698, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_11);
  __Pyx_GIVEREF(__pyx_t_3);
  PyTuple_SET_ITEM(__pyx_t_11, 0, __pyx_t_3);
  __Pyx_GIVEREF(__pyx_t_4);
  PyTuple_SET_ITEM(__pyx_t_11, 1, __pyx_t_4);
  __pyx_t_3 = 0;
  __pyx_t_4 = 0;
  __pyx_r = ((PyObject*)__pyx_t_11);
  __pyx_t_11 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":666
 *     return isinstance(o, memoryview)
 * 
 * cdef tuple _unellipsify(object index, int ndim):             # <<<<<<<<<<<<<<
 *     """
 *     Replace all ellipses with full slices and fill incomplete indices with
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_11);
  __Pyx_AddTraceback("View.MemoryView._unellipsify", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v_tup);
  __Pyx_XDECREF(__pyx_v_result);
  __Pyx_XDECREF(__pyx_v_idx);
  __Pyx_XDECREF(__pyx_v_item);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":700
 *     return have_slices or nslices, tuple(result)
 * 
 * cdef assert_direct_dimensions(Py_ssize_t *suboffsets, int ndim):             # <<<<<<<<<<<<<<
 *     for suboffset in suboffsets[:ndim]:
 *         if suboffset >= 0:
 */

static PyObject *assert_direct_dimensions(Py_ssize_t *__pyx_v_suboffsets, int __pyx_v_ndim) {
  Py_ssize_t __pyx_v_suboffset;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  Py_ssize_t *__pyx_t_1;
  Py_ssize_t *__pyx_t_2;
  Py_ssize_t *__pyx_t_3;
  int __pyx_t_4;
  PyObject *__pyx_t_5 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("assert_direct_dimensions", 0);

  /* "View.MemoryView":701
 * 
 * cdef assert_direct_dimensions(Py_ssize_t *suboffsets, int ndim):
 *     for suboffset in suboffsets[:ndim]:             # <<<<<<<<<<<<<<
 *         if suboffset >= 0:
 *             raise ValueError("Indirect dimensions not supported")
 */
  __pyx_t_2 = (__pyx_v_suboffsets + __pyx_v_ndim);
  for (__pyx_t_3 = __pyx_v_suboffsets; __pyx_t_3 < __pyx_t_2; __pyx_t_3++) {
    __pyx_t_1 = __pyx_t_3;
    __pyx_v_suboffset = (__pyx_t_1[0]);

    /* "View.MemoryView":702
 * cdef assert_direct_dimensions(Py_ssize_t *suboffsets, int ndim):
 *     for suboffset in suboffsets[:ndim]:
 *         if suboffset >= 0:             # <<<<<<<<<<<<<<
 *             raise ValueError("Indirect dimensions not supported")
 * 
 */
    __pyx_t_4 = ((__pyx_v_suboffset >= 0) != 0);
    if (unlikely(__pyx_t_4)) {

      /* "View.MemoryView":703
 *     for suboffset in suboffsets[:ndim]:
 *         if suboffset >= 0:
 *             raise ValueError("Indirect dimensions not supported")             # <<<<<<<<<<<<<<
 * 
 * 
 */
      __pyx_t_5 = __Pyx_PyObject_Call(__pyx_builtin_ValueError, __pyx_tuple__19, NULL); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 703, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_5);
      __Pyx_Raise(__pyx_t_5, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_5); __pyx_t_5 = 0;
      __PYX_ERR(2, 703, __pyx_L1_error)

      /* "View.MemoryView":702
 * cdef assert_direct_dimensions(Py_ssize_t *suboffsets, int ndim):
 *     for suboffset in suboffsets[:ndim]:
 *         if suboffset >= 0:             # <<<<<<<<<<<<<<
 *             raise ValueError("Indirect dimensions not supported")
 * 
 */
    }
  }

  /* "View.MemoryView":700
 *     return have_slices or nslices, tuple(result)
 * 
 * cdef assert_direct_dimensions(Py_ssize_t *suboffsets, int ndim):             # <<<<<<<<<<<<<<
 *     for suboffset in suboffsets[:ndim]:
 *         if suboffset >= 0:
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.assert_direct_dimensions", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":710
 * 
 * @cname('__pyx_memview_slice')
 * cdef memoryview memview_slice(memoryview memview, object indices):             # <<<<<<<<<<<<<<
 *     cdef int new_ndim = 0, suboffset_dim = -1, dim
 *     cdef bint negative_step
 */

static struct __pyx_memoryview_obj *__pyx_memview_slice(struct __pyx_memoryview_obj *__pyx_v_memview, PyObject *__pyx_v_indices) {
  int __pyx_v_new_ndim;
  int __pyx_v_suboffset_dim;
  int __pyx_v_dim;
  __Pyx_memviewslice __pyx_v_src;
  __Pyx_memviewslice __pyx_v_dst;
  __Pyx_memviewslice *__pyx_v_p_src;
  struct __pyx_memoryviewslice_obj *__pyx_v_memviewsliceobj = 0;
  __Pyx_memviewslice *__pyx_v_p_dst;
  int *__pyx_v_p_suboffset_dim;
  Py_ssize_t __pyx_v_start;
  Py_ssize_t __pyx_v_stop;
  Py_ssize_t __pyx_v_step;
  int __pyx_v_have_start;
  int __pyx_v_have_stop;
  int __pyx_v_have_step;
  PyObject *__pyx_v_index = NULL;
  struct __pyx_memoryview_obj *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  struct __pyx_memoryview_obj *__pyx_t_4;
  char *__pyx_t_5;
  int __pyx_t_6;
  Py_ssize_t __pyx_t_7;
  PyObject *(*__pyx_t_8)(PyObject *);
  PyObject *__pyx_t_9 = NULL;
  Py_ssize_t __pyx_t_10;
  int __pyx_t_11;
  Py_ssize_t __pyx_t_12;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("memview_slice", 0);

  /* "View.MemoryView":711
 * @cname('__pyx_memview_slice')
 * cdef memoryview memview_slice(memoryview memview, object indices):
 *     cdef int new_ndim = 0, suboffset_dim = -1, dim             # <<<<<<<<<<<<<<
 *     cdef bint negative_step
 *     cdef __Pyx_memviewslice src, dst
 */
  __pyx_v_new_ndim = 0;
  __pyx_v_suboffset_dim = -1;

  /* "View.MemoryView":718
 * 
 * 
 *     memset(&dst, 0, sizeof(dst))             # <<<<<<<<<<<<<<
 * 
 *     cdef _memoryviewslice memviewsliceobj
 */
  (void)(memset((&__pyx_v_dst), 0, (sizeof(__pyx_v_dst))));

  /* "View.MemoryView":722
 *     cdef _memoryviewslice memviewsliceobj
 * 
 *     assert memview.view.ndim > 0             # <<<<<<<<<<<<<<
 * 
 *     if isinstance(memview, _memoryviewslice):
 */
  #ifndef CYTHON_WITHOUT_ASSERTIONS
  if (unlikely(!Py_OptimizeFlag)) {
    if (unlikely(!((__pyx_v_memview->view.ndim > 0) != 0))) {
      PyErr_SetNone(PyExc_AssertionError);
      __PYX_ERR(2, 722, __pyx_L1_error)
    }
  }
  #endif

  /* "View.MemoryView":724
 *     assert memview.view.ndim > 0
 * 
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         memviewsliceobj = memview
 *         p_src = &memviewsliceobj.from_slice
 */
  __pyx_t_1 = __Pyx_TypeCheck(((PyObject *)__pyx_v_memview), __pyx_memoryviewslice_type); 
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":725
 * 
 *     if isinstance(memview, _memoryviewslice):
 *         memviewsliceobj = memview             # <<<<<<<<<<<<<<
 *         p_src = &memviewsliceobj.from_slice
 *     else:
 */
    if (!(likely(((((PyObject *)__pyx_v_memview)) == Py_None) || likely(__Pyx_TypeTest(((PyObject *)__pyx_v_memview), __pyx_memoryviewslice_type))))) __PYX_ERR(2, 725, __pyx_L1_error)
    __pyx_t_3 = ((PyObject *)__pyx_v_memview);
    __Pyx_INCREF(__pyx_t_3);
    __pyx_v_memviewsliceobj = ((struct __pyx_memoryviewslice_obj *)__pyx_t_3);
    __pyx_t_3 = 0;

    /* "View.MemoryView":726
 *     if isinstance(memview, _memoryviewslice):
 *         memviewsliceobj = memview
 *         p_src = &memviewsliceobj.from_slice             # <<<<<<<<<<<<<<
 *     else:
 *         slice_copy(memview, &src)
 */
    __pyx_v_p_src = (&__pyx_v_memviewsliceobj->from_slice);

    /* "View.MemoryView":724
 *     assert memview.view.ndim > 0
 * 
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         memviewsliceobj = memview
 *         p_src = &memviewsliceobj.from_slice
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":728
 *         p_src = &memviewsliceobj.from_slice
 *     else:
 *         slice_copy(memview, &src)             # <<<<<<<<<<<<<<
 *         p_src = &src
 * 
 */
  /*else*/ {
    __pyx_memoryview_slice_copy(__pyx_v_memview, (&__pyx_v_src));

    /* "View.MemoryView":729
 *     else:
 *         slice_copy(memview, &src)
 *         p_src = &src             # <<<<<<<<<<<<<<
 * 
 * 
 */
    __pyx_v_p_src = (&__pyx_v_src);
  }
  __pyx_L3:;

  /* "View.MemoryView":735
 * 
 * 
 *     dst.memview = p_src.memview             # <<<<<<<<<<<<<<
 *     dst.data = p_src.data
 * 
 */
  __pyx_t_4 = __pyx_v_p_src->memview;
  __pyx_v_dst.memview = __pyx_t_4;

  /* "View.MemoryView":736
 * 
 *     dst.memview = p_src.memview
 *     dst.data = p_src.data             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_5 = __pyx_v_p_src->data;
  __pyx_v_dst.data = __pyx_t_5;

  /* "View.MemoryView":741
 * 
 * 
 *     cdef __Pyx_memviewslice *p_dst = &dst             # <<<<<<<<<<<<<<
 *     cdef int *p_suboffset_dim = &suboffset_dim
 *     cdef Py_ssize_t start, stop, step
 */
  __pyx_v_p_dst = (&__pyx_v_dst);

  /* "View.MemoryView":742
 * 
 *     cdef __Pyx_memviewslice *p_dst = &dst
 *     cdef int *p_suboffset_dim = &suboffset_dim             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t start, stop, step
 *     cdef bint have_start, have_stop, have_step
 */
  __pyx_v_p_suboffset_dim = (&__pyx_v_suboffset_dim);

  /* "View.MemoryView":746
 *     cdef bint have_start, have_stop, have_step
 * 
 *     for dim, index in enumerate(indices):             # <<<<<<<<<<<<<<
 *         if PyIndex_Check(index):
 *             slice_memviewslice(
 */
  __pyx_t_6 = 0;
  if (likely(PyList_CheckExact(__pyx_v_indices)) || PyTuple_CheckExact(__pyx_v_indices)) {
    __pyx_t_3 = __pyx_v_indices; __Pyx_INCREF(__pyx_t_3); __pyx_t_7 = 0;
    __pyx_t_8 = NULL;
  } else {
    __pyx_t_7 = -1; __pyx_t_3 = PyObject_GetIter(__pyx_v_indices); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 746, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_8 = Py_TYPE(__pyx_t_3)->tp_iternext; if (unlikely(!__pyx_t_8)) __PYX_ERR(2, 746, __pyx_L1_error)
  }
  for (;;) {
    if (likely(!__pyx_t_8)) {
      if (likely(PyList_CheckExact(__pyx_t_3))) {
        if (__pyx_t_7 >= PyList_GET_SIZE(__pyx_t_3)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_9 = PyList_GET_ITEM(__pyx_t_3, __pyx_t_7); __Pyx_INCREF(__pyx_t_9); __pyx_t_7++; if (unlikely(0 < 0)) __PYX_ERR(2, 746, __pyx_L1_error)
        #else
        __pyx_t_9 = PySequence_ITEM(__pyx_t_3, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 746, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_9);
        #endif
      } else {
        if (__pyx_t_7 >= PyTuple_GET_SIZE(__pyx_t_3)) break;
        #if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
        __pyx_t_9 = PyTuple_GET_ITEM(__pyx_t_3, __pyx_t_7); __Pyx_INCREF(__pyx_t_9); __pyx_t_7++; if (unlikely(0 < 0)) __PYX_ERR(2, 746, __pyx_L1_error)
        #else
        __pyx_t_9 = PySequence_ITEM(__pyx_t_3, __pyx_t_7); __pyx_t_7++; if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 746, __pyx_L1_error)
        __Pyx_GOTREF(__pyx_t_9);
        #endif
      }
    } else {
      __pyx_t_9 = __pyx_t_8(__pyx_t_3);
      if (unlikely(!__pyx_t_9)) {
        PyObject* exc_type = PyErr_Occurred();
        if (exc_type) {
          if (likely(__Pyx_PyErr_GivenExceptionMatches(exc_type, PyExc_StopIteration))) PyErr_Clear();
          else __PYX_ERR(2, 746, __pyx_L1_error)
        }
        break;
      }
      __Pyx_GOTREF(__pyx_t_9);
    }
    __Pyx_XDECREF_SET(__pyx_v_index, __pyx_t_9);
    __pyx_t_9 = 0;
    __pyx_v_dim = __pyx_t_6;
    __pyx_t_6 = (__pyx_t_6 + 1);

    /* "View.MemoryView":747
 * 
 *     for dim, index in enumerate(indices):
 *         if PyIndex_Check(index):             # <<<<<<<<<<<<<<
 *             slice_memviewslice(
 *                 p_dst, p_src.shape[dim], p_src.strides[dim], p_src.suboffsets[dim],
 */
    __pyx_t_2 = (PyIndex_Check(__pyx_v_index) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":751
 *                 p_dst, p_src.shape[dim], p_src.strides[dim], p_src.suboffsets[dim],
 *                 dim, new_ndim, p_suboffset_dim,
 *                 index, 0, 0, # start, stop, step             # <<<<<<<<<<<<<<
 *                 0, 0, 0, # have_{start,stop,step}
 *                 False)
 */
      __pyx_t_10 = __Pyx_PyIndex_AsSsize_t(__pyx_v_index); if (unlikely((__pyx_t_10 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 751, __pyx_L1_error)

      /* "View.MemoryView":748
 *     for dim, index in enumerate(indices):
 *         if PyIndex_Check(index):
 *             slice_memviewslice(             # <<<<<<<<<<<<<<
 *                 p_dst, p_src.shape[dim], p_src.strides[dim], p_src.suboffsets[dim],
 *                 dim, new_ndim, p_suboffset_dim,
 */
      __pyx_t_11 = __pyx_memoryview_slice_memviewslice(__pyx_v_p_dst, (__pyx_v_p_src->shape[__pyx_v_dim]), (__pyx_v_p_src->strides[__pyx_v_dim]), (__pyx_v_p_src->suboffsets[__pyx_v_dim]), __pyx_v_dim, __pyx_v_new_ndim, __pyx_v_p_suboffset_dim, __pyx_t_10, 0, 0, 0, 0, 0, 0); if (unlikely(__pyx_t_11 == ((int)-1))) __PYX_ERR(2, 748, __pyx_L1_error)

      /* "View.MemoryView":747
 * 
 *     for dim, index in enumerate(indices):
 *         if PyIndex_Check(index):             # <<<<<<<<<<<<<<
 *             slice_memviewslice(
 *                 p_dst, p_src.shape[dim], p_src.strides[dim], p_src.suboffsets[dim],
 */
      goto __pyx_L6;
    }

    /* "View.MemoryView":754
 *                 0, 0, 0, # have_{start,stop,step}
 *                 False)
 *         elif index is None:             # <<<<<<<<<<<<<<
 *             p_dst.shape[new_ndim] = 1
 *             p_dst.strides[new_ndim] = 0
 */
    __pyx_t_2 = (__pyx_v_index == Py_None);
    __pyx_t_1 = (__pyx_t_2 != 0);
    if (__pyx_t_1) {

      /* "View.MemoryView":755
 *                 False)
 *         elif index is None:
 *             p_dst.shape[new_ndim] = 1             # <<<<<<<<<<<<<<
 *             p_dst.strides[new_ndim] = 0
 *             p_dst.suboffsets[new_ndim] = -1
 */
      (__pyx_v_p_dst->shape[__pyx_v_new_ndim]) = 1;

      /* "View.MemoryView":756
 *         elif index is None:
 *             p_dst.shape[new_ndim] = 1
 *             p_dst.strides[new_ndim] = 0             # <<<<<<<<<<<<<<
 *             p_dst.suboffsets[new_ndim] = -1
 *             new_ndim += 1
 */
      (__pyx_v_p_dst->strides[__pyx_v_new_ndim]) = 0;

      /* "View.MemoryView":757
 *             p_dst.shape[new_ndim] = 1
 *             p_dst.strides[new_ndim] = 0
 *             p_dst.suboffsets[new_ndim] = -1             # <<<<<<<<<<<<<<
 *             new_ndim += 1
 *         else:
 */
      (__pyx_v_p_dst->suboffsets[__pyx_v_new_ndim]) = -1L;

      /* "View.MemoryView":758
 *             p_dst.strides[new_ndim] = 0
 *             p_dst.suboffsets[new_ndim] = -1
 *             new_ndim += 1             # <<<<<<<<<<<<<<
 *         else:
 *             start = index.start or 0
 */
      __pyx_v_new_ndim = (__pyx_v_new_ndim + 1);

      /* "View.MemoryView":754
 *                 0, 0, 0, # have_{start,stop,step}
 *                 False)
 *         elif index is None:             # <<<<<<<<<<<<<<
 *             p_dst.shape[new_ndim] = 1
 *             p_dst.strides[new_ndim] = 0
 */
      goto __pyx_L6;
    }

    /* "View.MemoryView":760
 *             new_ndim += 1
 *         else:
 *             start = index.start or 0             # <<<<<<<<<<<<<<
 *             stop = index.stop or 0
 *             step = index.step or 0
 */
    /*else*/ {
      __pyx_t_9 = __Pyx_PyObject_GetAttrStr(__pyx_v_index, __pyx_n_s_start); if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 760, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_9);
      __pyx_t_1 = __Pyx_PyObject_IsTrue(__pyx_t_9); if (unlikely(__pyx_t_1 < 0)) __PYX_ERR(2, 760, __pyx_L1_error)
      if (!__pyx_t_1) {
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
      } else {
        __pyx_t_12 = __Pyx_PyIndex_AsSsize_t(__pyx_t_9); if (unlikely((__pyx_t_12 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 760, __pyx_L1_error)
        __pyx_t_10 = __pyx_t_12;
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
        goto __pyx_L7_bool_binop_done;
      }
      __pyx_t_10 = 0;
      __pyx_L7_bool_binop_done:;
      __pyx_v_start = __pyx_t_10;

      /* "View.MemoryView":761
 *         else:
 *             start = index.start or 0
 *             stop = index.stop or 0             # <<<<<<<<<<<<<<
 *             step = index.step or 0
 * 
 */
      __pyx_t_9 = __Pyx_PyObject_GetAttrStr(__pyx_v_index, __pyx_n_s_stop); if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 761, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_9);
      __pyx_t_1 = __Pyx_PyObject_IsTrue(__pyx_t_9); if (unlikely(__pyx_t_1 < 0)) __PYX_ERR(2, 761, __pyx_L1_error)
      if (!__pyx_t_1) {
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
      } else {
        __pyx_t_12 = __Pyx_PyIndex_AsSsize_t(__pyx_t_9); if (unlikely((__pyx_t_12 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 761, __pyx_L1_error)
        __pyx_t_10 = __pyx_t_12;
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
        goto __pyx_L9_bool_binop_done;
      }
      __pyx_t_10 = 0;
      __pyx_L9_bool_binop_done:;
      __pyx_v_stop = __pyx_t_10;

      /* "View.MemoryView":762
 *             start = index.start or 0
 *             stop = index.stop or 0
 *             step = index.step or 0             # <<<<<<<<<<<<<<
 * 
 *             have_start = index.start is not None
 */
      __pyx_t_9 = __Pyx_PyObject_GetAttrStr(__pyx_v_index, __pyx_n_s_step); if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 762, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_9);
      __pyx_t_1 = __Pyx_PyObject_IsTrue(__pyx_t_9); if (unlikely(__pyx_t_1 < 0)) __PYX_ERR(2, 762, __pyx_L1_error)
      if (!__pyx_t_1) {
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
      } else {
        __pyx_t_12 = __Pyx_PyIndex_AsSsize_t(__pyx_t_9); if (unlikely((__pyx_t_12 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 762, __pyx_L1_error)
        __pyx_t_10 = __pyx_t_12;
        __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
        goto __pyx_L11_bool_binop_done;
      }
      __pyx_t_10 = 0;
      __pyx_L11_bool_binop_done:;
      __pyx_v_step = __pyx_t_10;

      /* "View.MemoryView":764
 *             step = index.step or 0
 * 
 *             have_start = index.start is not None             # <<<<<<<<<<<<<<
 *             have_stop = index.stop is not None
 *             have_step = index.step is not None
 */
      __pyx_t_9 = __Pyx_PyObject_GetAttrStr(__pyx_v_index, __pyx_n_s_start); if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 764, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_9);
      __pyx_t_1 = (__pyx_t_9 != Py_None);
      __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
      __pyx_v_have_start = __pyx_t_1;

      /* "View.MemoryView":765
 * 
 *             have_start = index.start is not None
 *             have_stop = index.stop is not None             # <<<<<<<<<<<<<<
 *             have_step = index.step is not None
 * 
 */
      __pyx_t_9 = __Pyx_PyObject_GetAttrStr(__pyx_v_index, __pyx_n_s_stop); if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 765, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_9);
      __pyx_t_1 = (__pyx_t_9 != Py_None);
      __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
      __pyx_v_have_stop = __pyx_t_1;

      /* "View.MemoryView":766
 *             have_start = index.start is not None
 *             have_stop = index.stop is not None
 *             have_step = index.step is not None             # <<<<<<<<<<<<<<
 * 
 *             slice_memviewslice(
 */
      __pyx_t_9 = __Pyx_PyObject_GetAttrStr(__pyx_v_index, __pyx_n_s_step); if (unlikely(!__pyx_t_9)) __PYX_ERR(2, 766, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_9);
      __pyx_t_1 = (__pyx_t_9 != Py_None);
      __Pyx_DECREF(__pyx_t_9); __pyx_t_9 = 0;
      __pyx_v_have_step = __pyx_t_1;

      /* "View.MemoryView":768
 *             have_step = index.step is not None
 * 
 *             slice_memviewslice(             # <<<<<<<<<<<<<<
 *                 p_dst, p_src.shape[dim], p_src.strides[dim], p_src.suboffsets[dim],
 *                 dim, new_ndim, p_suboffset_dim,
 */
      __pyx_t_11 = __pyx_memoryview_slice_memviewslice(__pyx_v_p_dst, (__pyx_v_p_src->shape[__pyx_v_dim]), (__pyx_v_p_src->strides[__pyx_v_dim]), (__pyx_v_p_src->suboffsets[__pyx_v_dim]), __pyx_v_dim, __pyx_v_new_ndim, __pyx_v_p_suboffset_dim, __pyx_v_start, __pyx_v_stop, __pyx_v_step, __pyx_v_have_start, __pyx_v_have_stop, __pyx_v_have_step, 1); if (unlikely(__pyx_t_11 == ((int)-1))) __PYX_ERR(2, 768, __pyx_L1_error)

      /* "View.MemoryView":774
 *                 have_start, have_stop, have_step,
 *                 True)
 *             new_ndim += 1             # <<<<<<<<<<<<<<
 * 
 *     if isinstance(memview, _memoryviewslice):
 */
      __pyx_v_new_ndim = (__pyx_v_new_ndim + 1);
    }
    __pyx_L6:;

    /* "View.MemoryView":746
 *     cdef bint have_start, have_stop, have_step
 * 
 *     for dim, index in enumerate(indices):             # <<<<<<<<<<<<<<
 *         if PyIndex_Check(index):
 *             slice_memviewslice(
 */
  }
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

  /* "View.MemoryView":776
 *             new_ndim += 1
 * 
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         return memoryview_fromslice(dst, new_ndim,
 *                                     memviewsliceobj.to_object_func,
 */
  __pyx_t_1 = __Pyx_TypeCheck(((PyObject *)__pyx_v_memview), __pyx_memoryviewslice_type); 
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":777
 * 
 *     if isinstance(memview, _memoryviewslice):
 *         return memoryview_fromslice(dst, new_ndim,             # <<<<<<<<<<<<<<
 *                                     memviewsliceobj.to_object_func,
 *                                     memviewsliceobj.to_dtype_func,
 */
    __Pyx_XDECREF(((PyObject *)__pyx_r));

    /* "View.MemoryView":778
 *     if isinstance(memview, _memoryviewslice):
 *         return memoryview_fromslice(dst, new_ndim,
 *                                     memviewsliceobj.to_object_func,             # <<<<<<<<<<<<<<
 *                                     memviewsliceobj.to_dtype_func,
 *                                     memview.dtype_is_object)
 */
    if (unlikely(!__pyx_v_memviewsliceobj)) { __Pyx_RaiseUnboundLocalError("memviewsliceobj"); __PYX_ERR(2, 778, __pyx_L1_error) }

    /* "View.MemoryView":779
 *         return memoryview_fromslice(dst, new_ndim,
 *                                     memviewsliceobj.to_object_func,
 *                                     memviewsliceobj.to_dtype_func,             # <<<<<<<<<<<<<<
 *                                     memview.dtype_is_object)
 *     else:
 */
    if (unlikely(!__pyx_v_memviewsliceobj)) { __Pyx_RaiseUnboundLocalError("memviewsliceobj"); __PYX_ERR(2, 779, __pyx_L1_error) }

    /* "View.MemoryView":777
 * 
 *     if isinstance(memview, _memoryviewslice):
 *         return memoryview_fromslice(dst, new_ndim,             # <<<<<<<<<<<<<<
 *                                     memviewsliceobj.to_object_func,
 *                                     memviewsliceobj.to_dtype_func,
 */
    __pyx_t_3 = __pyx_memoryview_fromslice(__pyx_v_dst, __pyx_v_new_ndim, __pyx_v_memviewsliceobj->to_object_func, __pyx_v_memviewsliceobj->to_dtype_func, __pyx_v_memview->dtype_is_object); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 777, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    if (!(likely(((__pyx_t_3) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_3, __pyx_memoryview_type))))) __PYX_ERR(2, 777, __pyx_L1_error)
    __pyx_r = ((struct __pyx_memoryview_obj *)__pyx_t_3);
    __pyx_t_3 = 0;
    goto __pyx_L0;

    /* "View.MemoryView":776
 *             new_ndim += 1
 * 
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         return memoryview_fromslice(dst, new_ndim,
 *                                     memviewsliceobj.to_object_func,
 */
  }

  /* "View.MemoryView":782
 *                                     memview.dtype_is_object)
 *     else:
 *         return memoryview_fromslice(dst, new_ndim, NULL, NULL,             # <<<<<<<<<<<<<<
 *                                     memview.dtype_is_object)
 * 
 */
  /*else*/ {
    __Pyx_XDECREF(((PyObject *)__pyx_r));

    /* "View.MemoryView":783
 *     else:
 *         return memoryview_fromslice(dst, new_ndim, NULL, NULL,
 *                                     memview.dtype_is_object)             # <<<<<<<<<<<<<<
 * 
 * 
 */
    __pyx_t_3 = __pyx_memoryview_fromslice(__pyx_v_dst, __pyx_v_new_ndim, NULL, NULL, __pyx_v_memview->dtype_is_object); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 782, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);

    /* "View.MemoryView":782
 *                                     memview.dtype_is_object)
 *     else:
 *         return memoryview_fromslice(dst, new_ndim, NULL, NULL,             # <<<<<<<<<<<<<<
 *                                     memview.dtype_is_object)
 * 
 */
    if (!(likely(((__pyx_t_3) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_3, __pyx_memoryview_type))))) __PYX_ERR(2, 782, __pyx_L1_error)
    __pyx_r = ((struct __pyx_memoryview_obj *)__pyx_t_3);
    __pyx_t_3 = 0;
    goto __pyx_L0;
  }

  /* "View.MemoryView":710
 * 
 * @cname('__pyx_memview_slice')
 * cdef memoryview memview_slice(memoryview memview, object indices):             # <<<<<<<<<<<<<<
 *     cdef int new_ndim = 0, suboffset_dim = -1, dim
 *     cdef bint negative_step
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_9);
  __Pyx_AddTraceback("View.MemoryView.memview_slice", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF((PyObject *)__pyx_v_memviewsliceobj);
  __Pyx_XDECREF(__pyx_v_index);
  __Pyx_XGIVEREF((PyObject *)__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":807
 * 
 * @cname('__pyx_memoryview_slice_memviewslice')
 * cdef int slice_memviewslice(             # <<<<<<<<<<<<<<
 *         __Pyx_memviewslice *dst,
 *         Py_ssize_t shape, Py_ssize_t stride, Py_ssize_t suboffset,
 */

static int __pyx_memoryview_slice_memviewslice(__Pyx_memviewslice *__pyx_v_dst, Py_ssize_t __pyx_v_shape, Py_ssize_t __pyx_v_stride, Py_ssize_t __pyx_v_suboffset, int __pyx_v_dim, int __pyx_v_new_ndim, int *__pyx_v_suboffset_dim, Py_ssize_t __pyx_v_start, Py_ssize_t __pyx_v_stop, Py_ssize_t __pyx_v_step, int __pyx_v_have_start, int __pyx_v_have_stop, int __pyx_v_have_step, int __pyx_v_is_slice) {
  Py_ssize_t __pyx_v_new_shape;
  int __pyx_v_negative_step;
  int __pyx_r;
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;

  /* "View.MemoryView":827
 *     cdef bint negative_step
 * 
 *     if not is_slice:             # <<<<<<<<<<<<<<
 * 
 *         if start < 0:
 */
  __pyx_t_1 = ((!(__pyx_v_is_slice != 0)) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":829
 *     if not is_slice:
 * 
 *         if start < 0:             # <<<<<<<<<<<<<<
 *             start += shape
 *         if not 0 <= start < shape:
 */
    __pyx_t_1 = ((__pyx_v_start < 0) != 0);
    if (__pyx_t_1) {

      /* "View.MemoryView":830
 * 
 *         if start < 0:
 *             start += shape             # <<<<<<<<<<<<<<
 *         if not 0 <= start < shape:
 *             _err_dim(IndexError, "Index out of bounds (axis %d)", dim)
 */
      __pyx_v_start = (__pyx_v_start + __pyx_v_shape);

      /* "View.MemoryView":829
 *     if not is_slice:
 * 
 *         if start < 0:             # <<<<<<<<<<<<<<
 *             start += shape
 *         if not 0 <= start < shape:
 */
    }

    /* "View.MemoryView":831
 *         if start < 0:
 *             start += shape
 *         if not 0 <= start < shape:             # <<<<<<<<<<<<<<
 *             _err_dim(IndexError, "Index out of bounds (axis %d)", dim)
 *     else:
 */
    __pyx_t_1 = (0 <= __pyx_v_start);
    if (__pyx_t_1) {
      __pyx_t_1 = (__pyx_v_start < __pyx_v_shape);
    }
    __pyx_t_2 = ((!(__pyx_t_1 != 0)) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":832
 *             start += shape
 *         if not 0 <= start < shape:
 *             _err_dim(IndexError, "Index out of bounds (axis %d)", dim)             # <<<<<<<<<<<<<<
 *     else:
 * 
 */
      __pyx_t_3 = __pyx_memoryview_err_dim(__pyx_builtin_IndexError, ((char *)"Index out of bounds (axis %d)"), __pyx_v_dim); if (unlikely(__pyx_t_3 == ((int)-1))) __PYX_ERR(2, 832, __pyx_L1_error)

      /* "View.MemoryView":831
 *         if start < 0:
 *             start += shape
 *         if not 0 <= start < shape:             # <<<<<<<<<<<<<<
 *             _err_dim(IndexError, "Index out of bounds (axis %d)", dim)
 *     else:
 */
    }

    /* "View.MemoryView":827
 *     cdef bint negative_step
 * 
 *     if not is_slice:             # <<<<<<<<<<<<<<
 * 
 *         if start < 0:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":835
 *     else:
 * 
 *         negative_step = have_step != 0 and step < 0             # <<<<<<<<<<<<<<
 * 
 *         if have_step and step == 0:
 */
  /*else*/ {
    __pyx_t_1 = ((__pyx_v_have_step != 0) != 0);
    if (__pyx_t_1) {
    } else {
      __pyx_t_2 = __pyx_t_1;
      goto __pyx_L6_bool_binop_done;
    }
    __pyx_t_1 = ((__pyx_v_step < 0) != 0);
    __pyx_t_2 = __pyx_t_1;
    __pyx_L6_bool_binop_done:;
    __pyx_v_negative_step = __pyx_t_2;

    /* "View.MemoryView":837
 *         negative_step = have_step != 0 and step < 0
 * 
 *         if have_step and step == 0:             # <<<<<<<<<<<<<<
 *             _err_dim(ValueError, "Step may not be zero (axis %d)", dim)
 * 
 */
    __pyx_t_1 = (__pyx_v_have_step != 0);
    if (__pyx_t_1) {
    } else {
      __pyx_t_2 = __pyx_t_1;
      goto __pyx_L9_bool_binop_done;
    }
    __pyx_t_1 = ((__pyx_v_step == 0) != 0);
    __pyx_t_2 = __pyx_t_1;
    __pyx_L9_bool_binop_done:;
    if (__pyx_t_2) {

      /* "View.MemoryView":838
 * 
 *         if have_step and step == 0:
 *             _err_dim(ValueError, "Step may not be zero (axis %d)", dim)             # <<<<<<<<<<<<<<
 * 
 * 
 */
      __pyx_t_3 = __pyx_memoryview_err_dim(__pyx_builtin_ValueError, ((char *)"Step may not be zero (axis %d)"), __pyx_v_dim); if (unlikely(__pyx_t_3 == ((int)-1))) __PYX_ERR(2, 838, __pyx_L1_error)

      /* "View.MemoryView":837
 *         negative_step = have_step != 0 and step < 0
 * 
 *         if have_step and step == 0:             # <<<<<<<<<<<<<<
 *             _err_dim(ValueError, "Step may not be zero (axis %d)", dim)
 * 
 */
    }

    /* "View.MemoryView":841
 * 
 * 
 *         if have_start:             # <<<<<<<<<<<<<<
 *             if start < 0:
 *                 start += shape
 */
    __pyx_t_2 = (__pyx_v_have_start != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":842
 * 
 *         if have_start:
 *             if start < 0:             # <<<<<<<<<<<<<<
 *                 start += shape
 *                 if start < 0:
 */
      __pyx_t_2 = ((__pyx_v_start < 0) != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":843
 *         if have_start:
 *             if start < 0:
 *                 start += shape             # <<<<<<<<<<<<<<
 *                 if start < 0:
 *                     start = 0
 */
        __pyx_v_start = (__pyx_v_start + __pyx_v_shape);

        /* "View.MemoryView":844
 *             if start < 0:
 *                 start += shape
 *                 if start < 0:             # <<<<<<<<<<<<<<
 *                     start = 0
 *             elif start >= shape:
 */
        __pyx_t_2 = ((__pyx_v_start < 0) != 0);
        if (__pyx_t_2) {

          /* "View.MemoryView":845
 *                 start += shape
 *                 if start < 0:
 *                     start = 0             # <<<<<<<<<<<<<<
 *             elif start >= shape:
 *                 if negative_step:
 */
          __pyx_v_start = 0;

          /* "View.MemoryView":844
 *             if start < 0:
 *                 start += shape
 *                 if start < 0:             # <<<<<<<<<<<<<<
 *                     start = 0
 *             elif start >= shape:
 */
        }

        /* "View.MemoryView":842
 * 
 *         if have_start:
 *             if start < 0:             # <<<<<<<<<<<<<<
 *                 start += shape
 *                 if start < 0:
 */
        goto __pyx_L12;
      }

      /* "View.MemoryView":846
 *                 if start < 0:
 *                     start = 0
 *             elif start >= shape:             # <<<<<<<<<<<<<<
 *                 if negative_step:
 *                     start = shape - 1
 */
      __pyx_t_2 = ((__pyx_v_start >= __pyx_v_shape) != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":847
 *                     start = 0
 *             elif start >= shape:
 *                 if negative_step:             # <<<<<<<<<<<<<<
 *                     start = shape - 1
 *                 else:
 */
        __pyx_t_2 = (__pyx_v_negative_step != 0);
        if (__pyx_t_2) {

          /* "View.MemoryView":848
 *             elif start >= shape:
 *                 if negative_step:
 *                     start = shape - 1             # <<<<<<<<<<<<<<
 *                 else:
 *                     start = shape
 */
          __pyx_v_start = (__pyx_v_shape - 1);

          /* "View.MemoryView":847
 *                     start = 0
 *             elif start >= shape:
 *                 if negative_step:             # <<<<<<<<<<<<<<
 *                     start = shape - 1
 *                 else:
 */
          goto __pyx_L14;
        }

        /* "View.MemoryView":850
 *                     start = shape - 1
 *                 else:
 *                     start = shape             # <<<<<<<<<<<<<<
 *         else:
 *             if negative_step:
 */
        /*else*/ {
          __pyx_v_start = __pyx_v_shape;
        }
        __pyx_L14:;

        /* "View.MemoryView":846
 *                 if start < 0:
 *                     start = 0
 *             elif start >= shape:             # <<<<<<<<<<<<<<
 *                 if negative_step:
 *                     start = shape - 1
 */
      }
      __pyx_L12:;

      /* "View.MemoryView":841
 * 
 * 
 *         if have_start:             # <<<<<<<<<<<<<<
 *             if start < 0:
 *                 start += shape
 */
      goto __pyx_L11;
    }

    /* "View.MemoryView":852
 *                     start = shape
 *         else:
 *             if negative_step:             # <<<<<<<<<<<<<<
 *                 start = shape - 1
 *             else:
 */
    /*else*/ {
      __pyx_t_2 = (__pyx_v_negative_step != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":853
 *         else:
 *             if negative_step:
 *                 start = shape - 1             # <<<<<<<<<<<<<<
 *             else:
 *                 start = 0
 */
        __pyx_v_start = (__pyx_v_shape - 1);

        /* "View.MemoryView":852
 *                     start = shape
 *         else:
 *             if negative_step:             # <<<<<<<<<<<<<<
 *                 start = shape - 1
 *             else:
 */
        goto __pyx_L15;
      }

      /* "View.MemoryView":855
 *                 start = shape - 1
 *             else:
 *                 start = 0             # <<<<<<<<<<<<<<
 * 
 *         if have_stop:
 */
      /*else*/ {
        __pyx_v_start = 0;
      }
      __pyx_L15:;
    }
    __pyx_L11:;

    /* "View.MemoryView":857
 *                 start = 0
 * 
 *         if have_stop:             # <<<<<<<<<<<<<<
 *             if stop < 0:
 *                 stop += shape
 */
    __pyx_t_2 = (__pyx_v_have_stop != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":858
 * 
 *         if have_stop:
 *             if stop < 0:             # <<<<<<<<<<<<<<
 *                 stop += shape
 *                 if stop < 0:
 */
      __pyx_t_2 = ((__pyx_v_stop < 0) != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":859
 *         if have_stop:
 *             if stop < 0:
 *                 stop += shape             # <<<<<<<<<<<<<<
 *                 if stop < 0:
 *                     stop = 0
 */
        __pyx_v_stop = (__pyx_v_stop + __pyx_v_shape);

        /* "View.MemoryView":860
 *             if stop < 0:
 *                 stop += shape
 *                 if stop < 0:             # <<<<<<<<<<<<<<
 *                     stop = 0
 *             elif stop > shape:
 */
        __pyx_t_2 = ((__pyx_v_stop < 0) != 0);
        if (__pyx_t_2) {

          /* "View.MemoryView":861
 *                 stop += shape
 *                 if stop < 0:
 *                     stop = 0             # <<<<<<<<<<<<<<
 *             elif stop > shape:
 *                 stop = shape
 */
          __pyx_v_stop = 0;

          /* "View.MemoryView":860
 *             if stop < 0:
 *                 stop += shape
 *                 if stop < 0:             # <<<<<<<<<<<<<<
 *                     stop = 0
 *             elif stop > shape:
 */
        }

        /* "View.MemoryView":858
 * 
 *         if have_stop:
 *             if stop < 0:             # <<<<<<<<<<<<<<
 *                 stop += shape
 *                 if stop < 0:
 */
        goto __pyx_L17;
      }

      /* "View.MemoryView":862
 *                 if stop < 0:
 *                     stop = 0
 *             elif stop > shape:             # <<<<<<<<<<<<<<
 *                 stop = shape
 *         else:
 */
      __pyx_t_2 = ((__pyx_v_stop > __pyx_v_shape) != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":863
 *                     stop = 0
 *             elif stop > shape:
 *                 stop = shape             # <<<<<<<<<<<<<<
 *         else:
 *             if negative_step:
 */
        __pyx_v_stop = __pyx_v_shape;

        /* "View.MemoryView":862
 *                 if stop < 0:
 *                     stop = 0
 *             elif stop > shape:             # <<<<<<<<<<<<<<
 *                 stop = shape
 *         else:
 */
      }
      __pyx_L17:;

      /* "View.MemoryView":857
 *                 start = 0
 * 
 *         if have_stop:             # <<<<<<<<<<<<<<
 *             if stop < 0:
 *                 stop += shape
 */
      goto __pyx_L16;
    }

    /* "View.MemoryView":865
 *                 stop = shape
 *         else:
 *             if negative_step:             # <<<<<<<<<<<<<<
 *                 stop = -1
 *             else:
 */
    /*else*/ {
      __pyx_t_2 = (__pyx_v_negative_step != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":866
 *         else:
 *             if negative_step:
 *                 stop = -1             # <<<<<<<<<<<<<<
 *             else:
 *                 stop = shape
 */
        __pyx_v_stop = -1L;

        /* "View.MemoryView":865
 *                 stop = shape
 *         else:
 *             if negative_step:             # <<<<<<<<<<<<<<
 *                 stop = -1
 *             else:
 */
        goto __pyx_L19;
      }

      /* "View.MemoryView":868
 *                 stop = -1
 *             else:
 *                 stop = shape             # <<<<<<<<<<<<<<
 * 
 *         if not have_step:
 */
      /*else*/ {
        __pyx_v_stop = __pyx_v_shape;
      }
      __pyx_L19:;
    }
    __pyx_L16:;

    /* "View.MemoryView":870
 *                 stop = shape
 * 
 *         if not have_step:             # <<<<<<<<<<<<<<
 *             step = 1
 * 
 */
    __pyx_t_2 = ((!(__pyx_v_have_step != 0)) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":871
 * 
 *         if not have_step:
 *             step = 1             # <<<<<<<<<<<<<<
 * 
 * 
 */
      __pyx_v_step = 1;

      /* "View.MemoryView":870
 *                 stop = shape
 * 
 *         if not have_step:             # <<<<<<<<<<<<<<
 *             step = 1
 * 
 */
    }

    /* "View.MemoryView":875
 * 
 *         with cython.cdivision(True):
 *             new_shape = (stop - start) // step             # <<<<<<<<<<<<<<
 * 
 *             if (stop - start) - step * new_shape:
 */
    __pyx_v_new_shape = ((__pyx_v_stop - __pyx_v_start) / __pyx_v_step);

    /* "View.MemoryView":877
 *             new_shape = (stop - start) // step
 * 
 *             if (stop - start) - step * new_shape:             # <<<<<<<<<<<<<<
 *                 new_shape += 1
 * 
 */
    __pyx_t_2 = (((__pyx_v_stop - __pyx_v_start) - (__pyx_v_step * __pyx_v_new_shape)) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":878
 * 
 *             if (stop - start) - step * new_shape:
 *                 new_shape += 1             # <<<<<<<<<<<<<<
 * 
 *         if new_shape < 0:
 */
      __pyx_v_new_shape = (__pyx_v_new_shape + 1);

      /* "View.MemoryView":877
 *             new_shape = (stop - start) // step
 * 
 *             if (stop - start) - step * new_shape:             # <<<<<<<<<<<<<<
 *                 new_shape += 1
 * 
 */
    }

    /* "View.MemoryView":880
 *                 new_shape += 1
 * 
 *         if new_shape < 0:             # <<<<<<<<<<<<<<
 *             new_shape = 0
 * 
 */
    __pyx_t_2 = ((__pyx_v_new_shape < 0) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":881
 * 
 *         if new_shape < 0:
 *             new_shape = 0             # <<<<<<<<<<<<<<
 * 
 * 
 */
      __pyx_v_new_shape = 0;

      /* "View.MemoryView":880
 *                 new_shape += 1
 * 
 *         if new_shape < 0:             # <<<<<<<<<<<<<<
 *             new_shape = 0
 * 
 */
    }

    /* "View.MemoryView":884
 * 
 * 
 *         dst.strides[new_ndim] = stride * step             # <<<<<<<<<<<<<<
 *         dst.shape[new_ndim] = new_shape
 *         dst.suboffsets[new_ndim] = suboffset
 */
    (__pyx_v_dst->strides[__pyx_v_new_ndim]) = (__pyx_v_stride * __pyx_v_step);

    /* "View.MemoryView":885
 * 
 *         dst.strides[new_ndim] = stride * step
 *         dst.shape[new_ndim] = new_shape             # <<<<<<<<<<<<<<
 *         dst.suboffsets[new_ndim] = suboffset
 * 
 */
    (__pyx_v_dst->shape[__pyx_v_new_ndim]) = __pyx_v_new_shape;

    /* "View.MemoryView":886
 *         dst.strides[new_ndim] = stride * step
 *         dst.shape[new_ndim] = new_shape
 *         dst.suboffsets[new_ndim] = suboffset             # <<<<<<<<<<<<<<
 * 
 * 
 */
    (__pyx_v_dst->suboffsets[__pyx_v_new_ndim]) = __pyx_v_suboffset;
  }
  __pyx_L3:;

  /* "View.MemoryView":889
 * 
 * 
 *     if suboffset_dim[0] < 0:             # <<<<<<<<<<<<<<
 *         dst.data += start * stride
 *     else:
 */
  __pyx_t_2 = (((__pyx_v_suboffset_dim[0]) < 0) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":890
 * 
 *     if suboffset_dim[0] < 0:
 *         dst.data += start * stride             # <<<<<<<<<<<<<<
 *     else:
 *         dst.suboffsets[suboffset_dim[0]] += start * stride
 */
    __pyx_v_dst->data = (__pyx_v_dst->data + (__pyx_v_start * __pyx_v_stride));

    /* "View.MemoryView":889
 * 
 * 
 *     if suboffset_dim[0] < 0:             # <<<<<<<<<<<<<<
 *         dst.data += start * stride
 *     else:
 */
    goto __pyx_L23;
  }

  /* "View.MemoryView":892
 *         dst.data += start * stride
 *     else:
 *         dst.suboffsets[suboffset_dim[0]] += start * stride             # <<<<<<<<<<<<<<
 * 
 *     if suboffset >= 0:
 */
  /*else*/ {
    __pyx_t_3 = (__pyx_v_suboffset_dim[0]);
    (__pyx_v_dst->suboffsets[__pyx_t_3]) = ((__pyx_v_dst->suboffsets[__pyx_t_3]) + (__pyx_v_start * __pyx_v_stride));
  }
  __pyx_L23:;

  /* "View.MemoryView":894
 *         dst.suboffsets[suboffset_dim[0]] += start * stride
 * 
 *     if suboffset >= 0:             # <<<<<<<<<<<<<<
 *         if not is_slice:
 *             if new_ndim == 0:
 */
  __pyx_t_2 = ((__pyx_v_suboffset >= 0) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":895
 * 
 *     if suboffset >= 0:
 *         if not is_slice:             # <<<<<<<<<<<<<<
 *             if new_ndim == 0:
 *                 dst.data = (<char **> dst.data)[0] + suboffset
 */
    __pyx_t_2 = ((!(__pyx_v_is_slice != 0)) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":896
 *     if suboffset >= 0:
 *         if not is_slice:
 *             if new_ndim == 0:             # <<<<<<<<<<<<<<
 *                 dst.data = (<char **> dst.data)[0] + suboffset
 *             else:
 */
      __pyx_t_2 = ((__pyx_v_new_ndim == 0) != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":897
 *         if not is_slice:
 *             if new_ndim == 0:
 *                 dst.data = (<char **> dst.data)[0] + suboffset             # <<<<<<<<<<<<<<
 *             else:
 *                 _err_dim(IndexError, "All dimensions preceding dimension %d "
 */
        __pyx_v_dst->data = ((((char **)__pyx_v_dst->data)[0]) + __pyx_v_suboffset);

        /* "View.MemoryView":896
 *     if suboffset >= 0:
 *         if not is_slice:
 *             if new_ndim == 0:             # <<<<<<<<<<<<<<
 *                 dst.data = (<char **> dst.data)[0] + suboffset
 *             else:
 */
        goto __pyx_L26;
      }

      /* "View.MemoryView":899
 *                 dst.data = (<char **> dst.data)[0] + suboffset
 *             else:
 *                 _err_dim(IndexError, "All dimensions preceding dimension %d "             # <<<<<<<<<<<<<<
 *                                      "must be indexed and not sliced", dim)
 *         else:
 */
      /*else*/ {

        /* "View.MemoryView":900
 *             else:
 *                 _err_dim(IndexError, "All dimensions preceding dimension %d "
 *                                      "must be indexed and not sliced", dim)             # <<<<<<<<<<<<<<
 *         else:
 *             suboffset_dim[0] = new_ndim
 */
        __pyx_t_3 = __pyx_memoryview_err_dim(__pyx_builtin_IndexError, ((char *)"All dimensions preceding dimension %d must be indexed and not sliced"), __pyx_v_dim); if (unlikely(__pyx_t_3 == ((int)-1))) __PYX_ERR(2, 899, __pyx_L1_error)
      }
      __pyx_L26:;

      /* "View.MemoryView":895
 * 
 *     if suboffset >= 0:
 *         if not is_slice:             # <<<<<<<<<<<<<<
 *             if new_ndim == 0:
 *                 dst.data = (<char **> dst.data)[0] + suboffset
 */
      goto __pyx_L25;
    }

    /* "View.MemoryView":902
 *                                      "must be indexed and not sliced", dim)
 *         else:
 *             suboffset_dim[0] = new_ndim             # <<<<<<<<<<<<<<
 * 
 *     return 0
 */
    /*else*/ {
      (__pyx_v_suboffset_dim[0]) = __pyx_v_new_ndim;
    }
    __pyx_L25:;

    /* "View.MemoryView":894
 *         dst.suboffsets[suboffset_dim[0]] += start * stride
 * 
 *     if suboffset >= 0:             # <<<<<<<<<<<<<<
 *         if not is_slice:
 *             if new_ndim == 0:
 */
  }

  /* "View.MemoryView":904
 *             suboffset_dim[0] = new_ndim
 * 
 *     return 0             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = 0;
  goto __pyx_L0;

  /* "View.MemoryView":807
 * 
 * @cname('__pyx_memoryview_slice_memviewslice')
 * cdef int slice_memviewslice(             # <<<<<<<<<<<<<<
 *         __Pyx_memviewslice *dst,
 *         Py_ssize_t shape, Py_ssize_t stride, Py_ssize_t suboffset,
 */

  /* function exit code */
  __pyx_L1_error:;
  {
    #ifdef WITH_THREAD
    PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
    #endif
    __Pyx_AddTraceback("View.MemoryView.slice_memviewslice", __pyx_clineno, __pyx_lineno, __pyx_filename);
    #ifdef WITH_THREAD
    __Pyx_PyGILState_Release(__pyx_gilstate_save);
    #endif
  }
  __pyx_r = -1;
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":910
 * 
 * @cname('__pyx_pybuffer_index')
 * cdef char *pybuffer_index(Py_buffer *view, char *bufp, Py_ssize_t index,             # <<<<<<<<<<<<<<
 *                           Py_ssize_t dim) except NULL:
 *     cdef Py_ssize_t shape, stride, suboffset = -1
 */

static char *__pyx_pybuffer_index(Py_buffer *__pyx_v_view, char *__pyx_v_bufp, Py_ssize_t __pyx_v_index, Py_ssize_t __pyx_v_dim) {
  Py_ssize_t __pyx_v_shape;
  Py_ssize_t __pyx_v_stride;
  Py_ssize_t __pyx_v_suboffset;
  Py_ssize_t __pyx_v_itemsize;
  char *__pyx_v_resultp;
  char *__pyx_r;
  __Pyx_RefNannyDeclarations
  Py_ssize_t __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("pybuffer_index", 0);

  /* "View.MemoryView":912
 * cdef char *pybuffer_index(Py_buffer *view, char *bufp, Py_ssize_t index,
 *                           Py_ssize_t dim) except NULL:
 *     cdef Py_ssize_t shape, stride, suboffset = -1             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t itemsize = view.itemsize
 *     cdef char *resultp
 */
  __pyx_v_suboffset = -1L;

  /* "View.MemoryView":913
 *                           Py_ssize_t dim) except NULL:
 *     cdef Py_ssize_t shape, stride, suboffset = -1
 *     cdef Py_ssize_t itemsize = view.itemsize             # <<<<<<<<<<<<<<
 *     cdef char *resultp
 * 
 */
  __pyx_t_1 = __pyx_v_view->itemsize;
  __pyx_v_itemsize = __pyx_t_1;

  /* "View.MemoryView":916
 *     cdef char *resultp
 * 
 *     if view.ndim == 0:             # <<<<<<<<<<<<<<
 *         shape = view.len / itemsize
 *         stride = itemsize
 */
  __pyx_t_2 = ((__pyx_v_view->ndim == 0) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":917
 * 
 *     if view.ndim == 0:
 *         shape = view.len / itemsize             # <<<<<<<<<<<<<<
 *         stride = itemsize
 *     else:
 */
    if (unlikely(__pyx_v_itemsize == 0)) {
      PyErr_SetString(PyExc_ZeroDivisionError, "integer division or modulo by zero");
      __PYX_ERR(2, 917, __pyx_L1_error)
    }
    else if (sizeof(Py_ssize_t) == sizeof(long) && (!(((Py_ssize_t)-1) > 0)) && unlikely(__pyx_v_itemsize == (Py_ssize_t)-1)  && unlikely(UNARY_NEG_WOULD_OVERFLOW(__pyx_v_view->len))) {
      PyErr_SetString(PyExc_OverflowError, "value too large to perform division");
      __PYX_ERR(2, 917, __pyx_L1_error)
    }
    __pyx_v_shape = __Pyx_div_Py_ssize_t(__pyx_v_view->len, __pyx_v_itemsize);

    /* "View.MemoryView":918
 *     if view.ndim == 0:
 *         shape = view.len / itemsize
 *         stride = itemsize             # <<<<<<<<<<<<<<
 *     else:
 *         shape = view.shape[dim]
 */
    __pyx_v_stride = __pyx_v_itemsize;

    /* "View.MemoryView":916
 *     cdef char *resultp
 * 
 *     if view.ndim == 0:             # <<<<<<<<<<<<<<
 *         shape = view.len / itemsize
 *         stride = itemsize
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":920
 *         stride = itemsize
 *     else:
 *         shape = view.shape[dim]             # <<<<<<<<<<<<<<
 *         stride = view.strides[dim]
 *         if view.suboffsets != NULL:
 */
  /*else*/ {
    __pyx_v_shape = (__pyx_v_view->shape[__pyx_v_dim]);

    /* "View.MemoryView":921
 *     else:
 *         shape = view.shape[dim]
 *         stride = view.strides[dim]             # <<<<<<<<<<<<<<
 *         if view.suboffsets != NULL:
 *             suboffset = view.suboffsets[dim]
 */
    __pyx_v_stride = (__pyx_v_view->strides[__pyx_v_dim]);

    /* "View.MemoryView":922
 *         shape = view.shape[dim]
 *         stride = view.strides[dim]
 *         if view.suboffsets != NULL:             # <<<<<<<<<<<<<<
 *             suboffset = view.suboffsets[dim]
 * 
 */
    __pyx_t_2 = ((__pyx_v_view->suboffsets != NULL) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":923
 *         stride = view.strides[dim]
 *         if view.suboffsets != NULL:
 *             suboffset = view.suboffsets[dim]             # <<<<<<<<<<<<<<
 * 
 *     if index < 0:
 */
      __pyx_v_suboffset = (__pyx_v_view->suboffsets[__pyx_v_dim]);

      /* "View.MemoryView":922
 *         shape = view.shape[dim]
 *         stride = view.strides[dim]
 *         if view.suboffsets != NULL:             # <<<<<<<<<<<<<<
 *             suboffset = view.suboffsets[dim]
 * 
 */
    }
  }
  __pyx_L3:;

  /* "View.MemoryView":925
 *             suboffset = view.suboffsets[dim]
 * 
 *     if index < 0:             # <<<<<<<<<<<<<<
 *         index += view.shape[dim]
 *         if index < 0:
 */
  __pyx_t_2 = ((__pyx_v_index < 0) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":926
 * 
 *     if index < 0:
 *         index += view.shape[dim]             # <<<<<<<<<<<<<<
 *         if index < 0:
 *             raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 */
    __pyx_v_index = (__pyx_v_index + (__pyx_v_view->shape[__pyx_v_dim]));

    /* "View.MemoryView":927
 *     if index < 0:
 *         index += view.shape[dim]
 *         if index < 0:             # <<<<<<<<<<<<<<
 *             raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 * 
 */
    __pyx_t_2 = ((__pyx_v_index < 0) != 0);
    if (unlikely(__pyx_t_2)) {

      /* "View.MemoryView":928
 *         index += view.shape[dim]
 *         if index < 0:
 *             raise IndexError("Out of bounds on buffer access (axis %d)" % dim)             # <<<<<<<<<<<<<<
 * 
 *     if index >= shape:
 */
      __pyx_t_3 = PyInt_FromSsize_t(__pyx_v_dim); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 928, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __pyx_t_4 = __Pyx_PyString_Format(__pyx_kp_s_Out_of_bounds_on_buffer_access_a, __pyx_t_3); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 928, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_4);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_builtin_IndexError, __pyx_t_4); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 928, __pyx_L1_error)
      __Pyx_GOTREF(__pyx_t_3);
      __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
      __Pyx_Raise(__pyx_t_3, 0, 0, 0);
      __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
      __PYX_ERR(2, 928, __pyx_L1_error)

      /* "View.MemoryView":927
 *     if index < 0:
 *         index += view.shape[dim]
 *         if index < 0:             # <<<<<<<<<<<<<<
 *             raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 * 
 */
    }

    /* "View.MemoryView":925
 *             suboffset = view.suboffsets[dim]
 * 
 *     if index < 0:             # <<<<<<<<<<<<<<
 *         index += view.shape[dim]
 *         if index < 0:
 */
  }

  /* "View.MemoryView":930
 *             raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 * 
 *     if index >= shape:             # <<<<<<<<<<<<<<
 *         raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 * 
 */
  __pyx_t_2 = ((__pyx_v_index >= __pyx_v_shape) != 0);
  if (unlikely(__pyx_t_2)) {

    /* "View.MemoryView":931
 * 
 *     if index >= shape:
 *         raise IndexError("Out of bounds on buffer access (axis %d)" % dim)             # <<<<<<<<<<<<<<
 * 
 *     resultp = bufp + index * stride
 */
    __pyx_t_3 = PyInt_FromSsize_t(__pyx_v_dim); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 931, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __pyx_t_4 = __Pyx_PyString_Format(__pyx_kp_s_Out_of_bounds_on_buffer_access_a, __pyx_t_3); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 931, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __pyx_t_3 = __Pyx_PyObject_CallOneArg(__pyx_builtin_IndexError, __pyx_t_4); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 931, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    __Pyx_Raise(__pyx_t_3, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __PYX_ERR(2, 931, __pyx_L1_error)

    /* "View.MemoryView":930
 *             raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 * 
 *     if index >= shape:             # <<<<<<<<<<<<<<
 *         raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 * 
 */
  }

  /* "View.MemoryView":933
 *         raise IndexError("Out of bounds on buffer access (axis %d)" % dim)
 * 
 *     resultp = bufp + index * stride             # <<<<<<<<<<<<<<
 *     if suboffset >= 0:
 *         resultp = (<char **> resultp)[0] + suboffset
 */
  __pyx_v_resultp = (__pyx_v_bufp + (__pyx_v_index * __pyx_v_stride));

  /* "View.MemoryView":934
 * 
 *     resultp = bufp + index * stride
 *     if suboffset >= 0:             # <<<<<<<<<<<<<<
 *         resultp = (<char **> resultp)[0] + suboffset
 * 
 */
  __pyx_t_2 = ((__pyx_v_suboffset >= 0) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":935
 *     resultp = bufp + index * stride
 *     if suboffset >= 0:
 *         resultp = (<char **> resultp)[0] + suboffset             # <<<<<<<<<<<<<<
 * 
 *     return resultp
 */
    __pyx_v_resultp = ((((char **)__pyx_v_resultp)[0]) + __pyx_v_suboffset);

    /* "View.MemoryView":934
 * 
 *     resultp = bufp + index * stride
 *     if suboffset >= 0:             # <<<<<<<<<<<<<<
 *         resultp = (<char **> resultp)[0] + suboffset
 * 
 */
  }

  /* "View.MemoryView":937
 *         resultp = (<char **> resultp)[0] + suboffset
 * 
 *     return resultp             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = __pyx_v_resultp;
  goto __pyx_L0;

  /* "View.MemoryView":910
 * 
 * @cname('__pyx_pybuffer_index')
 * cdef char *pybuffer_index(Py_buffer *view, char *bufp, Py_ssize_t index,             # <<<<<<<<<<<<<<
 *                           Py_ssize_t dim) except NULL:
 *     cdef Py_ssize_t shape, stride, suboffset = -1
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("View.MemoryView.pybuffer_index", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":943
 * 
 * @cname('__pyx_memslice_transpose')
 * cdef int transpose_memslice(__Pyx_memviewslice *memslice) nogil except 0:             # <<<<<<<<<<<<<<
 *     cdef int ndim = memslice.memview.view.ndim
 * 
 */

static int __pyx_memslice_transpose(__Pyx_memviewslice *__pyx_v_memslice) {
  int __pyx_v_ndim;
  Py_ssize_t *__pyx_v_shape;
  Py_ssize_t *__pyx_v_strides;
  int __pyx_v_i;
  int __pyx_v_j;
  int __pyx_r;
  int __pyx_t_1;
  Py_ssize_t *__pyx_t_2;
  long __pyx_t_3;
  long __pyx_t_4;
  Py_ssize_t __pyx_t_5;
  Py_ssize_t __pyx_t_6;
  int __pyx_t_7;
  int __pyx_t_8;
  int __pyx_t_9;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;

  /* "View.MemoryView":944
 * @cname('__pyx_memslice_transpose')
 * cdef int transpose_memslice(__Pyx_memviewslice *memslice) nogil except 0:
 *     cdef int ndim = memslice.memview.view.ndim             # <<<<<<<<<<<<<<
 * 
 *     cdef Py_ssize_t *shape = memslice.shape
 */
  __pyx_t_1 = __pyx_v_memslice->memview->view.ndim;
  __pyx_v_ndim = __pyx_t_1;

  /* "View.MemoryView":946
 *     cdef int ndim = memslice.memview.view.ndim
 * 
 *     cdef Py_ssize_t *shape = memslice.shape             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t *strides = memslice.strides
 * 
 */
  __pyx_t_2 = __pyx_v_memslice->shape;
  __pyx_v_shape = __pyx_t_2;

  /* "View.MemoryView":947
 * 
 *     cdef Py_ssize_t *shape = memslice.shape
 *     cdef Py_ssize_t *strides = memslice.strides             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_2 = __pyx_v_memslice->strides;
  __pyx_v_strides = __pyx_t_2;

  /* "View.MemoryView":951
 * 
 *     cdef int i, j
 *     for i in range(ndim / 2):             # <<<<<<<<<<<<<<
 *         j = ndim - 1 - i
 *         strides[i], strides[j] = strides[j], strides[i]
 */
  __pyx_t_3 = __Pyx_div_long(__pyx_v_ndim, 2);
  __pyx_t_4 = __pyx_t_3;
  for (__pyx_t_1 = 0; __pyx_t_1 < __pyx_t_4; __pyx_t_1+=1) {
    __pyx_v_i = __pyx_t_1;

    /* "View.MemoryView":952
 *     cdef int i, j
 *     for i in range(ndim / 2):
 *         j = ndim - 1 - i             # <<<<<<<<<<<<<<
 *         strides[i], strides[j] = strides[j], strides[i]
 *         shape[i], shape[j] = shape[j], shape[i]
 */
    __pyx_v_j = ((__pyx_v_ndim - 1) - __pyx_v_i);

    /* "View.MemoryView":953
 *     for i in range(ndim / 2):
 *         j = ndim - 1 - i
 *         strides[i], strides[j] = strides[j], strides[i]             # <<<<<<<<<<<<<<
 *         shape[i], shape[j] = shape[j], shape[i]
 * 
 */
    __pyx_t_5 = (__pyx_v_strides[__pyx_v_j]);
    __pyx_t_6 = (__pyx_v_strides[__pyx_v_i]);
    (__pyx_v_strides[__pyx_v_i]) = __pyx_t_5;
    (__pyx_v_strides[__pyx_v_j]) = __pyx_t_6;

    /* "View.MemoryView":954
 *         j = ndim - 1 - i
 *         strides[i], strides[j] = strides[j], strides[i]
 *         shape[i], shape[j] = shape[j], shape[i]             # <<<<<<<<<<<<<<
 * 
 *         if memslice.suboffsets[i] >= 0 or memslice.suboffsets[j] >= 0:
 */
    __pyx_t_6 = (__pyx_v_shape[__pyx_v_j]);
    __pyx_t_5 = (__pyx_v_shape[__pyx_v_i]);
    (__pyx_v_shape[__pyx_v_i]) = __pyx_t_6;
    (__pyx_v_shape[__pyx_v_j]) = __pyx_t_5;

    /* "View.MemoryView":956
 *         shape[i], shape[j] = shape[j], shape[i]
 * 
 *         if memslice.suboffsets[i] >= 0 or memslice.suboffsets[j] >= 0:             # <<<<<<<<<<<<<<
 *             _err(ValueError, "Cannot transpose memoryview with indirect dimensions")
 * 
 */
    __pyx_t_8 = (((__pyx_v_memslice->suboffsets[__pyx_v_i]) >= 0) != 0);
    if (!__pyx_t_8) {
    } else {
      __pyx_t_7 = __pyx_t_8;
      goto __pyx_L6_bool_binop_done;
    }
    __pyx_t_8 = (((__pyx_v_memslice->suboffsets[__pyx_v_j]) >= 0) != 0);
    __pyx_t_7 = __pyx_t_8;
    __pyx_L6_bool_binop_done:;
    if (__pyx_t_7) {

      /* "View.MemoryView":957
 * 
 *         if memslice.suboffsets[i] >= 0 or memslice.suboffsets[j] >= 0:
 *             _err(ValueError, "Cannot transpose memoryview with indirect dimensions")             # <<<<<<<<<<<<<<
 * 
 *     return 1
 */
      __pyx_t_9 = __pyx_memoryview_err(__pyx_builtin_ValueError, ((char *)"Cannot transpose memoryview with indirect dimensions")); if (unlikely(__pyx_t_9 == ((int)-1))) __PYX_ERR(2, 957, __pyx_L1_error)

      /* "View.MemoryView":956
 *         shape[i], shape[j] = shape[j], shape[i]
 * 
 *         if memslice.suboffsets[i] >= 0 or memslice.suboffsets[j] >= 0:             # <<<<<<<<<<<<<<
 *             _err(ValueError, "Cannot transpose memoryview with indirect dimensions")
 * 
 */
    }
  }

  /* "View.MemoryView":959
 *             _err(ValueError, "Cannot transpose memoryview with indirect dimensions")
 * 
 *     return 1             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = 1;
  goto __pyx_L0;

  /* "View.MemoryView":943
 * 
 * @cname('__pyx_memslice_transpose')
 * cdef int transpose_memslice(__Pyx_memviewslice *memslice) nogil except 0:             # <<<<<<<<<<<<<<
 *     cdef int ndim = memslice.memview.view.ndim
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  {
    #ifdef WITH_THREAD
    PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
    #endif
    __Pyx_AddTraceback("View.MemoryView.transpose_memslice", __pyx_clineno, __pyx_lineno, __pyx_filename);
    #ifdef WITH_THREAD
    __Pyx_PyGILState_Release(__pyx_gilstate_save);
    #endif
  }
  __pyx_r = 0;
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":976
 *     cdef int (*to_dtype_func)(char *, object) except 0
 * 
 *     def __dealloc__(self):             # <<<<<<<<<<<<<<
 *         __PYX_XDEC_MEMVIEW(&self.from_slice, 1)
 * 
 */

/* Python wrapper */
static void __pyx_memoryviewslice___dealloc__(PyObject *__pyx_v_self); /*proto*/
static void __pyx_memoryviewslice___dealloc__(PyObject *__pyx_v_self) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__dealloc__ (wrapper)", 0);
  __pyx_memoryviewslice___pyx_pf_15View_dot_MemoryView_16_memoryviewslice___dealloc__(((struct __pyx_memoryviewslice_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

static void __pyx_memoryviewslice___pyx_pf_15View_dot_MemoryView_16_memoryviewslice___dealloc__(struct __pyx_memoryviewslice_obj *__pyx_v_self) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__dealloc__", 0);

  /* "View.MemoryView":977
 * 
 *     def __dealloc__(self):
 *         __PYX_XDEC_MEMVIEW(&self.from_slice, 1)             # <<<<<<<<<<<<<<
 * 
 *     cdef convert_item_to_object(self, char *itemp):
 */
  __PYX_XDEC_MEMVIEW((&__pyx_v_self->from_slice), 1);

  /* "View.MemoryView":976
 *     cdef int (*to_dtype_func)(char *, object) except 0
 * 
 *     def __dealloc__(self):             # <<<<<<<<<<<<<<
 *         __PYX_XDEC_MEMVIEW(&self.from_slice, 1)
 * 
 */

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

/* "View.MemoryView":979
 *         __PYX_XDEC_MEMVIEW(&self.from_slice, 1)
 * 
 *     cdef convert_item_to_object(self, char *itemp):             # <<<<<<<<<<<<<<
 *         if self.to_object_func != NULL:
 *             return self.to_object_func(itemp)
 */

static PyObject *__pyx_memoryviewslice_convert_item_to_object(struct __pyx_memoryviewslice_obj *__pyx_v_self, char *__pyx_v_itemp) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("convert_item_to_object", 0);

  /* "View.MemoryView":980
 * 
 *     cdef convert_item_to_object(self, char *itemp):
 *         if self.to_object_func != NULL:             # <<<<<<<<<<<<<<
 *             return self.to_object_func(itemp)
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_self->to_object_func != NULL) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":981
 *     cdef convert_item_to_object(self, char *itemp):
 *         if self.to_object_func != NULL:
 *             return self.to_object_func(itemp)             # <<<<<<<<<<<<<<
 *         else:
 *             return memoryview.convert_item_to_object(self, itemp)
 */
    __Pyx_XDECREF(__pyx_r);
    __pyx_t_2 = __pyx_v_self->to_object_func(__pyx_v_itemp); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 981, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_r = __pyx_t_2;
    __pyx_t_2 = 0;
    goto __pyx_L0;

    /* "View.MemoryView":980
 * 
 *     cdef convert_item_to_object(self, char *itemp):
 *         if self.to_object_func != NULL:             # <<<<<<<<<<<<<<
 *             return self.to_object_func(itemp)
 *         else:
 */
  }

  /* "View.MemoryView":983
 *             return self.to_object_func(itemp)
 *         else:
 *             return memoryview.convert_item_to_object(self, itemp)             # <<<<<<<<<<<<<<
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):
 */
  /*else*/ {
    __Pyx_XDECREF(__pyx_r);
    __pyx_t_2 = __pyx_memoryview_convert_item_to_object(((struct __pyx_memoryview_obj *)__pyx_v_self), __pyx_v_itemp); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 983, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_r = __pyx_t_2;
    __pyx_t_2 = 0;
    goto __pyx_L0;
  }

  /* "View.MemoryView":979
 *         __PYX_XDEC_MEMVIEW(&self.from_slice, 1)
 * 
 *     cdef convert_item_to_object(self, char *itemp):             # <<<<<<<<<<<<<<
 *         if self.to_object_func != NULL:
 *             return self.to_object_func(itemp)
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_AddTraceback("View.MemoryView._memoryviewslice.convert_item_to_object", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":985
 *             return memoryview.convert_item_to_object(self, itemp)
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):             # <<<<<<<<<<<<<<
 *         if self.to_dtype_func != NULL:
 *             self.to_dtype_func(itemp, value)
 */

static PyObject *__pyx_memoryviewslice_assign_item_from_object(struct __pyx_memoryviewslice_obj *__pyx_v_self, char *__pyx_v_itemp, PyObject *__pyx_v_value) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("assign_item_from_object", 0);

  /* "View.MemoryView":986
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):
 *         if self.to_dtype_func != NULL:             # <<<<<<<<<<<<<<
 *             self.to_dtype_func(itemp, value)
 *         else:
 */
  __pyx_t_1 = ((__pyx_v_self->to_dtype_func != NULL) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":987
 *     cdef assign_item_from_object(self, char *itemp, object value):
 *         if self.to_dtype_func != NULL:
 *             self.to_dtype_func(itemp, value)             # <<<<<<<<<<<<<<
 *         else:
 *             memoryview.assign_item_from_object(self, itemp, value)
 */
    __pyx_t_2 = __pyx_v_self->to_dtype_func(__pyx_v_itemp, __pyx_v_value); if (unlikely(__pyx_t_2 == ((int)0))) __PYX_ERR(2, 987, __pyx_L1_error)

    /* "View.MemoryView":986
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):
 *         if self.to_dtype_func != NULL:             # <<<<<<<<<<<<<<
 *             self.to_dtype_func(itemp, value)
 *         else:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":989
 *             self.to_dtype_func(itemp, value)
 *         else:
 *             memoryview.assign_item_from_object(self, itemp, value)             # <<<<<<<<<<<<<<
 * 
 *     @property
 */
  /*else*/ {
    __pyx_t_3 = __pyx_memoryview_assign_item_from_object(((struct __pyx_memoryview_obj *)__pyx_v_self), __pyx_v_itemp, __pyx_v_value); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 989, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  }
  __pyx_L3:;

  /* "View.MemoryView":985
 *             return memoryview.convert_item_to_object(self, itemp)
 * 
 *     cdef assign_item_from_object(self, char *itemp, object value):             # <<<<<<<<<<<<<<
 *         if self.to_dtype_func != NULL:
 *             self.to_dtype_func(itemp, value)
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView._memoryviewslice.assign_item_from_object", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":992
 * 
 *     @property
 *     def base(self):             # <<<<<<<<<<<<<<
 *         return self.from_object
 * 
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_16_memoryviewslice_4base_1__get__(PyObject *__pyx_v_self); /*proto*/
static PyObject *__pyx_pw_15View_dot_MemoryView_16_memoryviewslice_4base_1__get__(PyObject *__pyx_v_self) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__ (wrapper)", 0);
  __pyx_r = __pyx_pf_15View_dot_MemoryView_16_memoryviewslice_4base___get__(((struct __pyx_memoryviewslice_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView_16_memoryviewslice_4base___get__(struct __pyx_memoryviewslice_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__get__", 0);

  /* "View.MemoryView":993
 *     @property
 *     def base(self):
 *         return self.from_object             # <<<<<<<<<<<<<<
 * 
 *     __pyx_getbuffer = capsule(<void *> &__pyx_memoryview_getbuffer, "getbuffer(obj, view, flags)")
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v_self->from_object);
  __pyx_r = __pyx_v_self->from_object;
  goto __pyx_L0;

  /* "View.MemoryView":992
 * 
 *     @property
 *     def base(self):             # <<<<<<<<<<<<<<
 *         return self.from_object
 * 
 */

  /* function exit code */
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_memoryviewslice_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused); /*proto*/
static PyObject *__pyx_pw___pyx_memoryviewslice_1__reduce_cython__(PyObject *__pyx_v_self, CYTHON_UNUSED PyObject *unused) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__reduce_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_memoryviewslice___reduce_cython__(((struct __pyx_memoryviewslice_obj *)__pyx_v_self));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_memoryviewslice___reduce_cython__(CYTHON_UNUSED struct __pyx_memoryviewslice_obj *__pyx_v_self) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__reduce_cython__", 0);

  /* "(tree fragment)":2
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_TypeError, __pyx_tuple__20, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_Raise(__pyx_t_1, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __PYX_ERR(2, 2, __pyx_L1_error)

  /* "(tree fragment)":1
 * def __reduce_cython__(self):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView._memoryviewslice.__reduce_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":3
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */

/* Python wrapper */
static PyObject *__pyx_pw___pyx_memoryviewslice_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state); /*proto*/
static PyObject *__pyx_pw___pyx_memoryviewslice_3__setstate_cython__(PyObject *__pyx_v_self, PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__setstate_cython__ (wrapper)", 0);
  __pyx_r = __pyx_pf___pyx_memoryviewslice_2__setstate_cython__(((struct __pyx_memoryviewslice_obj *)__pyx_v_self), ((PyObject *)__pyx_v___pyx_state));

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf___pyx_memoryviewslice_2__setstate_cython__(CYTHON_UNUSED struct __pyx_memoryviewslice_obj *__pyx_v_self, CYTHON_UNUSED PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__setstate_cython__", 0);

  /* "(tree fragment)":4
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 */
  __pyx_t_1 = __Pyx_PyObject_Call(__pyx_builtin_TypeError, __pyx_tuple__21, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_Raise(__pyx_t_1, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __PYX_ERR(2, 4, __pyx_L1_error)

  /* "(tree fragment)":3
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):             # <<<<<<<<<<<<<<
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView._memoryviewslice.__setstate_cython__", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":999
 * 
 * @cname('__pyx_memoryview_fromslice')
 * cdef memoryview_fromslice(__Pyx_memviewslice memviewslice,             # <<<<<<<<<<<<<<
 *                           int ndim,
 *                           object (*to_object_func)(char *),
 */

static PyObject *__pyx_memoryview_fromslice(__Pyx_memviewslice __pyx_v_memviewslice, int __pyx_v_ndim, PyObject *(*__pyx_v_to_object_func)(char *), int (*__pyx_v_to_dtype_func)(char *, PyObject *), int __pyx_v_dtype_is_object) {
  struct __pyx_memoryviewslice_obj *__pyx_v_result = 0;
  Py_ssize_t __pyx_v_suboffset;
  PyObject *__pyx_v_length = NULL;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  __Pyx_TypeInfo *__pyx_t_4;
  Py_buffer __pyx_t_5;
  Py_ssize_t *__pyx_t_6;
  Py_ssize_t *__pyx_t_7;
  Py_ssize_t *__pyx_t_8;
  Py_ssize_t __pyx_t_9;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("memoryview_fromslice", 0);

  /* "View.MemoryView":1007
 *     cdef _memoryviewslice result
 * 
 *     if <PyObject *> memviewslice.memview == Py_None:             # <<<<<<<<<<<<<<
 *         return None
 * 
 */
  __pyx_t_1 = ((((PyObject *)__pyx_v_memviewslice.memview) == Py_None) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":1008
 * 
 *     if <PyObject *> memviewslice.memview == Py_None:
 *         return None             # <<<<<<<<<<<<<<
 * 
 * 
 */
    __Pyx_XDECREF(__pyx_r);
    __pyx_r = Py_None; __Pyx_INCREF(Py_None);
    goto __pyx_L0;

    /* "View.MemoryView":1007
 *     cdef _memoryviewslice result
 * 
 *     if <PyObject *> memviewslice.memview == Py_None:             # <<<<<<<<<<<<<<
 *         return None
 * 
 */
  }

  /* "View.MemoryView":1013
 * 
 * 
 *     result = _memoryviewslice(None, 0, dtype_is_object)             # <<<<<<<<<<<<<<
 * 
 *     result.from_slice = memviewslice
 */
  __pyx_t_2 = __Pyx_PyBool_FromLong(__pyx_v_dtype_is_object); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1013, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyTuple_New(3); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 1013, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_INCREF(Py_None);
  __Pyx_GIVEREF(Py_None);
  PyTuple_SET_ITEM(__pyx_t_3, 0, Py_None);
  __Pyx_INCREF(__pyx_int_0);
  __Pyx_GIVEREF(__pyx_int_0);
  PyTuple_SET_ITEM(__pyx_t_3, 1, __pyx_int_0);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_3, 2, __pyx_t_2);
  __pyx_t_2 = 0;
  __pyx_t_2 = __Pyx_PyObject_Call(((PyObject *)__pyx_memoryviewslice_type), __pyx_t_3, NULL); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1013, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __pyx_v_result = ((struct __pyx_memoryviewslice_obj *)__pyx_t_2);
  __pyx_t_2 = 0;

  /* "View.MemoryView":1015
 *     result = _memoryviewslice(None, 0, dtype_is_object)
 * 
 *     result.from_slice = memviewslice             # <<<<<<<<<<<<<<
 *     __PYX_INC_MEMVIEW(&memviewslice, 1)
 * 
 */
  __pyx_v_result->from_slice = __pyx_v_memviewslice;

  /* "View.MemoryView":1016
 * 
 *     result.from_slice = memviewslice
 *     __PYX_INC_MEMVIEW(&memviewslice, 1)             # <<<<<<<<<<<<<<
 * 
 *     result.from_object = (<memoryview> memviewslice.memview).base
 */
  __PYX_INC_MEMVIEW((&__pyx_v_memviewslice), 1);

  /* "View.MemoryView":1018
 *     __PYX_INC_MEMVIEW(&memviewslice, 1)
 * 
 *     result.from_object = (<memoryview> memviewslice.memview).base             # <<<<<<<<<<<<<<
 *     result.typeinfo = memviewslice.memview.typeinfo
 * 
 */
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v_memviewslice.memview), __pyx_n_s_base); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1018, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __Pyx_GIVEREF(__pyx_t_2);
  __Pyx_GOTREF(__pyx_v_result->from_object);
  __Pyx_DECREF(__pyx_v_result->from_object);
  __pyx_v_result->from_object = __pyx_t_2;
  __pyx_t_2 = 0;

  /* "View.MemoryView":1019
 * 
 *     result.from_object = (<memoryview> memviewslice.memview).base
 *     result.typeinfo = memviewslice.memview.typeinfo             # <<<<<<<<<<<<<<
 * 
 *     result.view = memviewslice.memview.view
 */
  __pyx_t_4 = __pyx_v_memviewslice.memview->typeinfo;
  __pyx_v_result->__pyx_base.typeinfo = __pyx_t_4;

  /* "View.MemoryView":1021
 *     result.typeinfo = memviewslice.memview.typeinfo
 * 
 *     result.view = memviewslice.memview.view             # <<<<<<<<<<<<<<
 *     result.view.buf = <void *> memviewslice.data
 *     result.view.ndim = ndim
 */
  __pyx_t_5 = __pyx_v_memviewslice.memview->view;
  __pyx_v_result->__pyx_base.view = __pyx_t_5;

  /* "View.MemoryView":1022
 * 
 *     result.view = memviewslice.memview.view
 *     result.view.buf = <void *> memviewslice.data             # <<<<<<<<<<<<<<
 *     result.view.ndim = ndim
 *     (<__pyx_buffer *> &result.view).obj = Py_None
 */
  __pyx_v_result->__pyx_base.view.buf = ((void *)__pyx_v_memviewslice.data);

  /* "View.MemoryView":1023
 *     result.view = memviewslice.memview.view
 *     result.view.buf = <void *> memviewslice.data
 *     result.view.ndim = ndim             # <<<<<<<<<<<<<<
 *     (<__pyx_buffer *> &result.view).obj = Py_None
 *     Py_INCREF(Py_None)
 */
  __pyx_v_result->__pyx_base.view.ndim = __pyx_v_ndim;

  /* "View.MemoryView":1024
 *     result.view.buf = <void *> memviewslice.data
 *     result.view.ndim = ndim
 *     (<__pyx_buffer *> &result.view).obj = Py_None             # <<<<<<<<<<<<<<
 *     Py_INCREF(Py_None)
 * 
 */
  ((Py_buffer *)(&__pyx_v_result->__pyx_base.view))->obj = Py_None;

  /* "View.MemoryView":1025
 *     result.view.ndim = ndim
 *     (<__pyx_buffer *> &result.view).obj = Py_None
 *     Py_INCREF(Py_None)             # <<<<<<<<<<<<<<
 * 
 *     if (<memoryview>memviewslice.memview).flags & PyBUF_WRITABLE:
 */
  Py_INCREF(Py_None);

  /* "View.MemoryView":1027
 *     Py_INCREF(Py_None)
 * 
 *     if (<memoryview>memviewslice.memview).flags & PyBUF_WRITABLE:             # <<<<<<<<<<<<<<
 *         result.flags = PyBUF_RECORDS
 *     else:
 */
  __pyx_t_1 = ((((struct __pyx_memoryview_obj *)__pyx_v_memviewslice.memview)->flags & PyBUF_WRITABLE) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":1028
 * 
 *     if (<memoryview>memviewslice.memview).flags & PyBUF_WRITABLE:
 *         result.flags = PyBUF_RECORDS             # <<<<<<<<<<<<<<
 *     else:
 *         result.flags = PyBUF_RECORDS_RO
 */
    __pyx_v_result->__pyx_base.flags = PyBUF_RECORDS;

    /* "View.MemoryView":1027
 *     Py_INCREF(Py_None)
 * 
 *     if (<memoryview>memviewslice.memview).flags & PyBUF_WRITABLE:             # <<<<<<<<<<<<<<
 *         result.flags = PyBUF_RECORDS
 *     else:
 */
    goto __pyx_L4;
  }

  /* "View.MemoryView":1030
 *         result.flags = PyBUF_RECORDS
 *     else:
 *         result.flags = PyBUF_RECORDS_RO             # <<<<<<<<<<<<<<
 * 
 *     result.view.shape = <Py_ssize_t *> result.from_slice.shape
 */
  /*else*/ {
    __pyx_v_result->__pyx_base.flags = PyBUF_RECORDS_RO;
  }
  __pyx_L4:;

  /* "View.MemoryView":1032
 *         result.flags = PyBUF_RECORDS_RO
 * 
 *     result.view.shape = <Py_ssize_t *> result.from_slice.shape             # <<<<<<<<<<<<<<
 *     result.view.strides = <Py_ssize_t *> result.from_slice.strides
 * 
 */
  __pyx_v_result->__pyx_base.view.shape = ((Py_ssize_t *)__pyx_v_result->from_slice.shape);

  /* "View.MemoryView":1033
 * 
 *     result.view.shape = <Py_ssize_t *> result.from_slice.shape
 *     result.view.strides = <Py_ssize_t *> result.from_slice.strides             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_v_result->__pyx_base.view.strides = ((Py_ssize_t *)__pyx_v_result->from_slice.strides);

  /* "View.MemoryView":1036
 * 
 * 
 *     result.view.suboffsets = NULL             # <<<<<<<<<<<<<<
 *     for suboffset in result.from_slice.suboffsets[:ndim]:
 *         if suboffset >= 0:
 */
  __pyx_v_result->__pyx_base.view.suboffsets = NULL;

  /* "View.MemoryView":1037
 * 
 *     result.view.suboffsets = NULL
 *     for suboffset in result.from_slice.suboffsets[:ndim]:             # <<<<<<<<<<<<<<
 *         if suboffset >= 0:
 *             result.view.suboffsets = <Py_ssize_t *> result.from_slice.suboffsets
 */
  __pyx_t_7 = (__pyx_v_result->from_slice.suboffsets + __pyx_v_ndim);
  for (__pyx_t_8 = __pyx_v_result->from_slice.suboffsets; __pyx_t_8 < __pyx_t_7; __pyx_t_8++) {
    __pyx_t_6 = __pyx_t_8;
    __pyx_v_suboffset = (__pyx_t_6[0]);

    /* "View.MemoryView":1038
 *     result.view.suboffsets = NULL
 *     for suboffset in result.from_slice.suboffsets[:ndim]:
 *         if suboffset >= 0:             # <<<<<<<<<<<<<<
 *             result.view.suboffsets = <Py_ssize_t *> result.from_slice.suboffsets
 *             break
 */
    __pyx_t_1 = ((__pyx_v_suboffset >= 0) != 0);
    if (__pyx_t_1) {

      /* "View.MemoryView":1039
 *     for suboffset in result.from_slice.suboffsets[:ndim]:
 *         if suboffset >= 0:
 *             result.view.suboffsets = <Py_ssize_t *> result.from_slice.suboffsets             # <<<<<<<<<<<<<<
 *             break
 * 
 */
      __pyx_v_result->__pyx_base.view.suboffsets = ((Py_ssize_t *)__pyx_v_result->from_slice.suboffsets);

      /* "View.MemoryView":1040
 *         if suboffset >= 0:
 *             result.view.suboffsets = <Py_ssize_t *> result.from_slice.suboffsets
 *             break             # <<<<<<<<<<<<<<
 * 
 *     result.view.len = result.view.itemsize
 */
      goto __pyx_L6_break;

      /* "View.MemoryView":1038
 *     result.view.suboffsets = NULL
 *     for suboffset in result.from_slice.suboffsets[:ndim]:
 *         if suboffset >= 0:             # <<<<<<<<<<<<<<
 *             result.view.suboffsets = <Py_ssize_t *> result.from_slice.suboffsets
 *             break
 */
    }
  }
  __pyx_L6_break:;

  /* "View.MemoryView":1042
 *             break
 * 
 *     result.view.len = result.view.itemsize             # <<<<<<<<<<<<<<
 *     for length in result.view.shape[:ndim]:
 *         result.view.len *= length
 */
  __pyx_t_9 = __pyx_v_result->__pyx_base.view.itemsize;
  __pyx_v_result->__pyx_base.view.len = __pyx_t_9;

  /* "View.MemoryView":1043
 * 
 *     result.view.len = result.view.itemsize
 *     for length in result.view.shape[:ndim]:             # <<<<<<<<<<<<<<
 *         result.view.len *= length
 * 
 */
  __pyx_t_7 = (__pyx_v_result->__pyx_base.view.shape + __pyx_v_ndim);
  for (__pyx_t_8 = __pyx_v_result->__pyx_base.view.shape; __pyx_t_8 < __pyx_t_7; __pyx_t_8++) {
    __pyx_t_6 = __pyx_t_8;
    __pyx_t_2 = PyInt_FromSsize_t((__pyx_t_6[0])); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1043, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_XDECREF_SET(__pyx_v_length, __pyx_t_2);
    __pyx_t_2 = 0;

    /* "View.MemoryView":1044
 *     result.view.len = result.view.itemsize
 *     for length in result.view.shape[:ndim]:
 *         result.view.len *= length             # <<<<<<<<<<<<<<
 * 
 *     result.to_object_func = to_object_func
 */
    __pyx_t_2 = PyInt_FromSsize_t(__pyx_v_result->__pyx_base.view.len); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1044, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_t_3 = PyNumber_InPlaceMultiply(__pyx_t_2, __pyx_v_length); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 1044, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __pyx_t_9 = __Pyx_PyIndex_AsSsize_t(__pyx_t_3); if (unlikely((__pyx_t_9 == (Py_ssize_t)-1) && PyErr_Occurred())) __PYX_ERR(2, 1044, __pyx_L1_error)
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __pyx_v_result->__pyx_base.view.len = __pyx_t_9;
  }

  /* "View.MemoryView":1046
 *         result.view.len *= length
 * 
 *     result.to_object_func = to_object_func             # <<<<<<<<<<<<<<
 *     result.to_dtype_func = to_dtype_func
 * 
 */
  __pyx_v_result->to_object_func = __pyx_v_to_object_func;

  /* "View.MemoryView":1047
 * 
 *     result.to_object_func = to_object_func
 *     result.to_dtype_func = to_dtype_func             # <<<<<<<<<<<<<<
 * 
 *     return result
 */
  __pyx_v_result->to_dtype_func = __pyx_v_to_dtype_func;

  /* "View.MemoryView":1049
 *     result.to_dtype_func = to_dtype_func
 * 
 *     return result             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_get_slice_from_memoryview')
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(((PyObject *)__pyx_v_result));
  __pyx_r = ((PyObject *)__pyx_v_result);
  goto __pyx_L0;

  /* "View.MemoryView":999
 * 
 * @cname('__pyx_memoryview_fromslice')
 * cdef memoryview_fromslice(__Pyx_memviewslice memviewslice,             # <<<<<<<<<<<<<<
 *                           int ndim,
 *                           object (*to_object_func)(char *),
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.memoryview_fromslice", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XDECREF((PyObject *)__pyx_v_result);
  __Pyx_XDECREF(__pyx_v_length);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":1052
 * 
 * @cname('__pyx_memoryview_get_slice_from_memoryview')
 * cdef __Pyx_memviewslice *get_slice_from_memview(memoryview memview,             # <<<<<<<<<<<<<<
 *                                                    __Pyx_memviewslice *mslice) except NULL:
 *     cdef _memoryviewslice obj
 */

static __Pyx_memviewslice *__pyx_memoryview_get_slice_from_memoryview(struct __pyx_memoryview_obj *__pyx_v_memview, __Pyx_memviewslice *__pyx_v_mslice) {
  struct __pyx_memoryviewslice_obj *__pyx_v_obj = 0;
  __Pyx_memviewslice *__pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *__pyx_t_3 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("get_slice_from_memview", 0);

  /* "View.MemoryView":1055
 *                                                    __Pyx_memviewslice *mslice) except NULL:
 *     cdef _memoryviewslice obj
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         obj = memview
 *         return &obj.from_slice
 */
  __pyx_t_1 = __Pyx_TypeCheck(((PyObject *)__pyx_v_memview), __pyx_memoryviewslice_type); 
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1056
 *     cdef _memoryviewslice obj
 *     if isinstance(memview, _memoryviewslice):
 *         obj = memview             # <<<<<<<<<<<<<<
 *         return &obj.from_slice
 *     else:
 */
    if (!(likely(((((PyObject *)__pyx_v_memview)) == Py_None) || likely(__Pyx_TypeTest(((PyObject *)__pyx_v_memview), __pyx_memoryviewslice_type))))) __PYX_ERR(2, 1056, __pyx_L1_error)
    __pyx_t_3 = ((PyObject *)__pyx_v_memview);
    __Pyx_INCREF(__pyx_t_3);
    __pyx_v_obj = ((struct __pyx_memoryviewslice_obj *)__pyx_t_3);
    __pyx_t_3 = 0;

    /* "View.MemoryView":1057
 *     if isinstance(memview, _memoryviewslice):
 *         obj = memview
 *         return &obj.from_slice             # <<<<<<<<<<<<<<
 *     else:
 *         slice_copy(memview, mslice)
 */
    __pyx_r = (&__pyx_v_obj->from_slice);
    goto __pyx_L0;

    /* "View.MemoryView":1055
 *                                                    __Pyx_memviewslice *mslice) except NULL:
 *     cdef _memoryviewslice obj
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         obj = memview
 *         return &obj.from_slice
 */
  }

  /* "View.MemoryView":1059
 *         return &obj.from_slice
 *     else:
 *         slice_copy(memview, mslice)             # <<<<<<<<<<<<<<
 *         return mslice
 * 
 */
  /*else*/ {
    __pyx_memoryview_slice_copy(__pyx_v_memview, __pyx_v_mslice);

    /* "View.MemoryView":1060
 *     else:
 *         slice_copy(memview, mslice)
 *         return mslice             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_slice_copy')
 */
    __pyx_r = __pyx_v_mslice;
    goto __pyx_L0;
  }

  /* "View.MemoryView":1052
 * 
 * @cname('__pyx_memoryview_get_slice_from_memoryview')
 * cdef __Pyx_memviewslice *get_slice_from_memview(memoryview memview,             # <<<<<<<<<<<<<<
 *                                                    __Pyx_memviewslice *mslice) except NULL:
 *     cdef _memoryviewslice obj
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_AddTraceback("View.MemoryView.get_slice_from_memview", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF((PyObject *)__pyx_v_obj);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":1063
 * 
 * @cname('__pyx_memoryview_slice_copy')
 * cdef void slice_copy(memoryview memview, __Pyx_memviewslice *dst):             # <<<<<<<<<<<<<<
 *     cdef int dim
 *     cdef (Py_ssize_t*) shape, strides, suboffsets
 */

static void __pyx_memoryview_slice_copy(struct __pyx_memoryview_obj *__pyx_v_memview, __Pyx_memviewslice *__pyx_v_dst) {
  int __pyx_v_dim;
  Py_ssize_t *__pyx_v_shape;
  Py_ssize_t *__pyx_v_strides;
  Py_ssize_t *__pyx_v_suboffsets;
  __Pyx_RefNannyDeclarations
  Py_ssize_t *__pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  int __pyx_t_4;
  Py_ssize_t __pyx_t_5;
  __Pyx_RefNannySetupContext("slice_copy", 0);

  /* "View.MemoryView":1067
 *     cdef (Py_ssize_t*) shape, strides, suboffsets
 * 
 *     shape = memview.view.shape             # <<<<<<<<<<<<<<
 *     strides = memview.view.strides
 *     suboffsets = memview.view.suboffsets
 */
  __pyx_t_1 = __pyx_v_memview->view.shape;
  __pyx_v_shape = __pyx_t_1;

  /* "View.MemoryView":1068
 * 
 *     shape = memview.view.shape
 *     strides = memview.view.strides             # <<<<<<<<<<<<<<
 *     suboffsets = memview.view.suboffsets
 * 
 */
  __pyx_t_1 = __pyx_v_memview->view.strides;
  __pyx_v_strides = __pyx_t_1;

  /* "View.MemoryView":1069
 *     shape = memview.view.shape
 *     strides = memview.view.strides
 *     suboffsets = memview.view.suboffsets             # <<<<<<<<<<<<<<
 * 
 *     dst.memview = <__pyx_memoryview *> memview
 */
  __pyx_t_1 = __pyx_v_memview->view.suboffsets;
  __pyx_v_suboffsets = __pyx_t_1;

  /* "View.MemoryView":1071
 *     suboffsets = memview.view.suboffsets
 * 
 *     dst.memview = <__pyx_memoryview *> memview             # <<<<<<<<<<<<<<
 *     dst.data = <char *> memview.view.buf
 * 
 */
  __pyx_v_dst->memview = ((struct __pyx_memoryview_obj *)__pyx_v_memview);

  /* "View.MemoryView":1072
 * 
 *     dst.memview = <__pyx_memoryview *> memview
 *     dst.data = <char *> memview.view.buf             # <<<<<<<<<<<<<<
 * 
 *     for dim in range(memview.view.ndim):
 */
  __pyx_v_dst->data = ((char *)__pyx_v_memview->view.buf);

  /* "View.MemoryView":1074
 *     dst.data = <char *> memview.view.buf
 * 
 *     for dim in range(memview.view.ndim):             # <<<<<<<<<<<<<<
 *         dst.shape[dim] = shape[dim]
 *         dst.strides[dim] = strides[dim]
 */
  __pyx_t_2 = __pyx_v_memview->view.ndim;
  __pyx_t_3 = __pyx_t_2;
  for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
    __pyx_v_dim = __pyx_t_4;

    /* "View.MemoryView":1075
 * 
 *     for dim in range(memview.view.ndim):
 *         dst.shape[dim] = shape[dim]             # <<<<<<<<<<<<<<
 *         dst.strides[dim] = strides[dim]
 *         dst.suboffsets[dim] = suboffsets[dim] if suboffsets else -1
 */
    (__pyx_v_dst->shape[__pyx_v_dim]) = (__pyx_v_shape[__pyx_v_dim]);

    /* "View.MemoryView":1076
 *     for dim in range(memview.view.ndim):
 *         dst.shape[dim] = shape[dim]
 *         dst.strides[dim] = strides[dim]             # <<<<<<<<<<<<<<
 *         dst.suboffsets[dim] = suboffsets[dim] if suboffsets else -1
 * 
 */
    (__pyx_v_dst->strides[__pyx_v_dim]) = (__pyx_v_strides[__pyx_v_dim]);

    /* "View.MemoryView":1077
 *         dst.shape[dim] = shape[dim]
 *         dst.strides[dim] = strides[dim]
 *         dst.suboffsets[dim] = suboffsets[dim] if suboffsets else -1             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_copy_object')
 */
    if ((__pyx_v_suboffsets != 0)) {
      __pyx_t_5 = (__pyx_v_suboffsets[__pyx_v_dim]);
    } else {
      __pyx_t_5 = -1L;
    }
    (__pyx_v_dst->suboffsets[__pyx_v_dim]) = __pyx_t_5;
  }

  /* "View.MemoryView":1063
 * 
 * @cname('__pyx_memoryview_slice_copy')
 * cdef void slice_copy(memoryview memview, __Pyx_memviewslice *dst):             # <<<<<<<<<<<<<<
 *     cdef int dim
 *     cdef (Py_ssize_t*) shape, strides, suboffsets
 */

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

/* "View.MemoryView":1080
 * 
 * @cname('__pyx_memoryview_copy_object')
 * cdef memoryview_copy(memoryview memview):             # <<<<<<<<<<<<<<
 *     "Create a new memoryview object"
 *     cdef __Pyx_memviewslice memviewslice
 */

static PyObject *__pyx_memoryview_copy_object(struct __pyx_memoryview_obj *__pyx_v_memview) {
  __Pyx_memviewslice __pyx_v_memviewslice;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("memoryview_copy", 0);

  /* "View.MemoryView":1083
 *     "Create a new memoryview object"
 *     cdef __Pyx_memviewslice memviewslice
 *     slice_copy(memview, &memviewslice)             # <<<<<<<<<<<<<<
 *     return memoryview_copy_from_slice(memview, &memviewslice)
 * 
 */
  __pyx_memoryview_slice_copy(__pyx_v_memview, (&__pyx_v_memviewslice));

  /* "View.MemoryView":1084
 *     cdef __Pyx_memviewslice memviewslice
 *     slice_copy(memview, &memviewslice)
 *     return memoryview_copy_from_slice(memview, &memviewslice)             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_copy_object_from_slice')
 */
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_1 = __pyx_memoryview_copy_object_from_slice(__pyx_v_memview, (&__pyx_v_memviewslice)); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 1084, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_r = __pyx_t_1;
  __pyx_t_1 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":1080
 * 
 * @cname('__pyx_memoryview_copy_object')
 * cdef memoryview_copy(memoryview memview):             # <<<<<<<<<<<<<<
 *     "Create a new memoryview object"
 *     cdef __Pyx_memviewslice memviewslice
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_AddTraceback("View.MemoryView.memoryview_copy", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":1087
 * 
 * @cname('__pyx_memoryview_copy_object_from_slice')
 * cdef memoryview_copy_from_slice(memoryview memview, __Pyx_memviewslice *memviewslice):             # <<<<<<<<<<<<<<
 *     """
 *     Create a new memoryview object from a given memoryview object and slice.
 */

static PyObject *__pyx_memoryview_copy_object_from_slice(struct __pyx_memoryview_obj *__pyx_v_memview, __Pyx_memviewslice *__pyx_v_memviewslice) {
  PyObject *(*__pyx_v_to_object_func)(char *);
  int (*__pyx_v_to_dtype_func)(char *, PyObject *);
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  int __pyx_t_2;
  PyObject *(*__pyx_t_3)(char *);
  int (*__pyx_t_4)(char *, PyObject *);
  PyObject *__pyx_t_5 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("memoryview_copy_from_slice", 0);

  /* "View.MemoryView":1094
 *     cdef int (*to_dtype_func)(char *, object) except 0
 * 
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         to_object_func = (<_memoryviewslice> memview).to_object_func
 *         to_dtype_func = (<_memoryviewslice> memview).to_dtype_func
 */
  __pyx_t_1 = __Pyx_TypeCheck(((PyObject *)__pyx_v_memview), __pyx_memoryviewslice_type); 
  __pyx_t_2 = (__pyx_t_1 != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1095
 * 
 *     if isinstance(memview, _memoryviewslice):
 *         to_object_func = (<_memoryviewslice> memview).to_object_func             # <<<<<<<<<<<<<<
 *         to_dtype_func = (<_memoryviewslice> memview).to_dtype_func
 *     else:
 */
    __pyx_t_3 = ((struct __pyx_memoryviewslice_obj *)__pyx_v_memview)->to_object_func;
    __pyx_v_to_object_func = __pyx_t_3;

    /* "View.MemoryView":1096
 *     if isinstance(memview, _memoryviewslice):
 *         to_object_func = (<_memoryviewslice> memview).to_object_func
 *         to_dtype_func = (<_memoryviewslice> memview).to_dtype_func             # <<<<<<<<<<<<<<
 *     else:
 *         to_object_func = NULL
 */
    __pyx_t_4 = ((struct __pyx_memoryviewslice_obj *)__pyx_v_memview)->to_dtype_func;
    __pyx_v_to_dtype_func = __pyx_t_4;

    /* "View.MemoryView":1094
 *     cdef int (*to_dtype_func)(char *, object) except 0
 * 
 *     if isinstance(memview, _memoryviewslice):             # <<<<<<<<<<<<<<
 *         to_object_func = (<_memoryviewslice> memview).to_object_func
 *         to_dtype_func = (<_memoryviewslice> memview).to_dtype_func
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":1098
 *         to_dtype_func = (<_memoryviewslice> memview).to_dtype_func
 *     else:
 *         to_object_func = NULL             # <<<<<<<<<<<<<<
 *         to_dtype_func = NULL
 * 
 */
  /*else*/ {
    __pyx_v_to_object_func = NULL;

    /* "View.MemoryView":1099
 *     else:
 *         to_object_func = NULL
 *         to_dtype_func = NULL             # <<<<<<<<<<<<<<
 * 
 *     return memoryview_fromslice(memviewslice[0], memview.view.ndim,
 */
    __pyx_v_to_dtype_func = NULL;
  }
  __pyx_L3:;

  /* "View.MemoryView":1101
 *         to_dtype_func = NULL
 * 
 *     return memoryview_fromslice(memviewslice[0], memview.view.ndim,             # <<<<<<<<<<<<<<
 *                                 to_object_func, to_dtype_func,
 *                                 memview.dtype_is_object)
 */
  __Pyx_XDECREF(__pyx_r);

  /* "View.MemoryView":1103
 *     return memoryview_fromslice(memviewslice[0], memview.view.ndim,
 *                                 to_object_func, to_dtype_func,
 *                                 memview.dtype_is_object)             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_5 = __pyx_memoryview_fromslice((__pyx_v_memviewslice[0]), __pyx_v_memview->view.ndim, __pyx_v_to_object_func, __pyx_v_to_dtype_func, __pyx_v_memview->dtype_is_object); if (unlikely(!__pyx_t_5)) __PYX_ERR(2, 1101, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_5);
  __pyx_r = __pyx_t_5;
  __pyx_t_5 = 0;
  goto __pyx_L0;

  /* "View.MemoryView":1087
 * 
 * @cname('__pyx_memoryview_copy_object_from_slice')
 * cdef memoryview_copy_from_slice(memoryview memview, __Pyx_memviewslice *memviewslice):             # <<<<<<<<<<<<<<
 *     """
 *     Create a new memoryview object from a given memoryview object and slice.
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.memoryview_copy_from_slice", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "View.MemoryView":1109
 * 
 * 
 * cdef Py_ssize_t abs_py_ssize_t(Py_ssize_t arg) nogil:             # <<<<<<<<<<<<<<
 *     if arg < 0:
 *         return -arg
 */

static Py_ssize_t abs_py_ssize_t(Py_ssize_t __pyx_v_arg) {
  Py_ssize_t __pyx_r;
  int __pyx_t_1;

  /* "View.MemoryView":1110
 * 
 * cdef Py_ssize_t abs_py_ssize_t(Py_ssize_t arg) nogil:
 *     if arg < 0:             # <<<<<<<<<<<<<<
 *         return -arg
 *     else:
 */
  __pyx_t_1 = ((__pyx_v_arg < 0) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":1111
 * cdef Py_ssize_t abs_py_ssize_t(Py_ssize_t arg) nogil:
 *     if arg < 0:
 *         return -arg             # <<<<<<<<<<<<<<
 *     else:
 *         return arg
 */
    __pyx_r = (-__pyx_v_arg);
    goto __pyx_L0;

    /* "View.MemoryView":1110
 * 
 * cdef Py_ssize_t abs_py_ssize_t(Py_ssize_t arg) nogil:
 *     if arg < 0:             # <<<<<<<<<<<<<<
 *         return -arg
 *     else:
 */
  }

  /* "View.MemoryView":1113
 *         return -arg
 *     else:
 *         return arg             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_get_best_slice_order')
 */
  /*else*/ {
    __pyx_r = __pyx_v_arg;
    goto __pyx_L0;
  }

  /* "View.MemoryView":1109
 * 
 * 
 * cdef Py_ssize_t abs_py_ssize_t(Py_ssize_t arg) nogil:             # <<<<<<<<<<<<<<
 *     if arg < 0:
 *         return -arg
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":1116
 * 
 * @cname('__pyx_get_best_slice_order')
 * cdef char get_best_order(__Pyx_memviewslice *mslice, int ndim) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     Figure out the best memory access order for a given slice.
 */

static char __pyx_get_best_slice_order(__Pyx_memviewslice *__pyx_v_mslice, int __pyx_v_ndim) {
  int __pyx_v_i;
  Py_ssize_t __pyx_v_c_stride;
  Py_ssize_t __pyx_v_f_stride;
  char __pyx_r;
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  int __pyx_t_4;

  /* "View.MemoryView":1121
 *     """
 *     cdef int i
 *     cdef Py_ssize_t c_stride = 0             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t f_stride = 0
 * 
 */
  __pyx_v_c_stride = 0;

  /* "View.MemoryView":1122
 *     cdef int i
 *     cdef Py_ssize_t c_stride = 0
 *     cdef Py_ssize_t f_stride = 0             # <<<<<<<<<<<<<<
 * 
 *     for i in range(ndim - 1, -1, -1):
 */
  __pyx_v_f_stride = 0;

  /* "View.MemoryView":1124
 *     cdef Py_ssize_t f_stride = 0
 * 
 *     for i in range(ndim - 1, -1, -1):             # <<<<<<<<<<<<<<
 *         if mslice.shape[i] > 1:
 *             c_stride = mslice.strides[i]
 */
  for (__pyx_t_1 = (__pyx_v_ndim - 1); __pyx_t_1 > -1; __pyx_t_1-=1) {
    __pyx_v_i = __pyx_t_1;

    /* "View.MemoryView":1125
 * 
 *     for i in range(ndim - 1, -1, -1):
 *         if mslice.shape[i] > 1:             # <<<<<<<<<<<<<<
 *             c_stride = mslice.strides[i]
 *             break
 */
    __pyx_t_2 = (((__pyx_v_mslice->shape[__pyx_v_i]) > 1) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1126
 *     for i in range(ndim - 1, -1, -1):
 *         if mslice.shape[i] > 1:
 *             c_stride = mslice.strides[i]             # <<<<<<<<<<<<<<
 *             break
 * 
 */
      __pyx_v_c_stride = (__pyx_v_mslice->strides[__pyx_v_i]);

      /* "View.MemoryView":1127
 *         if mslice.shape[i] > 1:
 *             c_stride = mslice.strides[i]
 *             break             # <<<<<<<<<<<<<<
 * 
 *     for i in range(ndim):
 */
      goto __pyx_L4_break;

      /* "View.MemoryView":1125
 * 
 *     for i in range(ndim - 1, -1, -1):
 *         if mslice.shape[i] > 1:             # <<<<<<<<<<<<<<
 *             c_stride = mslice.strides[i]
 *             break
 */
    }
  }
  __pyx_L4_break:;

  /* "View.MemoryView":1129
 *             break
 * 
 *     for i in range(ndim):             # <<<<<<<<<<<<<<
 *         if mslice.shape[i] > 1:
 *             f_stride = mslice.strides[i]
 */
  __pyx_t_1 = __pyx_v_ndim;
  __pyx_t_3 = __pyx_t_1;
  for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
    __pyx_v_i = __pyx_t_4;

    /* "View.MemoryView":1130
 * 
 *     for i in range(ndim):
 *         if mslice.shape[i] > 1:             # <<<<<<<<<<<<<<
 *             f_stride = mslice.strides[i]
 *             break
 */
    __pyx_t_2 = (((__pyx_v_mslice->shape[__pyx_v_i]) > 1) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1131
 *     for i in range(ndim):
 *         if mslice.shape[i] > 1:
 *             f_stride = mslice.strides[i]             # <<<<<<<<<<<<<<
 *             break
 * 
 */
      __pyx_v_f_stride = (__pyx_v_mslice->strides[__pyx_v_i]);

      /* "View.MemoryView":1132
 *         if mslice.shape[i] > 1:
 *             f_stride = mslice.strides[i]
 *             break             # <<<<<<<<<<<<<<
 * 
 *     if abs_py_ssize_t(c_stride) <= abs_py_ssize_t(f_stride):
 */
      goto __pyx_L7_break;

      /* "View.MemoryView":1130
 * 
 *     for i in range(ndim):
 *         if mslice.shape[i] > 1:             # <<<<<<<<<<<<<<
 *             f_stride = mslice.strides[i]
 *             break
 */
    }
  }
  __pyx_L7_break:;

  /* "View.MemoryView":1134
 *             break
 * 
 *     if abs_py_ssize_t(c_stride) <= abs_py_ssize_t(f_stride):             # <<<<<<<<<<<<<<
 *         return 'C'
 *     else:
 */
  __pyx_t_2 = ((abs_py_ssize_t(__pyx_v_c_stride) <= abs_py_ssize_t(__pyx_v_f_stride)) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1135
 * 
 *     if abs_py_ssize_t(c_stride) <= abs_py_ssize_t(f_stride):
 *         return 'C'             # <<<<<<<<<<<<<<
 *     else:
 *         return 'F'
 */
    __pyx_r = 'C';
    goto __pyx_L0;

    /* "View.MemoryView":1134
 *             break
 * 
 *     if abs_py_ssize_t(c_stride) <= abs_py_ssize_t(f_stride):             # <<<<<<<<<<<<<<
 *         return 'C'
 *     else:
 */
  }

  /* "View.MemoryView":1137
 *         return 'C'
 *     else:
 *         return 'F'             # <<<<<<<<<<<<<<
 * 
 * @cython.cdivision(True)
 */
  /*else*/ {
    __pyx_r = 'F';
    goto __pyx_L0;
  }

  /* "View.MemoryView":1116
 * 
 * @cname('__pyx_get_best_slice_order')
 * cdef char get_best_order(__Pyx_memviewslice *mslice, int ndim) nogil:             # <<<<<<<<<<<<<<
 *     """
 *     Figure out the best memory access order for a given slice.
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":1140
 * 
 * @cython.cdivision(True)
 * cdef void _copy_strided_to_strided(char *src_data, Py_ssize_t *src_strides,             # <<<<<<<<<<<<<<
 *                                    char *dst_data, Py_ssize_t *dst_strides,
 *                                    Py_ssize_t *src_shape, Py_ssize_t *dst_shape,
 */

static void _copy_strided_to_strided(char *__pyx_v_src_data, Py_ssize_t *__pyx_v_src_strides, char *__pyx_v_dst_data, Py_ssize_t *__pyx_v_dst_strides, Py_ssize_t *__pyx_v_src_shape, Py_ssize_t *__pyx_v_dst_shape, int __pyx_v_ndim, size_t __pyx_v_itemsize) {
  CYTHON_UNUSED Py_ssize_t __pyx_v_i;
  CYTHON_UNUSED Py_ssize_t __pyx_v_src_extent;
  Py_ssize_t __pyx_v_dst_extent;
  Py_ssize_t __pyx_v_src_stride;
  Py_ssize_t __pyx_v_dst_stride;
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  Py_ssize_t __pyx_t_4;
  Py_ssize_t __pyx_t_5;
  Py_ssize_t __pyx_t_6;

  /* "View.MemoryView":1147
 * 
 *     cdef Py_ssize_t i
 *     cdef Py_ssize_t src_extent = src_shape[0]             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t dst_extent = dst_shape[0]
 *     cdef Py_ssize_t src_stride = src_strides[0]
 */
  __pyx_v_src_extent = (__pyx_v_src_shape[0]);

  /* "View.MemoryView":1148
 *     cdef Py_ssize_t i
 *     cdef Py_ssize_t src_extent = src_shape[0]
 *     cdef Py_ssize_t dst_extent = dst_shape[0]             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t src_stride = src_strides[0]
 *     cdef Py_ssize_t dst_stride = dst_strides[0]
 */
  __pyx_v_dst_extent = (__pyx_v_dst_shape[0]);

  /* "View.MemoryView":1149
 *     cdef Py_ssize_t src_extent = src_shape[0]
 *     cdef Py_ssize_t dst_extent = dst_shape[0]
 *     cdef Py_ssize_t src_stride = src_strides[0]             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t dst_stride = dst_strides[0]
 * 
 */
  __pyx_v_src_stride = (__pyx_v_src_strides[0]);

  /* "View.MemoryView":1150
 *     cdef Py_ssize_t dst_extent = dst_shape[0]
 *     cdef Py_ssize_t src_stride = src_strides[0]
 *     cdef Py_ssize_t dst_stride = dst_strides[0]             # <<<<<<<<<<<<<<
 * 
 *     if ndim == 1:
 */
  __pyx_v_dst_stride = (__pyx_v_dst_strides[0]);

  /* "View.MemoryView":1152
 *     cdef Py_ssize_t dst_stride = dst_strides[0]
 * 
 *     if ndim == 1:             # <<<<<<<<<<<<<<
 *        if (src_stride > 0 and dst_stride > 0 and
 *            <size_t> src_stride == itemsize == <size_t> dst_stride):
 */
  __pyx_t_1 = ((__pyx_v_ndim == 1) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":1153
 * 
 *     if ndim == 1:
 *        if (src_stride > 0 and dst_stride > 0 and             # <<<<<<<<<<<<<<
 *            <size_t> src_stride == itemsize == <size_t> dst_stride):
 *            memcpy(dst_data, src_data, itemsize * dst_extent)
 */
    __pyx_t_2 = ((__pyx_v_src_stride > 0) != 0);
    if (__pyx_t_2) {
    } else {
      __pyx_t_1 = __pyx_t_2;
      goto __pyx_L5_bool_binop_done;
    }
    __pyx_t_2 = ((__pyx_v_dst_stride > 0) != 0);
    if (__pyx_t_2) {
    } else {
      __pyx_t_1 = __pyx_t_2;
      goto __pyx_L5_bool_binop_done;
    }

    /* "View.MemoryView":1154
 *     if ndim == 1:
 *        if (src_stride > 0 and dst_stride > 0 and
 *            <size_t> src_stride == itemsize == <size_t> dst_stride):             # <<<<<<<<<<<<<<
 *            memcpy(dst_data, src_data, itemsize * dst_extent)
 *        else:
 */
    __pyx_t_2 = (((size_t)__pyx_v_src_stride) == __pyx_v_itemsize);
    if (__pyx_t_2) {
      __pyx_t_2 = (__pyx_v_itemsize == ((size_t)__pyx_v_dst_stride));
    }
    __pyx_t_3 = (__pyx_t_2 != 0);
    __pyx_t_1 = __pyx_t_3;
    __pyx_L5_bool_binop_done:;

    /* "View.MemoryView":1153
 * 
 *     if ndim == 1:
 *        if (src_stride > 0 and dst_stride > 0 and             # <<<<<<<<<<<<<<
 *            <size_t> src_stride == itemsize == <size_t> dst_stride):
 *            memcpy(dst_data, src_data, itemsize * dst_extent)
 */
    if (__pyx_t_1) {

      /* "View.MemoryView":1155
 *        if (src_stride > 0 and dst_stride > 0 and
 *            <size_t> src_stride == itemsize == <size_t> dst_stride):
 *            memcpy(dst_data, src_data, itemsize * dst_extent)             # <<<<<<<<<<<<<<
 *        else:
 *            for i in range(dst_extent):
 */
      (void)(memcpy(__pyx_v_dst_data, __pyx_v_src_data, (__pyx_v_itemsize * __pyx_v_dst_extent)));

      /* "View.MemoryView":1153
 * 
 *     if ndim == 1:
 *        if (src_stride > 0 and dst_stride > 0 and             # <<<<<<<<<<<<<<
 *            <size_t> src_stride == itemsize == <size_t> dst_stride):
 *            memcpy(dst_data, src_data, itemsize * dst_extent)
 */
      goto __pyx_L4;
    }

    /* "View.MemoryView":1157
 *            memcpy(dst_data, src_data, itemsize * dst_extent)
 *        else:
 *            for i in range(dst_extent):             # <<<<<<<<<<<<<<
 *                memcpy(dst_data, src_data, itemsize)
 *                src_data += src_stride
 */
    /*else*/ {
      __pyx_t_4 = __pyx_v_dst_extent;
      __pyx_t_5 = __pyx_t_4;
      for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
        __pyx_v_i = __pyx_t_6;

        /* "View.MemoryView":1158
 *        else:
 *            for i in range(dst_extent):
 *                memcpy(dst_data, src_data, itemsize)             # <<<<<<<<<<<<<<
 *                src_data += src_stride
 *                dst_data += dst_stride
 */
        (void)(memcpy(__pyx_v_dst_data, __pyx_v_src_data, __pyx_v_itemsize));

        /* "View.MemoryView":1159
 *            for i in range(dst_extent):
 *                memcpy(dst_data, src_data, itemsize)
 *                src_data += src_stride             # <<<<<<<<<<<<<<
 *                dst_data += dst_stride
 *     else:
 */
        __pyx_v_src_data = (__pyx_v_src_data + __pyx_v_src_stride);

        /* "View.MemoryView":1160
 *                memcpy(dst_data, src_data, itemsize)
 *                src_data += src_stride
 *                dst_data += dst_stride             # <<<<<<<<<<<<<<
 *     else:
 *         for i in range(dst_extent):
 */
        __pyx_v_dst_data = (__pyx_v_dst_data + __pyx_v_dst_stride);
      }
    }
    __pyx_L4:;

    /* "View.MemoryView":1152
 *     cdef Py_ssize_t dst_stride = dst_strides[0]
 * 
 *     if ndim == 1:             # <<<<<<<<<<<<<<
 *        if (src_stride > 0 and dst_stride > 0 and
 *            <size_t> src_stride == itemsize == <size_t> dst_stride):
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":1162
 *                dst_data += dst_stride
 *     else:
 *         for i in range(dst_extent):             # <<<<<<<<<<<<<<
 *             _copy_strided_to_strided(src_data, src_strides + 1,
 *                                      dst_data, dst_strides + 1,
 */
  /*else*/ {
    __pyx_t_4 = __pyx_v_dst_extent;
    __pyx_t_5 = __pyx_t_4;
    for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
      __pyx_v_i = __pyx_t_6;

      /* "View.MemoryView":1163
 *     else:
 *         for i in range(dst_extent):
 *             _copy_strided_to_strided(src_data, src_strides + 1,             # <<<<<<<<<<<<<<
 *                                      dst_data, dst_strides + 1,
 *                                      src_shape + 1, dst_shape + 1,
 */
      _copy_strided_to_strided(__pyx_v_src_data, (__pyx_v_src_strides + 1), __pyx_v_dst_data, (__pyx_v_dst_strides + 1), (__pyx_v_src_shape + 1), (__pyx_v_dst_shape + 1), (__pyx_v_ndim - 1), __pyx_v_itemsize);

      /* "View.MemoryView":1167
 *                                      src_shape + 1, dst_shape + 1,
 *                                      ndim - 1, itemsize)
 *             src_data += src_stride             # <<<<<<<<<<<<<<
 *             dst_data += dst_stride
 * 
 */
      __pyx_v_src_data = (__pyx_v_src_data + __pyx_v_src_stride);

      /* "View.MemoryView":1168
 *                                      ndim - 1, itemsize)
 *             src_data += src_stride
 *             dst_data += dst_stride             # <<<<<<<<<<<<<<
 * 
 * cdef void copy_strided_to_strided(__Pyx_memviewslice *src,
 */
      __pyx_v_dst_data = (__pyx_v_dst_data + __pyx_v_dst_stride);
    }
  }
  __pyx_L3:;

  /* "View.MemoryView":1140
 * 
 * @cython.cdivision(True)
 * cdef void _copy_strided_to_strided(char *src_data, Py_ssize_t *src_strides,             # <<<<<<<<<<<<<<
 *                                    char *dst_data, Py_ssize_t *dst_strides,
 *                                    Py_ssize_t *src_shape, Py_ssize_t *dst_shape,
 */

  /* function exit code */
}

/* "View.MemoryView":1170
 *             dst_data += dst_stride
 * 
 * cdef void copy_strided_to_strided(__Pyx_memviewslice *src,             # <<<<<<<<<<<<<<
 *                                   __Pyx_memviewslice *dst,
 *                                   int ndim, size_t itemsize) nogil:
 */

static void copy_strided_to_strided(__Pyx_memviewslice *__pyx_v_src, __Pyx_memviewslice *__pyx_v_dst, int __pyx_v_ndim, size_t __pyx_v_itemsize) {

  /* "View.MemoryView":1173
 *                                   __Pyx_memviewslice *dst,
 *                                   int ndim, size_t itemsize) nogil:
 *     _copy_strided_to_strided(src.data, src.strides, dst.data, dst.strides,             # <<<<<<<<<<<<<<
 *                              src.shape, dst.shape, ndim, itemsize)
 * 
 */
  _copy_strided_to_strided(__pyx_v_src->data, __pyx_v_src->strides, __pyx_v_dst->data, __pyx_v_dst->strides, __pyx_v_src->shape, __pyx_v_dst->shape, __pyx_v_ndim, __pyx_v_itemsize);

  /* "View.MemoryView":1170
 *             dst_data += dst_stride
 * 
 * cdef void copy_strided_to_strided(__Pyx_memviewslice *src,             # <<<<<<<<<<<<<<
 *                                   __Pyx_memviewslice *dst,
 *                                   int ndim, size_t itemsize) nogil:
 */

  /* function exit code */
}

/* "View.MemoryView":1177
 * 
 * @cname('__pyx_memoryview_slice_get_size')
 * cdef Py_ssize_t slice_get_size(__Pyx_memviewslice *src, int ndim) nogil:             # <<<<<<<<<<<<<<
 *     "Return the size of the memory occupied by the slice in number of bytes"
 *     cdef Py_ssize_t shape, size = src.memview.view.itemsize
 */

static Py_ssize_t __pyx_memoryview_slice_get_size(__Pyx_memviewslice *__pyx_v_src, int __pyx_v_ndim) {
  Py_ssize_t __pyx_v_shape;
  Py_ssize_t __pyx_v_size;
  Py_ssize_t __pyx_r;
  Py_ssize_t __pyx_t_1;
  Py_ssize_t *__pyx_t_2;
  Py_ssize_t *__pyx_t_3;
  Py_ssize_t *__pyx_t_4;

  /* "View.MemoryView":1179
 * cdef Py_ssize_t slice_get_size(__Pyx_memviewslice *src, int ndim) nogil:
 *     "Return the size of the memory occupied by the slice in number of bytes"
 *     cdef Py_ssize_t shape, size = src.memview.view.itemsize             # <<<<<<<<<<<<<<
 * 
 *     for shape in src.shape[:ndim]:
 */
  __pyx_t_1 = __pyx_v_src->memview->view.itemsize;
  __pyx_v_size = __pyx_t_1;

  /* "View.MemoryView":1181
 *     cdef Py_ssize_t shape, size = src.memview.view.itemsize
 * 
 *     for shape in src.shape[:ndim]:             # <<<<<<<<<<<<<<
 *         size *= shape
 * 
 */
  __pyx_t_3 = (__pyx_v_src->shape + __pyx_v_ndim);
  for (__pyx_t_4 = __pyx_v_src->shape; __pyx_t_4 < __pyx_t_3; __pyx_t_4++) {
    __pyx_t_2 = __pyx_t_4;
    __pyx_v_shape = (__pyx_t_2[0]);

    /* "View.MemoryView":1182
 * 
 *     for shape in src.shape[:ndim]:
 *         size *= shape             # <<<<<<<<<<<<<<
 * 
 *     return size
 */
    __pyx_v_size = (__pyx_v_size * __pyx_v_shape);
  }

  /* "View.MemoryView":1184
 *         size *= shape
 * 
 *     return size             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_fill_contig_strides_array')
 */
  __pyx_r = __pyx_v_size;
  goto __pyx_L0;

  /* "View.MemoryView":1177
 * 
 * @cname('__pyx_memoryview_slice_get_size')
 * cdef Py_ssize_t slice_get_size(__Pyx_memviewslice *src, int ndim) nogil:             # <<<<<<<<<<<<<<
 *     "Return the size of the memory occupied by the slice in number of bytes"
 *     cdef Py_ssize_t shape, size = src.memview.view.itemsize
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":1187
 * 
 * @cname('__pyx_fill_contig_strides_array')
 * cdef Py_ssize_t fill_contig_strides_array(             # <<<<<<<<<<<<<<
 *                 Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t stride,
 *                 int ndim, char order) nogil:
 */

static Py_ssize_t __pyx_fill_contig_strides_array(Py_ssize_t *__pyx_v_shape, Py_ssize_t *__pyx_v_strides, Py_ssize_t __pyx_v_stride, int __pyx_v_ndim, char __pyx_v_order) {
  int __pyx_v_idx;
  Py_ssize_t __pyx_r;
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  int __pyx_t_4;

  /* "View.MemoryView":1196
 *     cdef int idx
 * 
 *     if order == 'F':             # <<<<<<<<<<<<<<
 *         for idx in range(ndim):
 *             strides[idx] = stride
 */
  __pyx_t_1 = ((__pyx_v_order == 'F') != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":1197
 * 
 *     if order == 'F':
 *         for idx in range(ndim):             # <<<<<<<<<<<<<<
 *             strides[idx] = stride
 *             stride *= shape[idx]
 */
    __pyx_t_2 = __pyx_v_ndim;
    __pyx_t_3 = __pyx_t_2;
    for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
      __pyx_v_idx = __pyx_t_4;

      /* "View.MemoryView":1198
 *     if order == 'F':
 *         for idx in range(ndim):
 *             strides[idx] = stride             # <<<<<<<<<<<<<<
 *             stride *= shape[idx]
 *     else:
 */
      (__pyx_v_strides[__pyx_v_idx]) = __pyx_v_stride;

      /* "View.MemoryView":1199
 *         for idx in range(ndim):
 *             strides[idx] = stride
 *             stride *= shape[idx]             # <<<<<<<<<<<<<<
 *     else:
 *         for idx in range(ndim - 1, -1, -1):
 */
      __pyx_v_stride = (__pyx_v_stride * (__pyx_v_shape[__pyx_v_idx]));
    }

    /* "View.MemoryView":1196
 *     cdef int idx
 * 
 *     if order == 'F':             # <<<<<<<<<<<<<<
 *         for idx in range(ndim):
 *             strides[idx] = stride
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":1201
 *             stride *= shape[idx]
 *     else:
 *         for idx in range(ndim - 1, -1, -1):             # <<<<<<<<<<<<<<
 *             strides[idx] = stride
 *             stride *= shape[idx]
 */
  /*else*/ {
    for (__pyx_t_2 = (__pyx_v_ndim - 1); __pyx_t_2 > -1; __pyx_t_2-=1) {
      __pyx_v_idx = __pyx_t_2;

      /* "View.MemoryView":1202
 *     else:
 *         for idx in range(ndim - 1, -1, -1):
 *             strides[idx] = stride             # <<<<<<<<<<<<<<
 *             stride *= shape[idx]
 * 
 */
      (__pyx_v_strides[__pyx_v_idx]) = __pyx_v_stride;

      /* "View.MemoryView":1203
 *         for idx in range(ndim - 1, -1, -1):
 *             strides[idx] = stride
 *             stride *= shape[idx]             # <<<<<<<<<<<<<<
 * 
 *     return stride
 */
      __pyx_v_stride = (__pyx_v_stride * (__pyx_v_shape[__pyx_v_idx]));
    }
  }
  __pyx_L3:;

  /* "View.MemoryView":1205
 *             stride *= shape[idx]
 * 
 *     return stride             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_copy_data_to_temp')
 */
  __pyx_r = __pyx_v_stride;
  goto __pyx_L0;

  /* "View.MemoryView":1187
 * 
 * @cname('__pyx_fill_contig_strides_array')
 * cdef Py_ssize_t fill_contig_strides_array(             # <<<<<<<<<<<<<<
 *                 Py_ssize_t *shape, Py_ssize_t *strides, Py_ssize_t stride,
 *                 int ndim, char order) nogil:
 */

  /* function exit code */
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":1208
 * 
 * @cname('__pyx_memoryview_copy_data_to_temp')
 * cdef void *copy_data_to_temp(__Pyx_memviewslice *src,             # <<<<<<<<<<<<<<
 *                              __Pyx_memviewslice *tmpslice,
 *                              char order,
 */

static void *__pyx_memoryview_copy_data_to_temp(__Pyx_memviewslice *__pyx_v_src, __Pyx_memviewslice *__pyx_v_tmpslice, char __pyx_v_order, int __pyx_v_ndim) {
  int __pyx_v_i;
  void *__pyx_v_result;
  size_t __pyx_v_itemsize;
  size_t __pyx_v_size;
  void *__pyx_r;
  Py_ssize_t __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  struct __pyx_memoryview_obj *__pyx_t_4;
  int __pyx_t_5;
  int __pyx_t_6;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;

  /* "View.MemoryView":1219
 *     cdef void *result
 * 
 *     cdef size_t itemsize = src.memview.view.itemsize             # <<<<<<<<<<<<<<
 *     cdef size_t size = slice_get_size(src, ndim)
 * 
 */
  __pyx_t_1 = __pyx_v_src->memview->view.itemsize;
  __pyx_v_itemsize = __pyx_t_1;

  /* "View.MemoryView":1220
 * 
 *     cdef size_t itemsize = src.memview.view.itemsize
 *     cdef size_t size = slice_get_size(src, ndim)             # <<<<<<<<<<<<<<
 * 
 *     result = malloc(size)
 */
  __pyx_v_size = __pyx_memoryview_slice_get_size(__pyx_v_src, __pyx_v_ndim);

  /* "View.MemoryView":1222
 *     cdef size_t size = slice_get_size(src, ndim)
 * 
 *     result = malloc(size)             # <<<<<<<<<<<<<<
 *     if not result:
 *         _err(MemoryError, NULL)
 */
  __pyx_v_result = malloc(__pyx_v_size);

  /* "View.MemoryView":1223
 * 
 *     result = malloc(size)
 *     if not result:             # <<<<<<<<<<<<<<
 *         _err(MemoryError, NULL)
 * 
 */
  __pyx_t_2 = ((!(__pyx_v_result != 0)) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1224
 *     result = malloc(size)
 *     if not result:
 *         _err(MemoryError, NULL)             # <<<<<<<<<<<<<<
 * 
 * 
 */
    __pyx_t_3 = __pyx_memoryview_err(__pyx_builtin_MemoryError, NULL); if (unlikely(__pyx_t_3 == ((int)-1))) __PYX_ERR(2, 1224, __pyx_L1_error)

    /* "View.MemoryView":1223
 * 
 *     result = malloc(size)
 *     if not result:             # <<<<<<<<<<<<<<
 *         _err(MemoryError, NULL)
 * 
 */
  }

  /* "View.MemoryView":1227
 * 
 * 
 *     tmpslice.data = <char *> result             # <<<<<<<<<<<<<<
 *     tmpslice.memview = src.memview
 *     for i in range(ndim):
 */
  __pyx_v_tmpslice->data = ((char *)__pyx_v_result);

  /* "View.MemoryView":1228
 * 
 *     tmpslice.data = <char *> result
 *     tmpslice.memview = src.memview             # <<<<<<<<<<<<<<
 *     for i in range(ndim):
 *         tmpslice.shape[i] = src.shape[i]
 */
  __pyx_t_4 = __pyx_v_src->memview;
  __pyx_v_tmpslice->memview = __pyx_t_4;

  /* "View.MemoryView":1229
 *     tmpslice.data = <char *> result
 *     tmpslice.memview = src.memview
 *     for i in range(ndim):             # <<<<<<<<<<<<<<
 *         tmpslice.shape[i] = src.shape[i]
 *         tmpslice.suboffsets[i] = -1
 */
  __pyx_t_3 = __pyx_v_ndim;
  __pyx_t_5 = __pyx_t_3;
  for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
    __pyx_v_i = __pyx_t_6;

    /* "View.MemoryView":1230
 *     tmpslice.memview = src.memview
 *     for i in range(ndim):
 *         tmpslice.shape[i] = src.shape[i]             # <<<<<<<<<<<<<<
 *         tmpslice.suboffsets[i] = -1
 * 
 */
    (__pyx_v_tmpslice->shape[__pyx_v_i]) = (__pyx_v_src->shape[__pyx_v_i]);

    /* "View.MemoryView":1231
 *     for i in range(ndim):
 *         tmpslice.shape[i] = src.shape[i]
 *         tmpslice.suboffsets[i] = -1             # <<<<<<<<<<<<<<
 * 
 *     fill_contig_strides_array(&tmpslice.shape[0], &tmpslice.strides[0], itemsize,
 */
    (__pyx_v_tmpslice->suboffsets[__pyx_v_i]) = -1L;
  }

  /* "View.MemoryView":1233
 *         tmpslice.suboffsets[i] = -1
 * 
 *     fill_contig_strides_array(&tmpslice.shape[0], &tmpslice.strides[0], itemsize,             # <<<<<<<<<<<<<<
 *                               ndim, order)
 * 
 */
  (void)(__pyx_fill_contig_strides_array((&(__pyx_v_tmpslice->shape[0])), (&(__pyx_v_tmpslice->strides[0])), __pyx_v_itemsize, __pyx_v_ndim, __pyx_v_order));

  /* "View.MemoryView":1237
 * 
 * 
 *     for i in range(ndim):             # <<<<<<<<<<<<<<
 *         if tmpslice.shape[i] == 1:
 *             tmpslice.strides[i] = 0
 */
  __pyx_t_3 = __pyx_v_ndim;
  __pyx_t_5 = __pyx_t_3;
  for (__pyx_t_6 = 0; __pyx_t_6 < __pyx_t_5; __pyx_t_6+=1) {
    __pyx_v_i = __pyx_t_6;

    /* "View.MemoryView":1238
 * 
 *     for i in range(ndim):
 *         if tmpslice.shape[i] == 1:             # <<<<<<<<<<<<<<
 *             tmpslice.strides[i] = 0
 * 
 */
    __pyx_t_2 = (((__pyx_v_tmpslice->shape[__pyx_v_i]) == 1) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1239
 *     for i in range(ndim):
 *         if tmpslice.shape[i] == 1:
 *             tmpslice.strides[i] = 0             # <<<<<<<<<<<<<<
 * 
 *     if slice_is_contig(src[0], order, ndim):
 */
      (__pyx_v_tmpslice->strides[__pyx_v_i]) = 0;

      /* "View.MemoryView":1238
 * 
 *     for i in range(ndim):
 *         if tmpslice.shape[i] == 1:             # <<<<<<<<<<<<<<
 *             tmpslice.strides[i] = 0
 * 
 */
    }
  }

  /* "View.MemoryView":1241
 *             tmpslice.strides[i] = 0
 * 
 *     if slice_is_contig(src[0], order, ndim):             # <<<<<<<<<<<<<<
 *         memcpy(result, src.data, size)
 *     else:
 */
  __pyx_t_2 = (__pyx_memviewslice_is_contig((__pyx_v_src[0]), __pyx_v_order, __pyx_v_ndim) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1242
 * 
 *     if slice_is_contig(src[0], order, ndim):
 *         memcpy(result, src.data, size)             # <<<<<<<<<<<<<<
 *     else:
 *         copy_strided_to_strided(src, tmpslice, ndim, itemsize)
 */
    (void)(memcpy(__pyx_v_result, __pyx_v_src->data, __pyx_v_size));

    /* "View.MemoryView":1241
 *             tmpslice.strides[i] = 0
 * 
 *     if slice_is_contig(src[0], order, ndim):             # <<<<<<<<<<<<<<
 *         memcpy(result, src.data, size)
 *     else:
 */
    goto __pyx_L9;
  }

  /* "View.MemoryView":1244
 *         memcpy(result, src.data, size)
 *     else:
 *         copy_strided_to_strided(src, tmpslice, ndim, itemsize)             # <<<<<<<<<<<<<<
 * 
 *     return result
 */
  /*else*/ {
    copy_strided_to_strided(__pyx_v_src, __pyx_v_tmpslice, __pyx_v_ndim, __pyx_v_itemsize);
  }
  __pyx_L9:;

  /* "View.MemoryView":1246
 *         copy_strided_to_strided(src, tmpslice, ndim, itemsize)
 * 
 *     return result             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_r = __pyx_v_result;
  goto __pyx_L0;

  /* "View.MemoryView":1208
 * 
 * @cname('__pyx_memoryview_copy_data_to_temp')
 * cdef void *copy_data_to_temp(__Pyx_memviewslice *src,             # <<<<<<<<<<<<<<
 *                              __Pyx_memviewslice *tmpslice,
 *                              char order,
 */

  /* function exit code */
  __pyx_L1_error:;
  {
    #ifdef WITH_THREAD
    PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
    #endif
    __Pyx_AddTraceback("View.MemoryView.copy_data_to_temp", __pyx_clineno, __pyx_lineno, __pyx_filename);
    #ifdef WITH_THREAD
    __Pyx_PyGILState_Release(__pyx_gilstate_save);
    #endif
  }
  __pyx_r = NULL;
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":1251
 * 
 * @cname('__pyx_memoryview_err_extents')
 * cdef int _err_extents(int i, Py_ssize_t extent1,             # <<<<<<<<<<<<<<
 *                              Py_ssize_t extent2) except -1 with gil:
 *     raise ValueError("got differing extents in dimension %d (got %d and %d)" %
 */

static int __pyx_memoryview_err_extents(int __pyx_v_i, Py_ssize_t __pyx_v_extent1, Py_ssize_t __pyx_v_extent2) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  #ifdef WITH_THREAD
  PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __Pyx_RefNannySetupContext("_err_extents", 0);

  /* "View.MemoryView":1254
 *                              Py_ssize_t extent2) except -1 with gil:
 *     raise ValueError("got differing extents in dimension %d (got %d and %d)" %
 *                                                         (i, extent1, extent2))             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_err_dim')
 */
  __pyx_t_1 = __Pyx_PyInt_From_int(__pyx_v_i); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 1254, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_t_2 = PyInt_FromSsize_t(__pyx_v_extent1); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1254, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = PyInt_FromSsize_t(__pyx_v_extent2); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 1254, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = PyTuple_New(3); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 1254, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_GIVEREF(__pyx_t_1);
  PyTuple_SET_ITEM(__pyx_t_4, 0, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_2);
  PyTuple_SET_ITEM(__pyx_t_4, 1, __pyx_t_2);
  __Pyx_GIVEREF(__pyx_t_3);
  PyTuple_SET_ITEM(__pyx_t_4, 2, __pyx_t_3);
  __pyx_t_1 = 0;
  __pyx_t_2 = 0;
  __pyx_t_3 = 0;

  /* "View.MemoryView":1253
 * cdef int _err_extents(int i, Py_ssize_t extent1,
 *                              Py_ssize_t extent2) except -1 with gil:
 *     raise ValueError("got differing extents in dimension %d (got %d and %d)" %             # <<<<<<<<<<<<<<
 *                                                         (i, extent1, extent2))
 * 
 */
  __pyx_t_3 = __Pyx_PyString_Format(__pyx_kp_s_got_differing_extents_in_dimensi, __pyx_t_4); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 1253, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __pyx_t_4 = __Pyx_PyObject_CallOneArg(__pyx_builtin_ValueError, __pyx_t_3); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 1253, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_Raise(__pyx_t_4, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  __PYX_ERR(2, 1253, __pyx_L1_error)

  /* "View.MemoryView":1251
 * 
 * @cname('__pyx_memoryview_err_extents')
 * cdef int _err_extents(int i, Py_ssize_t extent1,             # <<<<<<<<<<<<<<
 *                              Py_ssize_t extent2) except -1 with gil:
 *     raise ValueError("got differing extents in dimension %d (got %d and %d)" %
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("View.MemoryView._err_extents", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __Pyx_RefNannyFinishContext();
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  return __pyx_r;
}

/* "View.MemoryView":1257
 * 
 * @cname('__pyx_memoryview_err_dim')
 * cdef int _err_dim(object error, char *msg, int dim) except -1 with gil:             # <<<<<<<<<<<<<<
 *     raise error(msg.decode('ascii') % dim)
 * 
 */

static int __pyx_memoryview_err_dim(PyObject *__pyx_v_error, char *__pyx_v_msg, int __pyx_v_dim) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  #ifdef WITH_THREAD
  PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __Pyx_RefNannySetupContext("_err_dim", 0);
  __Pyx_INCREF(__pyx_v_error);

  /* "View.MemoryView":1258
 * @cname('__pyx_memoryview_err_dim')
 * cdef int _err_dim(object error, char *msg, int dim) except -1 with gil:
 *     raise error(msg.decode('ascii') % dim)             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_err')
 */
  __pyx_t_2 = __Pyx_decode_c_string(__pyx_v_msg, 0, strlen(__pyx_v_msg), NULL, NULL, PyUnicode_DecodeASCII); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1258, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_3 = __Pyx_PyInt_From_int(__pyx_v_dim); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 1258, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __pyx_t_4 = PyUnicode_Format(__pyx_t_2, __pyx_t_3); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 1258, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_4);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_INCREF(__pyx_v_error);
  __pyx_t_3 = __pyx_v_error; __pyx_t_2 = NULL;
  if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_3))) {
    __pyx_t_2 = PyMethod_GET_SELF(__pyx_t_3);
    if (likely(__pyx_t_2)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_3);
      __Pyx_INCREF(__pyx_t_2);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_3, function);
    }
  }
  __pyx_t_1 = (__pyx_t_2) ? __Pyx_PyObject_Call2Args(__pyx_t_3, __pyx_t_2, __pyx_t_4) : __Pyx_PyObject_CallOneArg(__pyx_t_3, __pyx_t_4);
  __Pyx_XDECREF(__pyx_t_2); __pyx_t_2 = 0;
  __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
  if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 1258, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
  __Pyx_Raise(__pyx_t_1, 0, 0, 0);
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __PYX_ERR(2, 1258, __pyx_L1_error)

  /* "View.MemoryView":1257
 * 
 * @cname('__pyx_memoryview_err_dim')
 * cdef int _err_dim(object error, char *msg, int dim) except -1 with gil:             # <<<<<<<<<<<<<<
 *     raise error(msg.decode('ascii') % dim)
 * 
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_AddTraceback("View.MemoryView._err_dim", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __Pyx_XDECREF(__pyx_v_error);
  __Pyx_RefNannyFinishContext();
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  return __pyx_r;
}

/* "View.MemoryView":1261
 * 
 * @cname('__pyx_memoryview_err')
 * cdef int _err(object error, char *msg) except -1 with gil:             # <<<<<<<<<<<<<<
 *     if msg != NULL:
 *         raise error(msg.decode('ascii'))
 */

static int __pyx_memoryview_err(PyObject *__pyx_v_error, char *__pyx_v_msg) {
  int __pyx_r;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  #ifdef WITH_THREAD
  PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __Pyx_RefNannySetupContext("_err", 0);
  __Pyx_INCREF(__pyx_v_error);

  /* "View.MemoryView":1262
 * @cname('__pyx_memoryview_err')
 * cdef int _err(object error, char *msg) except -1 with gil:
 *     if msg != NULL:             # <<<<<<<<<<<<<<
 *         raise error(msg.decode('ascii'))
 *     else:
 */
  __pyx_t_1 = ((__pyx_v_msg != NULL) != 0);
  if (unlikely(__pyx_t_1)) {

    /* "View.MemoryView":1263
 * cdef int _err(object error, char *msg) except -1 with gil:
 *     if msg != NULL:
 *         raise error(msg.decode('ascii'))             # <<<<<<<<<<<<<<
 *     else:
 *         raise error
 */
    __pyx_t_3 = __Pyx_decode_c_string(__pyx_v_msg, 0, strlen(__pyx_v_msg), NULL, NULL, PyUnicode_DecodeASCII); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 1263, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_INCREF(__pyx_v_error);
    __pyx_t_4 = __pyx_v_error; __pyx_t_5 = NULL;
    if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_4))) {
      __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_4);
      if (likely(__pyx_t_5)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_4);
        __Pyx_INCREF(__pyx_t_5);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_4, function);
      }
    }
    __pyx_t_2 = (__pyx_t_5) ? __Pyx_PyObject_Call2Args(__pyx_t_4, __pyx_t_5, __pyx_t_3) : __Pyx_PyObject_CallOneArg(__pyx_t_4, __pyx_t_3);
    __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 1263, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    __Pyx_Raise(__pyx_t_2, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __PYX_ERR(2, 1263, __pyx_L1_error)

    /* "View.MemoryView":1262
 * @cname('__pyx_memoryview_err')
 * cdef int _err(object error, char *msg) except -1 with gil:
 *     if msg != NULL:             # <<<<<<<<<<<<<<
 *         raise error(msg.decode('ascii'))
 *     else:
 */
  }

  /* "View.MemoryView":1265
 *         raise error(msg.decode('ascii'))
 *     else:
 *         raise error             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_copy_contents')
 */
  /*else*/ {
    __Pyx_Raise(__pyx_v_error, 0, 0, 0);
    __PYX_ERR(2, 1265, __pyx_L1_error)
  }

  /* "View.MemoryView":1261
 * 
 * @cname('__pyx_memoryview_err')
 * cdef int _err(object error, char *msg) except -1 with gil:             # <<<<<<<<<<<<<<
 *     if msg != NULL:
 *         raise error(msg.decode('ascii'))
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView._err", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = -1;
  __Pyx_XDECREF(__pyx_v_error);
  __Pyx_RefNannyFinishContext();
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
  return __pyx_r;
}

/* "View.MemoryView":1268
 * 
 * @cname('__pyx_memoryview_copy_contents')
 * cdef int memoryview_copy_contents(__Pyx_memviewslice src,             # <<<<<<<<<<<<<<
 *                                   __Pyx_memviewslice dst,
 *                                   int src_ndim, int dst_ndim,
 */

static int __pyx_memoryview_copy_contents(__Pyx_memviewslice __pyx_v_src, __Pyx_memviewslice __pyx_v_dst, int __pyx_v_src_ndim, int __pyx_v_dst_ndim, int __pyx_v_dtype_is_object) {
  void *__pyx_v_tmpdata;
  size_t __pyx_v_itemsize;
  int __pyx_v_i;
  char __pyx_v_order;
  int __pyx_v_broadcasting;
  int __pyx_v_direct_copy;
  __Pyx_memviewslice __pyx_v_tmp;
  int __pyx_v_ndim;
  int __pyx_r;
  Py_ssize_t __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;
  int __pyx_t_4;
  int __pyx_t_5;
  int __pyx_t_6;
  void *__pyx_t_7;
  int __pyx_t_8;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;

  /* "View.MemoryView":1276
 *     Check for overlapping memory and verify the shapes.
 *     """
 *     cdef void *tmpdata = NULL             # <<<<<<<<<<<<<<
 *     cdef size_t itemsize = src.memview.view.itemsize
 *     cdef int i
 */
  __pyx_v_tmpdata = NULL;

  /* "View.MemoryView":1277
 *     """
 *     cdef void *tmpdata = NULL
 *     cdef size_t itemsize = src.memview.view.itemsize             # <<<<<<<<<<<<<<
 *     cdef int i
 *     cdef char order = get_best_order(&src, src_ndim)
 */
  __pyx_t_1 = __pyx_v_src.memview->view.itemsize;
  __pyx_v_itemsize = __pyx_t_1;

  /* "View.MemoryView":1279
 *     cdef size_t itemsize = src.memview.view.itemsize
 *     cdef int i
 *     cdef char order = get_best_order(&src, src_ndim)             # <<<<<<<<<<<<<<
 *     cdef bint broadcasting = False
 *     cdef bint direct_copy = False
 */
  __pyx_v_order = __pyx_get_best_slice_order((&__pyx_v_src), __pyx_v_src_ndim);

  /* "View.MemoryView":1280
 *     cdef int i
 *     cdef char order = get_best_order(&src, src_ndim)
 *     cdef bint broadcasting = False             # <<<<<<<<<<<<<<
 *     cdef bint direct_copy = False
 *     cdef __Pyx_memviewslice tmp
 */
  __pyx_v_broadcasting = 0;

  /* "View.MemoryView":1281
 *     cdef char order = get_best_order(&src, src_ndim)
 *     cdef bint broadcasting = False
 *     cdef bint direct_copy = False             # <<<<<<<<<<<<<<
 *     cdef __Pyx_memviewslice tmp
 * 
 */
  __pyx_v_direct_copy = 0;

  /* "View.MemoryView":1284
 *     cdef __Pyx_memviewslice tmp
 * 
 *     if src_ndim < dst_ndim:             # <<<<<<<<<<<<<<
 *         broadcast_leading(&src, src_ndim, dst_ndim)
 *     elif dst_ndim < src_ndim:
 */
  __pyx_t_2 = ((__pyx_v_src_ndim < __pyx_v_dst_ndim) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1285
 * 
 *     if src_ndim < dst_ndim:
 *         broadcast_leading(&src, src_ndim, dst_ndim)             # <<<<<<<<<<<<<<
 *     elif dst_ndim < src_ndim:
 *         broadcast_leading(&dst, dst_ndim, src_ndim)
 */
    __pyx_memoryview_broadcast_leading((&__pyx_v_src), __pyx_v_src_ndim, __pyx_v_dst_ndim);

    /* "View.MemoryView":1284
 *     cdef __Pyx_memviewslice tmp
 * 
 *     if src_ndim < dst_ndim:             # <<<<<<<<<<<<<<
 *         broadcast_leading(&src, src_ndim, dst_ndim)
 *     elif dst_ndim < src_ndim:
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":1286
 *     if src_ndim < dst_ndim:
 *         broadcast_leading(&src, src_ndim, dst_ndim)
 *     elif dst_ndim < src_ndim:             # <<<<<<<<<<<<<<
 *         broadcast_leading(&dst, dst_ndim, src_ndim)
 * 
 */
  __pyx_t_2 = ((__pyx_v_dst_ndim < __pyx_v_src_ndim) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1287
 *         broadcast_leading(&src, src_ndim, dst_ndim)
 *     elif dst_ndim < src_ndim:
 *         broadcast_leading(&dst, dst_ndim, src_ndim)             # <<<<<<<<<<<<<<
 * 
 *     cdef int ndim = max(src_ndim, dst_ndim)
 */
    __pyx_memoryview_broadcast_leading((&__pyx_v_dst), __pyx_v_dst_ndim, __pyx_v_src_ndim);

    /* "View.MemoryView":1286
 *     if src_ndim < dst_ndim:
 *         broadcast_leading(&src, src_ndim, dst_ndim)
 *     elif dst_ndim < src_ndim:             # <<<<<<<<<<<<<<
 *         broadcast_leading(&dst, dst_ndim, src_ndim)
 * 
 */
  }
  __pyx_L3:;

  /* "View.MemoryView":1289
 *         broadcast_leading(&dst, dst_ndim, src_ndim)
 * 
 *     cdef int ndim = max(src_ndim, dst_ndim)             # <<<<<<<<<<<<<<
 * 
 *     for i in range(ndim):
 */
  __pyx_t_3 = __pyx_v_dst_ndim;
  __pyx_t_4 = __pyx_v_src_ndim;
  if (((__pyx_t_3 > __pyx_t_4) != 0)) {
    __pyx_t_5 = __pyx_t_3;
  } else {
    __pyx_t_5 = __pyx_t_4;
  }
  __pyx_v_ndim = __pyx_t_5;

  /* "View.MemoryView":1291
 *     cdef int ndim = max(src_ndim, dst_ndim)
 * 
 *     for i in range(ndim):             # <<<<<<<<<<<<<<
 *         if src.shape[i] != dst.shape[i]:
 *             if src.shape[i] == 1:
 */
  __pyx_t_5 = __pyx_v_ndim;
  __pyx_t_3 = __pyx_t_5;
  for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
    __pyx_v_i = __pyx_t_4;

    /* "View.MemoryView":1292
 * 
 *     for i in range(ndim):
 *         if src.shape[i] != dst.shape[i]:             # <<<<<<<<<<<<<<
 *             if src.shape[i] == 1:
 *                 broadcasting = True
 */
    __pyx_t_2 = (((__pyx_v_src.shape[__pyx_v_i]) != (__pyx_v_dst.shape[__pyx_v_i])) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1293
 *     for i in range(ndim):
 *         if src.shape[i] != dst.shape[i]:
 *             if src.shape[i] == 1:             # <<<<<<<<<<<<<<
 *                 broadcasting = True
 *                 src.strides[i] = 0
 */
      __pyx_t_2 = (((__pyx_v_src.shape[__pyx_v_i]) == 1) != 0);
      if (__pyx_t_2) {

        /* "View.MemoryView":1294
 *         if src.shape[i] != dst.shape[i]:
 *             if src.shape[i] == 1:
 *                 broadcasting = True             # <<<<<<<<<<<<<<
 *                 src.strides[i] = 0
 *             else:
 */
        __pyx_v_broadcasting = 1;

        /* "View.MemoryView":1295
 *             if src.shape[i] == 1:
 *                 broadcasting = True
 *                 src.strides[i] = 0             # <<<<<<<<<<<<<<
 *             else:
 *                 _err_extents(i, dst.shape[i], src.shape[i])
 */
        (__pyx_v_src.strides[__pyx_v_i]) = 0;

        /* "View.MemoryView":1293
 *     for i in range(ndim):
 *         if src.shape[i] != dst.shape[i]:
 *             if src.shape[i] == 1:             # <<<<<<<<<<<<<<
 *                 broadcasting = True
 *                 src.strides[i] = 0
 */
        goto __pyx_L7;
      }

      /* "View.MemoryView":1297
 *                 src.strides[i] = 0
 *             else:
 *                 _err_extents(i, dst.shape[i], src.shape[i])             # <<<<<<<<<<<<<<
 * 
 *         if src.suboffsets[i] >= 0:
 */
      /*else*/ {
        __pyx_t_6 = __pyx_memoryview_err_extents(__pyx_v_i, (__pyx_v_dst.shape[__pyx_v_i]), (__pyx_v_src.shape[__pyx_v_i])); if (unlikely(__pyx_t_6 == ((int)-1))) __PYX_ERR(2, 1297, __pyx_L1_error)
      }
      __pyx_L7:;

      /* "View.MemoryView":1292
 * 
 *     for i in range(ndim):
 *         if src.shape[i] != dst.shape[i]:             # <<<<<<<<<<<<<<
 *             if src.shape[i] == 1:
 *                 broadcasting = True
 */
    }

    /* "View.MemoryView":1299
 *                 _err_extents(i, dst.shape[i], src.shape[i])
 * 
 *         if src.suboffsets[i] >= 0:             # <<<<<<<<<<<<<<
 *             _err_dim(ValueError, "Dimension %d is not direct", i)
 * 
 */
    __pyx_t_2 = (((__pyx_v_src.suboffsets[__pyx_v_i]) >= 0) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1300
 * 
 *         if src.suboffsets[i] >= 0:
 *             _err_dim(ValueError, "Dimension %d is not direct", i)             # <<<<<<<<<<<<<<
 * 
 *     if slices_overlap(&src, &dst, ndim, itemsize):
 */
      __pyx_t_6 = __pyx_memoryview_err_dim(__pyx_builtin_ValueError, ((char *)"Dimension %d is not direct"), __pyx_v_i); if (unlikely(__pyx_t_6 == ((int)-1))) __PYX_ERR(2, 1300, __pyx_L1_error)

      /* "View.MemoryView":1299
 *                 _err_extents(i, dst.shape[i], src.shape[i])
 * 
 *         if src.suboffsets[i] >= 0:             # <<<<<<<<<<<<<<
 *             _err_dim(ValueError, "Dimension %d is not direct", i)
 * 
 */
    }
  }

  /* "View.MemoryView":1302
 *             _err_dim(ValueError, "Dimension %d is not direct", i)
 * 
 *     if slices_overlap(&src, &dst, ndim, itemsize):             # <<<<<<<<<<<<<<
 * 
 *         if not slice_is_contig(src, order, ndim):
 */
  __pyx_t_2 = (__pyx_slices_overlap((&__pyx_v_src), (&__pyx_v_dst), __pyx_v_ndim, __pyx_v_itemsize) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1304
 *     if slices_overlap(&src, &dst, ndim, itemsize):
 * 
 *         if not slice_is_contig(src, order, ndim):             # <<<<<<<<<<<<<<
 *             order = get_best_order(&dst, ndim)
 * 
 */
    __pyx_t_2 = ((!(__pyx_memviewslice_is_contig(__pyx_v_src, __pyx_v_order, __pyx_v_ndim) != 0)) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1305
 * 
 *         if not slice_is_contig(src, order, ndim):
 *             order = get_best_order(&dst, ndim)             # <<<<<<<<<<<<<<
 * 
 *         tmpdata = copy_data_to_temp(&src, &tmp, order, ndim)
 */
      __pyx_v_order = __pyx_get_best_slice_order((&__pyx_v_dst), __pyx_v_ndim);

      /* "View.MemoryView":1304
 *     if slices_overlap(&src, &dst, ndim, itemsize):
 * 
 *         if not slice_is_contig(src, order, ndim):             # <<<<<<<<<<<<<<
 *             order = get_best_order(&dst, ndim)
 * 
 */
    }

    /* "View.MemoryView":1307
 *             order = get_best_order(&dst, ndim)
 * 
 *         tmpdata = copy_data_to_temp(&src, &tmp, order, ndim)             # <<<<<<<<<<<<<<
 *         src = tmp
 * 
 */
    __pyx_t_7 = __pyx_memoryview_copy_data_to_temp((&__pyx_v_src), (&__pyx_v_tmp), __pyx_v_order, __pyx_v_ndim); if (unlikely(__pyx_t_7 == ((void *)NULL))) __PYX_ERR(2, 1307, __pyx_L1_error)
    __pyx_v_tmpdata = __pyx_t_7;

    /* "View.MemoryView":1308
 * 
 *         tmpdata = copy_data_to_temp(&src, &tmp, order, ndim)
 *         src = tmp             # <<<<<<<<<<<<<<
 * 
 *     if not broadcasting:
 */
    __pyx_v_src = __pyx_v_tmp;

    /* "View.MemoryView":1302
 *             _err_dim(ValueError, "Dimension %d is not direct", i)
 * 
 *     if slices_overlap(&src, &dst, ndim, itemsize):             # <<<<<<<<<<<<<<
 * 
 *         if not slice_is_contig(src, order, ndim):
 */
  }

  /* "View.MemoryView":1310
 *         src = tmp
 * 
 *     if not broadcasting:             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_2 = ((!(__pyx_v_broadcasting != 0)) != 0);
  if (__pyx_t_2) {

    /* "View.MemoryView":1313
 * 
 * 
 *         if slice_is_contig(src, 'C', ndim):             # <<<<<<<<<<<<<<
 *             direct_copy = slice_is_contig(dst, 'C', ndim)
 *         elif slice_is_contig(src, 'F', ndim):
 */
    __pyx_t_2 = (__pyx_memviewslice_is_contig(__pyx_v_src, 'C', __pyx_v_ndim) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1314
 * 
 *         if slice_is_contig(src, 'C', ndim):
 *             direct_copy = slice_is_contig(dst, 'C', ndim)             # <<<<<<<<<<<<<<
 *         elif slice_is_contig(src, 'F', ndim):
 *             direct_copy = slice_is_contig(dst, 'F', ndim)
 */
      __pyx_v_direct_copy = __pyx_memviewslice_is_contig(__pyx_v_dst, 'C', __pyx_v_ndim);

      /* "View.MemoryView":1313
 * 
 * 
 *         if slice_is_contig(src, 'C', ndim):             # <<<<<<<<<<<<<<
 *             direct_copy = slice_is_contig(dst, 'C', ndim)
 *         elif slice_is_contig(src, 'F', ndim):
 */
      goto __pyx_L12;
    }

    /* "View.MemoryView":1315
 *         if slice_is_contig(src, 'C', ndim):
 *             direct_copy = slice_is_contig(dst, 'C', ndim)
 *         elif slice_is_contig(src, 'F', ndim):             # <<<<<<<<<<<<<<
 *             direct_copy = slice_is_contig(dst, 'F', ndim)
 * 
 */
    __pyx_t_2 = (__pyx_memviewslice_is_contig(__pyx_v_src, 'F', __pyx_v_ndim) != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1316
 *             direct_copy = slice_is_contig(dst, 'C', ndim)
 *         elif slice_is_contig(src, 'F', ndim):
 *             direct_copy = slice_is_contig(dst, 'F', ndim)             # <<<<<<<<<<<<<<
 * 
 *         if direct_copy:
 */
      __pyx_v_direct_copy = __pyx_memviewslice_is_contig(__pyx_v_dst, 'F', __pyx_v_ndim);

      /* "View.MemoryView":1315
 *         if slice_is_contig(src, 'C', ndim):
 *             direct_copy = slice_is_contig(dst, 'C', ndim)
 *         elif slice_is_contig(src, 'F', ndim):             # <<<<<<<<<<<<<<
 *             direct_copy = slice_is_contig(dst, 'F', ndim)
 * 
 */
    }
    __pyx_L12:;

    /* "View.MemoryView":1318
 *             direct_copy = slice_is_contig(dst, 'F', ndim)
 * 
 *         if direct_copy:             # <<<<<<<<<<<<<<
 * 
 *             refcount_copying(&dst, dtype_is_object, ndim, False)
 */
    __pyx_t_2 = (__pyx_v_direct_copy != 0);
    if (__pyx_t_2) {

      /* "View.MemoryView":1320
 *         if direct_copy:
 * 
 *             refcount_copying(&dst, dtype_is_object, ndim, False)             # <<<<<<<<<<<<<<
 *             memcpy(dst.data, src.data, slice_get_size(&src, ndim))
 *             refcount_copying(&dst, dtype_is_object, ndim, True)
 */
      __pyx_memoryview_refcount_copying((&__pyx_v_dst), __pyx_v_dtype_is_object, __pyx_v_ndim, 0);

      /* "View.MemoryView":1321
 * 
 *             refcount_copying(&dst, dtype_is_object, ndim, False)
 *             memcpy(dst.data, src.data, slice_get_size(&src, ndim))             # <<<<<<<<<<<<<<
 *             refcount_copying(&dst, dtype_is_object, ndim, True)
 *             free(tmpdata)
 */
      (void)(memcpy(__pyx_v_dst.data, __pyx_v_src.data, __pyx_memoryview_slice_get_size((&__pyx_v_src), __pyx_v_ndim)));

      /* "View.MemoryView":1322
 *             refcount_copying(&dst, dtype_is_object, ndim, False)
 *             memcpy(dst.data, src.data, slice_get_size(&src, ndim))
 *             refcount_copying(&dst, dtype_is_object, ndim, True)             # <<<<<<<<<<<<<<
 *             free(tmpdata)
 *             return 0
 */
      __pyx_memoryview_refcount_copying((&__pyx_v_dst), __pyx_v_dtype_is_object, __pyx_v_ndim, 1);

      /* "View.MemoryView":1323
 *             memcpy(dst.data, src.data, slice_get_size(&src, ndim))
 *             refcount_copying(&dst, dtype_is_object, ndim, True)
 *             free(tmpdata)             # <<<<<<<<<<<<<<
 *             return 0
 * 
 */
      free(__pyx_v_tmpdata);

      /* "View.MemoryView":1324
 *             refcount_copying(&dst, dtype_is_object, ndim, True)
 *             free(tmpdata)
 *             return 0             # <<<<<<<<<<<<<<
 * 
 *     if order == 'F' == get_best_order(&dst, ndim):
 */
      __pyx_r = 0;
      goto __pyx_L0;

      /* "View.MemoryView":1318
 *             direct_copy = slice_is_contig(dst, 'F', ndim)
 * 
 *         if direct_copy:             # <<<<<<<<<<<<<<
 * 
 *             refcount_copying(&dst, dtype_is_object, ndim, False)
 */
    }

    /* "View.MemoryView":1310
 *         src = tmp
 * 
 *     if not broadcasting:             # <<<<<<<<<<<<<<
 * 
 * 
 */
  }

  /* "View.MemoryView":1326
 *             return 0
 * 
 *     if order == 'F' == get_best_order(&dst, ndim):             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_2 = (__pyx_v_order == 'F');
  if (__pyx_t_2) {
    __pyx_t_2 = ('F' == __pyx_get_best_slice_order((&__pyx_v_dst), __pyx_v_ndim));
  }
  __pyx_t_8 = (__pyx_t_2 != 0);
  if (__pyx_t_8) {

    /* "View.MemoryView":1329
 * 
 * 
 *         transpose_memslice(&src)             # <<<<<<<<<<<<<<
 *         transpose_memslice(&dst)
 * 
 */
    __pyx_t_5 = __pyx_memslice_transpose((&__pyx_v_src)); if (unlikely(__pyx_t_5 == ((int)0))) __PYX_ERR(2, 1329, __pyx_L1_error)

    /* "View.MemoryView":1330
 * 
 *         transpose_memslice(&src)
 *         transpose_memslice(&dst)             # <<<<<<<<<<<<<<
 * 
 *     refcount_copying(&dst, dtype_is_object, ndim, False)
 */
    __pyx_t_5 = __pyx_memslice_transpose((&__pyx_v_dst)); if (unlikely(__pyx_t_5 == ((int)0))) __PYX_ERR(2, 1330, __pyx_L1_error)

    /* "View.MemoryView":1326
 *             return 0
 * 
 *     if order == 'F' == get_best_order(&dst, ndim):             # <<<<<<<<<<<<<<
 * 
 * 
 */
  }

  /* "View.MemoryView":1332
 *         transpose_memslice(&dst)
 * 
 *     refcount_copying(&dst, dtype_is_object, ndim, False)             # <<<<<<<<<<<<<<
 *     copy_strided_to_strided(&src, &dst, ndim, itemsize)
 *     refcount_copying(&dst, dtype_is_object, ndim, True)
 */
  __pyx_memoryview_refcount_copying((&__pyx_v_dst), __pyx_v_dtype_is_object, __pyx_v_ndim, 0);

  /* "View.MemoryView":1333
 * 
 *     refcount_copying(&dst, dtype_is_object, ndim, False)
 *     copy_strided_to_strided(&src, &dst, ndim, itemsize)             # <<<<<<<<<<<<<<
 *     refcount_copying(&dst, dtype_is_object, ndim, True)
 * 
 */
  copy_strided_to_strided((&__pyx_v_src), (&__pyx_v_dst), __pyx_v_ndim, __pyx_v_itemsize);

  /* "View.MemoryView":1334
 *     refcount_copying(&dst, dtype_is_object, ndim, False)
 *     copy_strided_to_strided(&src, &dst, ndim, itemsize)
 *     refcount_copying(&dst, dtype_is_object, ndim, True)             # <<<<<<<<<<<<<<
 * 
 *     free(tmpdata)
 */
  __pyx_memoryview_refcount_copying((&__pyx_v_dst), __pyx_v_dtype_is_object, __pyx_v_ndim, 1);

  /* "View.MemoryView":1336
 *     refcount_copying(&dst, dtype_is_object, ndim, True)
 * 
 *     free(tmpdata)             # <<<<<<<<<<<<<<
 *     return 0
 * 
 */
  free(__pyx_v_tmpdata);

  /* "View.MemoryView":1337
 * 
 *     free(tmpdata)
 *     return 0             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_broadcast_leading')
 */
  __pyx_r = 0;
  goto __pyx_L0;

  /* "View.MemoryView":1268
 * 
 * @cname('__pyx_memoryview_copy_contents')
 * cdef int memoryview_copy_contents(__Pyx_memviewslice src,             # <<<<<<<<<<<<<<
 *                                   __Pyx_memviewslice dst,
 *                                   int src_ndim, int dst_ndim,
 */

  /* function exit code */
  __pyx_L1_error:;
  {
    #ifdef WITH_THREAD
    PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
    #endif
    __Pyx_AddTraceback("View.MemoryView.memoryview_copy_contents", __pyx_clineno, __pyx_lineno, __pyx_filename);
    #ifdef WITH_THREAD
    __Pyx_PyGILState_Release(__pyx_gilstate_save);
    #endif
  }
  __pyx_r = -1;
  __pyx_L0:;
  return __pyx_r;
}

/* "View.MemoryView":1340
 * 
 * @cname('__pyx_memoryview_broadcast_leading')
 * cdef void broadcast_leading(__Pyx_memviewslice *mslice,             # <<<<<<<<<<<<<<
 *                             int ndim,
 *                             int ndim_other) nogil:
 */

static void __pyx_memoryview_broadcast_leading(__Pyx_memviewslice *__pyx_v_mslice, int __pyx_v_ndim, int __pyx_v_ndim_other) {
  int __pyx_v_i;
  int __pyx_v_offset;
  int __pyx_t_1;
  int __pyx_t_2;
  int __pyx_t_3;

  /* "View.MemoryView":1344
 *                             int ndim_other) nogil:
 *     cdef int i
 *     cdef int offset = ndim_other - ndim             # <<<<<<<<<<<<<<
 * 
 *     for i in range(ndim - 1, -1, -1):
 */
  __pyx_v_offset = (__pyx_v_ndim_other - __pyx_v_ndim);

  /* "View.MemoryView":1346
 *     cdef int offset = ndim_other - ndim
 * 
 *     for i in range(ndim - 1, -1, -1):             # <<<<<<<<<<<<<<
 *         mslice.shape[i + offset] = mslice.shape[i]
 *         mslice.strides[i + offset] = mslice.strides[i]
 */
  for (__pyx_t_1 = (__pyx_v_ndim - 1); __pyx_t_1 > -1; __pyx_t_1-=1) {
    __pyx_v_i = __pyx_t_1;

    /* "View.MemoryView":1347
 * 
 *     for i in range(ndim - 1, -1, -1):
 *         mslice.shape[i + offset] = mslice.shape[i]             # <<<<<<<<<<<<<<
 *         mslice.strides[i + offset] = mslice.strides[i]
 *         mslice.suboffsets[i + offset] = mslice.suboffsets[i]
 */
    (__pyx_v_mslice->shape[(__pyx_v_i + __pyx_v_offset)]) = (__pyx_v_mslice->shape[__pyx_v_i]);

    /* "View.MemoryView":1348
 *     for i in range(ndim - 1, -1, -1):
 *         mslice.shape[i + offset] = mslice.shape[i]
 *         mslice.strides[i + offset] = mslice.strides[i]             # <<<<<<<<<<<<<<
 *         mslice.suboffsets[i + offset] = mslice.suboffsets[i]
 * 
 */
    (__pyx_v_mslice->strides[(__pyx_v_i + __pyx_v_offset)]) = (__pyx_v_mslice->strides[__pyx_v_i]);

    /* "View.MemoryView":1349
 *         mslice.shape[i + offset] = mslice.shape[i]
 *         mslice.strides[i + offset] = mslice.strides[i]
 *         mslice.suboffsets[i + offset] = mslice.suboffsets[i]             # <<<<<<<<<<<<<<
 * 
 *     for i in range(offset):
 */
    (__pyx_v_mslice->suboffsets[(__pyx_v_i + __pyx_v_offset)]) = (__pyx_v_mslice->suboffsets[__pyx_v_i]);
  }

  /* "View.MemoryView":1351
 *         mslice.suboffsets[i + offset] = mslice.suboffsets[i]
 * 
 *     for i in range(offset):             # <<<<<<<<<<<<<<
 *         mslice.shape[i] = 1
 *         mslice.strides[i] = mslice.strides[0]
 */
  __pyx_t_1 = __pyx_v_offset;
  __pyx_t_2 = __pyx_t_1;
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
    __pyx_v_i = __pyx_t_3;

    /* "View.MemoryView":1352
 * 
 *     for i in range(offset):
 *         mslice.shape[i] = 1             # <<<<<<<<<<<<<<
 *         mslice.strides[i] = mslice.strides[0]
 *         mslice.suboffsets[i] = -1
 */
    (__pyx_v_mslice->shape[__pyx_v_i]) = 1;

    /* "View.MemoryView":1353
 *     for i in range(offset):
 *         mslice.shape[i] = 1
 *         mslice.strides[i] = mslice.strides[0]             # <<<<<<<<<<<<<<
 *         mslice.suboffsets[i] = -1
 * 
 */
    (__pyx_v_mslice->strides[__pyx_v_i]) = (__pyx_v_mslice->strides[0]);

    /* "View.MemoryView":1354
 *         mslice.shape[i] = 1
 *         mslice.strides[i] = mslice.strides[0]
 *         mslice.suboffsets[i] = -1             # <<<<<<<<<<<<<<
 * 
 * 
 */
    (__pyx_v_mslice->suboffsets[__pyx_v_i]) = -1L;
  }

  /* "View.MemoryView":1340
 * 
 * @cname('__pyx_memoryview_broadcast_leading')
 * cdef void broadcast_leading(__Pyx_memviewslice *mslice,             # <<<<<<<<<<<<<<
 *                             int ndim,
 *                             int ndim_other) nogil:
 */

  /* function exit code */
}

/* "View.MemoryView":1362
 * 
 * @cname('__pyx_memoryview_refcount_copying')
 * cdef void refcount_copying(__Pyx_memviewslice *dst, bint dtype_is_object,             # <<<<<<<<<<<<<<
 *                            int ndim, bint inc) nogil:
 * 
 */

static void __pyx_memoryview_refcount_copying(__Pyx_memviewslice *__pyx_v_dst, int __pyx_v_dtype_is_object, int __pyx_v_ndim, int __pyx_v_inc) {
  int __pyx_t_1;

  /* "View.MemoryView":1366
 * 
 * 
 *     if dtype_is_object:             # <<<<<<<<<<<<<<
 *         refcount_objects_in_slice_with_gil(dst.data, dst.shape,
 *                                            dst.strides, ndim, inc)
 */
  __pyx_t_1 = (__pyx_v_dtype_is_object != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":1367
 * 
 *     if dtype_is_object:
 *         refcount_objects_in_slice_with_gil(dst.data, dst.shape,             # <<<<<<<<<<<<<<
 *                                            dst.strides, ndim, inc)
 * 
 */
    __pyx_memoryview_refcount_objects_in_slice_with_gil(__pyx_v_dst->data, __pyx_v_dst->shape, __pyx_v_dst->strides, __pyx_v_ndim, __pyx_v_inc);

    /* "View.MemoryView":1366
 * 
 * 
 *     if dtype_is_object:             # <<<<<<<<<<<<<<
 *         refcount_objects_in_slice_with_gil(dst.data, dst.shape,
 *                                            dst.strides, ndim, inc)
 */
  }

  /* "View.MemoryView":1362
 * 
 * @cname('__pyx_memoryview_refcount_copying')
 * cdef void refcount_copying(__Pyx_memviewslice *dst, bint dtype_is_object,             # <<<<<<<<<<<<<<
 *                            int ndim, bint inc) nogil:
 * 
 */

  /* function exit code */
}

/* "View.MemoryView":1371
 * 
 * @cname('__pyx_memoryview_refcount_objects_in_slice_with_gil')
 * cdef void refcount_objects_in_slice_with_gil(char *data, Py_ssize_t *shape,             # <<<<<<<<<<<<<<
 *                                              Py_ssize_t *strides, int ndim,
 *                                              bint inc) with gil:
 */

static void __pyx_memoryview_refcount_objects_in_slice_with_gil(char *__pyx_v_data, Py_ssize_t *__pyx_v_shape, Py_ssize_t *__pyx_v_strides, int __pyx_v_ndim, int __pyx_v_inc) {
  __Pyx_RefNannyDeclarations
  #ifdef WITH_THREAD
  PyGILState_STATE __pyx_gilstate_save = __Pyx_PyGILState_Ensure();
  #endif
  __Pyx_RefNannySetupContext("refcount_objects_in_slice_with_gil", 0);

  /* "View.MemoryView":1374
 *                                              Py_ssize_t *strides, int ndim,
 *                                              bint inc) with gil:
 *     refcount_objects_in_slice(data, shape, strides, ndim, inc)             # <<<<<<<<<<<<<<
 * 
 * @cname('__pyx_memoryview_refcount_objects_in_slice')
 */
  __pyx_memoryview_refcount_objects_in_slice(__pyx_v_data, __pyx_v_shape, __pyx_v_strides, __pyx_v_ndim, __pyx_v_inc);

  /* "View.MemoryView":1371
 * 
 * @cname('__pyx_memoryview_refcount_objects_in_slice_with_gil')
 * cdef void refcount_objects_in_slice_with_gil(char *data, Py_ssize_t *shape,             # <<<<<<<<<<<<<<
 *                                              Py_ssize_t *strides, int ndim,
 *                                              bint inc) with gil:
 */

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  #ifdef WITH_THREAD
  __Pyx_PyGILState_Release(__pyx_gilstate_save);
  #endif
}

/* "View.MemoryView":1377
 * 
 * @cname('__pyx_memoryview_refcount_objects_in_slice')
 * cdef void refcount_objects_in_slice(char *data, Py_ssize_t *shape,             # <<<<<<<<<<<<<<
 *                                     Py_ssize_t *strides, int ndim, bint inc):
 *     cdef Py_ssize_t i
 */

static void __pyx_memoryview_refcount_objects_in_slice(char *__pyx_v_data, Py_ssize_t *__pyx_v_shape, Py_ssize_t *__pyx_v_strides, int __pyx_v_ndim, int __pyx_v_inc) {
  CYTHON_UNUSED Py_ssize_t __pyx_v_i;
  __Pyx_RefNannyDeclarations
  Py_ssize_t __pyx_t_1;
  Py_ssize_t __pyx_t_2;
  Py_ssize_t __pyx_t_3;
  int __pyx_t_4;
  __Pyx_RefNannySetupContext("refcount_objects_in_slice", 0);

  /* "View.MemoryView":1381
 *     cdef Py_ssize_t i
 * 
 *     for i in range(shape[0]):             # <<<<<<<<<<<<<<
 *         if ndim == 1:
 *             if inc:
 */
  __pyx_t_1 = (__pyx_v_shape[0]);
  __pyx_t_2 = __pyx_t_1;
  for (__pyx_t_3 = 0; __pyx_t_3 < __pyx_t_2; __pyx_t_3+=1) {
    __pyx_v_i = __pyx_t_3;

    /* "View.MemoryView":1382
 * 
 *     for i in range(shape[0]):
 *         if ndim == 1:             # <<<<<<<<<<<<<<
 *             if inc:
 *                 Py_INCREF((<PyObject **> data)[0])
 */
    __pyx_t_4 = ((__pyx_v_ndim == 1) != 0);
    if (__pyx_t_4) {

      /* "View.MemoryView":1383
 *     for i in range(shape[0]):
 *         if ndim == 1:
 *             if inc:             # <<<<<<<<<<<<<<
 *                 Py_INCREF((<PyObject **> data)[0])
 *             else:
 */
      __pyx_t_4 = (__pyx_v_inc != 0);
      if (__pyx_t_4) {

        /* "View.MemoryView":1384
 *         if ndim == 1:
 *             if inc:
 *                 Py_INCREF((<PyObject **> data)[0])             # <<<<<<<<<<<<<<
 *             else:
 *                 Py_DECREF((<PyObject **> data)[0])
 */
        Py_INCREF((((PyObject **)__pyx_v_data)[0]));

        /* "View.MemoryView":1383
 *     for i in range(shape[0]):
 *         if ndim == 1:
 *             if inc:             # <<<<<<<<<<<<<<
 *                 Py_INCREF((<PyObject **> data)[0])
 *             else:
 */
        goto __pyx_L6;
      }

      /* "View.MemoryView":1386
 *                 Py_INCREF((<PyObject **> data)[0])
 *             else:
 *                 Py_DECREF((<PyObject **> data)[0])             # <<<<<<<<<<<<<<
 *         else:
 *             refcount_objects_in_slice(data, shape + 1, strides + 1,
 */
      /*else*/ {
        Py_DECREF((((PyObject **)__pyx_v_data)[0]));
      }
      __pyx_L6:;

      /* "View.MemoryView":1382
 * 
 *     for i in range(shape[0]):
 *         if ndim == 1:             # <<<<<<<<<<<<<<
 *             if inc:
 *                 Py_INCREF((<PyObject **> data)[0])
 */
      goto __pyx_L5;
    }

    /* "View.MemoryView":1388
 *                 Py_DECREF((<PyObject **> data)[0])
 *         else:
 *             refcount_objects_in_slice(data, shape + 1, strides + 1,             # <<<<<<<<<<<<<<
 *                                       ndim - 1, inc)
 * 
 */
    /*else*/ {

      /* "View.MemoryView":1389
 *         else:
 *             refcount_objects_in_slice(data, shape + 1, strides + 1,
 *                                       ndim - 1, inc)             # <<<<<<<<<<<<<<
 * 
 *         data += strides[0]
 */
      __pyx_memoryview_refcount_objects_in_slice(__pyx_v_data, (__pyx_v_shape + 1), (__pyx_v_strides + 1), (__pyx_v_ndim - 1), __pyx_v_inc);
    }
    __pyx_L5:;

    /* "View.MemoryView":1391
 *                                       ndim - 1, inc)
 * 
 *         data += strides[0]             # <<<<<<<<<<<<<<
 * 
 * 
 */
    __pyx_v_data = (__pyx_v_data + (__pyx_v_strides[0]));
  }

  /* "View.MemoryView":1377
 * 
 * @cname('__pyx_memoryview_refcount_objects_in_slice')
 * cdef void refcount_objects_in_slice(char *data, Py_ssize_t *shape,             # <<<<<<<<<<<<<<
 *                                     Py_ssize_t *strides, int ndim, bint inc):
 *     cdef Py_ssize_t i
 */

  /* function exit code */
  __Pyx_RefNannyFinishContext();
}

/* "View.MemoryView":1397
 * 
 * @cname('__pyx_memoryview_slice_assign_scalar')
 * cdef void slice_assign_scalar(__Pyx_memviewslice *dst, int ndim,             # <<<<<<<<<<<<<<
 *                               size_t itemsize, void *item,
 *                               bint dtype_is_object) nogil:
 */

static void __pyx_memoryview_slice_assign_scalar(__Pyx_memviewslice *__pyx_v_dst, int __pyx_v_ndim, size_t __pyx_v_itemsize, void *__pyx_v_item, int __pyx_v_dtype_is_object) {

  /* "View.MemoryView":1400
 *                               size_t itemsize, void *item,
 *                               bint dtype_is_object) nogil:
 *     refcount_copying(dst, dtype_is_object, ndim, False)             # <<<<<<<<<<<<<<
 *     _slice_assign_scalar(dst.data, dst.shape, dst.strides, ndim,
 *                          itemsize, item)
 */
  __pyx_memoryview_refcount_copying(__pyx_v_dst, __pyx_v_dtype_is_object, __pyx_v_ndim, 0);

  /* "View.MemoryView":1401
 *                               bint dtype_is_object) nogil:
 *     refcount_copying(dst, dtype_is_object, ndim, False)
 *     _slice_assign_scalar(dst.data, dst.shape, dst.strides, ndim,             # <<<<<<<<<<<<<<
 *                          itemsize, item)
 *     refcount_copying(dst, dtype_is_object, ndim, True)
 */
  __pyx_memoryview__slice_assign_scalar(__pyx_v_dst->data, __pyx_v_dst->shape, __pyx_v_dst->strides, __pyx_v_ndim, __pyx_v_itemsize, __pyx_v_item);

  /* "View.MemoryView":1403
 *     _slice_assign_scalar(dst.data, dst.shape, dst.strides, ndim,
 *                          itemsize, item)
 *     refcount_copying(dst, dtype_is_object, ndim, True)             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_memoryview_refcount_copying(__pyx_v_dst, __pyx_v_dtype_is_object, __pyx_v_ndim, 1);

  /* "View.MemoryView":1397
 * 
 * @cname('__pyx_memoryview_slice_assign_scalar')
 * cdef void slice_assign_scalar(__Pyx_memviewslice *dst, int ndim,             # <<<<<<<<<<<<<<
 *                               size_t itemsize, void *item,
 *                               bint dtype_is_object) nogil:
 */

  /* function exit code */
}

/* "View.MemoryView":1407
 * 
 * @cname('__pyx_memoryview__slice_assign_scalar')
 * cdef void _slice_assign_scalar(char *data, Py_ssize_t *shape,             # <<<<<<<<<<<<<<
 *                               Py_ssize_t *strides, int ndim,
 *                               size_t itemsize, void *item) nogil:
 */

static void __pyx_memoryview__slice_assign_scalar(char *__pyx_v_data, Py_ssize_t *__pyx_v_shape, Py_ssize_t *__pyx_v_strides, int __pyx_v_ndim, size_t __pyx_v_itemsize, void *__pyx_v_item) {
  CYTHON_UNUSED Py_ssize_t __pyx_v_i;
  Py_ssize_t __pyx_v_stride;
  Py_ssize_t __pyx_v_extent;
  int __pyx_t_1;
  Py_ssize_t __pyx_t_2;
  Py_ssize_t __pyx_t_3;
  Py_ssize_t __pyx_t_4;

  /* "View.MemoryView":1411
 *                               size_t itemsize, void *item) nogil:
 *     cdef Py_ssize_t i
 *     cdef Py_ssize_t stride = strides[0]             # <<<<<<<<<<<<<<
 *     cdef Py_ssize_t extent = shape[0]
 * 
 */
  __pyx_v_stride = (__pyx_v_strides[0]);

  /* "View.MemoryView":1412
 *     cdef Py_ssize_t i
 *     cdef Py_ssize_t stride = strides[0]
 *     cdef Py_ssize_t extent = shape[0]             # <<<<<<<<<<<<<<
 * 
 *     if ndim == 1:
 */
  __pyx_v_extent = (__pyx_v_shape[0]);

  /* "View.MemoryView":1414
 *     cdef Py_ssize_t extent = shape[0]
 * 
 *     if ndim == 1:             # <<<<<<<<<<<<<<
 *         for i in range(extent):
 *             memcpy(data, item, itemsize)
 */
  __pyx_t_1 = ((__pyx_v_ndim == 1) != 0);
  if (__pyx_t_1) {

    /* "View.MemoryView":1415
 * 
 *     if ndim == 1:
 *         for i in range(extent):             # <<<<<<<<<<<<<<
 *             memcpy(data, item, itemsize)
 *             data += stride
 */
    __pyx_t_2 = __pyx_v_extent;
    __pyx_t_3 = __pyx_t_2;
    for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
      __pyx_v_i = __pyx_t_4;

      /* "View.MemoryView":1416
 *     if ndim == 1:
 *         for i in range(extent):
 *             memcpy(data, item, itemsize)             # <<<<<<<<<<<<<<
 *             data += stride
 *     else:
 */
      (void)(memcpy(__pyx_v_data, __pyx_v_item, __pyx_v_itemsize));

      /* "View.MemoryView":1417
 *         for i in range(extent):
 *             memcpy(data, item, itemsize)
 *             data += stride             # <<<<<<<<<<<<<<
 *     else:
 *         for i in range(extent):
 */
      __pyx_v_data = (__pyx_v_data + __pyx_v_stride);
    }

    /* "View.MemoryView":1414
 *     cdef Py_ssize_t extent = shape[0]
 * 
 *     if ndim == 1:             # <<<<<<<<<<<<<<
 *         for i in range(extent):
 *             memcpy(data, item, itemsize)
 */
    goto __pyx_L3;
  }

  /* "View.MemoryView":1419
 *             data += stride
 *     else:
 *         for i in range(extent):             # <<<<<<<<<<<<<<
 *             _slice_assign_scalar(data, shape + 1, strides + 1,
 *                                 ndim - 1, itemsize, item)
 */
  /*else*/ {
    __pyx_t_2 = __pyx_v_extent;
    __pyx_t_3 = __pyx_t_2;
    for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
      __pyx_v_i = __pyx_t_4;

      /* "View.MemoryView":1420
 *     else:
 *         for i in range(extent):
 *             _slice_assign_scalar(data, shape + 1, strides + 1,             # <<<<<<<<<<<<<<
 *                                 ndim - 1, itemsize, item)
 *             data += stride
 */
      __pyx_memoryview__slice_assign_scalar(__pyx_v_data, (__pyx_v_shape + 1), (__pyx_v_strides + 1), (__pyx_v_ndim - 1), __pyx_v_itemsize, __pyx_v_item);

      /* "View.MemoryView":1422
 *             _slice_assign_scalar(data, shape + 1, strides + 1,
 *                                 ndim - 1, itemsize, item)
 *             data += stride             # <<<<<<<<<<<<<<
 * 
 * 
 */
      __pyx_v_data = (__pyx_v_data + __pyx_v_stride);
    }
  }
  __pyx_L3:;

  /* "View.MemoryView":1407
 * 
 * @cname('__pyx_memoryview__slice_assign_scalar')
 * cdef void _slice_assign_scalar(char *data, Py_ssize_t *shape,             # <<<<<<<<<<<<<<
 *                               Py_ssize_t *strides, int ndim,
 *                               size_t itemsize, void *item) nogil:
 */

  /* function exit code */
}

/* "(tree fragment)":1
 * def __pyx_unpickle_Enum(__pyx_type, long __pyx_checksum, __pyx_state):             # <<<<<<<<<<<<<<
 *     cdef object __pyx_PickleError
 *     cdef object __pyx_result
 */

/* Python wrapper */
static PyObject *__pyx_pw_15View_dot_MemoryView_1__pyx_unpickle_Enum(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_15View_dot_MemoryView_1__pyx_unpickle_Enum = {"__pyx_unpickle_Enum", (PyCFunction)(void*)(PyCFunctionWithKeywords)__pyx_pw_15View_dot_MemoryView_1__pyx_unpickle_Enum, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_15View_dot_MemoryView_1__pyx_unpickle_Enum(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  PyObject *__pyx_v___pyx_type = 0;
  long __pyx_v___pyx_checksum;
  PyObject *__pyx_v___pyx_state = 0;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__pyx_unpickle_Enum (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_pyx_type,&__pyx_n_s_pyx_checksum,&__pyx_n_s_pyx_state,0};
    PyObject* values[3] = {0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        CYTHON_FALLTHROUGH;
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        CYTHON_FALLTHROUGH;
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        CYTHON_FALLTHROUGH;
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_pyx_type)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        CYTHON_FALLTHROUGH;
        case  1:
        if (likely((values[1] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_pyx_checksum)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("__pyx_unpickle_Enum", 1, 3, 3, 1); __PYX_ERR(2, 1, __pyx_L3_error)
        }
        CYTHON_FALLTHROUGH;
        case  2:
        if (likely((values[2] = __Pyx_PyDict_GetItemStr(__pyx_kwds, __pyx_n_s_pyx_state)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("__pyx_unpickle_Enum", 1, 3, 3, 2); __PYX_ERR(2, 1, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "__pyx_unpickle_Enum") < 0)) __PYX_ERR(2, 1, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 3) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
    }
    __pyx_v___pyx_type = values[0];
    __pyx_v___pyx_checksum = __Pyx_PyInt_As_long(values[1]); if (unlikely((__pyx_v___pyx_checksum == (long)-1) && PyErr_Occurred())) __PYX_ERR(2, 1, __pyx_L3_error)
    __pyx_v___pyx_state = values[2];
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("__pyx_unpickle_Enum", 1, 3, 3, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(2, 1, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("View.MemoryView.__pyx_unpickle_Enum", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_pf_15View_dot_MemoryView___pyx_unpickle_Enum(__pyx_self, __pyx_v___pyx_type, __pyx_v___pyx_checksum, __pyx_v___pyx_state);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_15View_dot_MemoryView___pyx_unpickle_Enum(CYTHON_UNUSED PyObject *__pyx_self, PyObject *__pyx_v___pyx_type, long __pyx_v___pyx_checksum, PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_v___pyx_PickleError = 0;
  PyObject *__pyx_v___pyx_result = 0;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  int __pyx_t_1;
  PyObject *__pyx_t_2 = NULL;
  PyObject *__pyx_t_3 = NULL;
  PyObject *__pyx_t_4 = NULL;
  PyObject *__pyx_t_5 = NULL;
  int __pyx_t_6;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__pyx_unpickle_Enum", 0);

  /* "(tree fragment)":4
 *     cdef object __pyx_PickleError
 *     cdef object __pyx_result
 *     if __pyx_checksum != 0xb068931:             # <<<<<<<<<<<<<<
 *         from pickle import PickleError as __pyx_PickleError
 *         raise __pyx_PickleError("Incompatible checksums (%s vs 0xb068931 = (name))" % __pyx_checksum)
 */
  __pyx_t_1 = ((__pyx_v___pyx_checksum != 0xb068931) != 0);
  if (__pyx_t_1) {

    /* "(tree fragment)":5
 *     cdef object __pyx_result
 *     if __pyx_checksum != 0xb068931:
 *         from pickle import PickleError as __pyx_PickleError             # <<<<<<<<<<<<<<
 *         raise __pyx_PickleError("Incompatible checksums (%s vs 0xb068931 = (name))" % __pyx_checksum)
 *     __pyx_result = Enum.__new__(__pyx_type)
 */
    __pyx_t_2 = PyList_New(1); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 5, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_INCREF(__pyx_n_s_PickleError);
    __Pyx_GIVEREF(__pyx_n_s_PickleError);
    PyList_SET_ITEM(__pyx_t_2, 0, __pyx_n_s_PickleError);
    __pyx_t_3 = __Pyx_Import(__pyx_n_s_pickle, __pyx_t_2, 0); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 5, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __pyx_t_2 = __Pyx_ImportFrom(__pyx_t_3, __pyx_n_s_PickleError); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 5, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __Pyx_INCREF(__pyx_t_2);
    __pyx_v___pyx_PickleError = __pyx_t_2;
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

    /* "(tree fragment)":6
 *     if __pyx_checksum != 0xb068931:
 *         from pickle import PickleError as __pyx_PickleError
 *         raise __pyx_PickleError("Incompatible checksums (%s vs 0xb068931 = (name))" % __pyx_checksum)             # <<<<<<<<<<<<<<
 *     __pyx_result = Enum.__new__(__pyx_type)
 *     if __pyx_state is not None:
 */
    __pyx_t_2 = __Pyx_PyInt_From_long(__pyx_v___pyx_checksum); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 6, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_2);
    __pyx_t_4 = __Pyx_PyString_Format(__pyx_kp_s_Incompatible_checksums_s_vs_0xb0, __pyx_t_2); if (unlikely(!__pyx_t_4)) __PYX_ERR(2, 6, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_4);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_INCREF(__pyx_v___pyx_PickleError);
    __pyx_t_2 = __pyx_v___pyx_PickleError; __pyx_t_5 = NULL;
    if (CYTHON_UNPACK_METHODS && unlikely(PyMethod_Check(__pyx_t_2))) {
      __pyx_t_5 = PyMethod_GET_SELF(__pyx_t_2);
      if (likely(__pyx_t_5)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_2);
        __Pyx_INCREF(__pyx_t_5);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_2, function);
      }
    }
    __pyx_t_3 = (__pyx_t_5) ? __Pyx_PyObject_Call2Args(__pyx_t_2, __pyx_t_5, __pyx_t_4) : __Pyx_PyObject_CallOneArg(__pyx_t_2, __pyx_t_4);
    __Pyx_XDECREF(__pyx_t_5); __pyx_t_5 = 0;
    __Pyx_DECREF(__pyx_t_4); __pyx_t_4 = 0;
    if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 6, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
    __Pyx_Raise(__pyx_t_3, 0, 0, 0);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;
    __PYX_ERR(2, 6, __pyx_L1_error)

    /* "(tree fragment)":4
 *     cdef object __pyx_PickleError
 *     cdef object __pyx_result
 *     if __pyx_checksum != 0xb068931:             # <<<<<<<<<<<<<<
 *         from pickle import PickleError as __pyx_PickleError
 *         raise __pyx_PickleError("Incompatible checksums (%s vs 0xb068931 = (name))" % __pyx_checksum)
 */
  }

  /* "(tree fragment)":7
 *         from pickle import PickleError as __pyx_PickleError
 *         raise __pyx_PickleError("Incompatible checksums (%s vs 0xb068931 = (name))" % __pyx_checksum)
 *     __pyx_result = Enum.__new__(__pyx_type)             # <<<<<<<<<<<<<<
 *     if __pyx_state is not None:
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)
 */
  __pyx_t_2 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_MemviewEnum_type), __pyx_n_s_new); if (unlikely(!__pyx_t_2)) __PYX_ERR(2, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_2);
  __pyx_t_4 = NULL;
  if (CYTHON_UNPACK_METHODS && likely(PyMethod_Check(__pyx_t_2))) {
    __pyx_t_4 = PyMethod_GET_SELF(__pyx_t_2);
    if (likely(__pyx_t_4)) {
      PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_2);
      __Pyx_INCREF(__pyx_t_4);
      __Pyx_INCREF(function);
      __Pyx_DECREF_SET(__pyx_t_2, function);
    }
  }
  __pyx_t_3 = (__pyx_t_4) ? __Pyx_PyObject_Call2Args(__pyx_t_2, __pyx_t_4, __pyx_v___pyx_type) : __Pyx_PyObject_CallOneArg(__pyx_t_2, __pyx_v___pyx_type);
  __Pyx_XDECREF(__pyx_t_4); __pyx_t_4 = 0;
  if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 7, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_3);
  __Pyx_DECREF(__pyx_t_2); __pyx_t_2 = 0;
  __pyx_v___pyx_result = __pyx_t_3;
  __pyx_t_3 = 0;

  /* "(tree fragment)":8
 *         raise __pyx_PickleError("Incompatible checksums (%s vs 0xb068931 = (name))" % __pyx_checksum)
 *     __pyx_result = Enum.__new__(__pyx_type)
 *     if __pyx_state is not None:             # <<<<<<<<<<<<<<
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)
 *     return __pyx_result
 */
  __pyx_t_1 = (__pyx_v___pyx_state != Py_None);
  __pyx_t_6 = (__pyx_t_1 != 0);
  if (__pyx_t_6) {

    /* "(tree fragment)":9
 *     __pyx_result = Enum.__new__(__pyx_type)
 *     if __pyx_state is not None:
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)             # <<<<<<<<<<<<<<
 *     return __pyx_result
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):
 */
    if (!(likely(PyTuple_CheckExact(__pyx_v___pyx_state))||((__pyx_v___pyx_state) == Py_None)||(PyErr_Format(PyExc_TypeError, "Expected %.16s, got %.200s", "tuple", Py_TYPE(__pyx_v___pyx_state)->tp_name), 0))) __PYX_ERR(2, 9, __pyx_L1_error)
    __pyx_t_3 = __pyx_unpickle_Enum__set_state(((struct __pyx_MemviewEnum_obj *)__pyx_v___pyx_result), ((PyObject*)__pyx_v___pyx_state)); if (unlikely(!__pyx_t_3)) __PYX_ERR(2, 9, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_3);
    __Pyx_DECREF(__pyx_t_3); __pyx_t_3 = 0;

    /* "(tree fragment)":8
 *         raise __pyx_PickleError("Incompatible checksums (%s vs 0xb068931 = (name))" % __pyx_checksum)
 *     __pyx_result = Enum.__new__(__pyx_type)
 *     if __pyx_state is not None:             # <<<<<<<<<<<<<<
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)
 *     return __pyx_result
 */
  }

  /* "(tree fragment)":10
 *     if __pyx_state is not None:
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)
 *     return __pyx_result             # <<<<<<<<<<<<<<
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):
 *     __pyx_result.name = __pyx_state[0]
 */
  __Pyx_XDECREF(__pyx_r);
  __Pyx_INCREF(__pyx_v___pyx_result);
  __pyx_r = __pyx_v___pyx_result;
  goto __pyx_L0;

  /* "(tree fragment)":1
 * def __pyx_unpickle_Enum(__pyx_type, long __pyx_checksum, __pyx_state):             # <<<<<<<<<<<<<<
 *     cdef object __pyx_PickleError
 *     cdef object __pyx_result
 */

  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_2);
  __Pyx_XDECREF(__pyx_t_3);
  __Pyx_XDECREF(__pyx_t_4);
  __Pyx_XDECREF(__pyx_t_5);
  __Pyx_AddTraceback("View.MemoryView.__pyx_unpickle_Enum", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __Pyx_XDECREF(__pyx_v___pyx_PickleError);
  __Pyx_XDECREF(__pyx_v___pyx_result);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

/* "(tree fragment)":11
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)
 *     return __pyx_result
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):             # <<<<<<<<<<<<<<
 *     __pyx_result.name = __pyx_state[0]
 *     if len(__pyx_state) > 1 and hasattr(__pyx_result, '__dict__'):
 */

static PyObject *__pyx_unpickle_Enum__set_state(struct __pyx_MemviewEnum_obj *__pyx_v___pyx_result, PyObject *__pyx_v___pyx_state) {
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_t_2;
  Py_ssize_t __pyx_t_3;
  int __pyx_t_4;
  int __pyx_t_5;
  PyObject *__pyx_t_6 = NULL;
  PyObject *__pyx_t_7 = NULL;
  PyObject *__pyx_t_8 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__pyx_unpickle_Enum__set_state", 0);

  /* "(tree fragment)":12
 *     return __pyx_result
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):
 *     __pyx_result.name = __pyx_state[0]             # <<<<<<<<<<<<<<
 *     if len(__pyx_state) > 1 and hasattr(__pyx_result, '__dict__'):
 *         __pyx_result.__dict__.update(__pyx_state[1])
 */
  if (unlikely(__pyx_v___pyx_state == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
    __PYX_ERR(2, 12, __pyx_L1_error)
  }
  __pyx_t_1 = __Pyx_GetItemInt_Tuple(__pyx_v___pyx_state, 0, long, 1, __Pyx_PyInt_From_long, 0, 0, 1); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 12, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_1);
  __Pyx_GOTREF(__pyx_v___pyx_result->name);
  __Pyx_DECREF(__pyx_v___pyx_result->name);
  __pyx_v___pyx_result->name = __pyx_t_1;
  __pyx_t_1 = 0;

  /* "(tree fragment)":13
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):
 *     __pyx_result.name = __pyx_state[0]
 *     if len(__pyx_state) > 1 and hasattr(__pyx_result, '__dict__'):             # <<<<<<<<<<<<<<
 *         __pyx_result.__dict__.update(__pyx_state[1])
 */
  if (unlikely(__pyx_v___pyx_state == Py_None)) {
    PyErr_SetString(PyExc_TypeError, "object of type 'NoneType' has no len()");
    __PYX_ERR(2, 13, __pyx_L1_error)
  }
  __pyx_t_3 = PyTuple_GET_SIZE(__pyx_v___pyx_state); if (unlikely(__pyx_t_3 == ((Py_ssize_t)-1))) __PYX_ERR(2, 13, __pyx_L1_error)
  __pyx_t_4 = ((__pyx_t_3 > 1) != 0);
  if (__pyx_t_4) {
  } else {
    __pyx_t_2 = __pyx_t_4;
    goto __pyx_L4_bool_binop_done;
  }
  __pyx_t_4 = __Pyx_HasAttr(((PyObject *)__pyx_v___pyx_result), __pyx_n_s_dict); if (unlikely(__pyx_t_4 == ((int)-1))) __PYX_ERR(2, 13, __pyx_L1_error)
  __pyx_t_5 = (__pyx_t_4 != 0);
  __pyx_t_2 = __pyx_t_5;
  __pyx_L4_bool_binop_done:;
  if (__pyx_t_2) {

    /* "(tree fragment)":14
 *     __pyx_result.name = __pyx_state[0]
 *     if len(__pyx_state) > 1 and hasattr(__pyx_result, '__dict__'):
 *         __pyx_result.__dict__.update(__pyx_state[1])             # <<<<<<<<<<<<<<
 */
    __pyx_t_6 = __Pyx_PyObject_GetAttrStr(((PyObject *)__pyx_v___pyx_result), __pyx_n_s_dict); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    __pyx_t_7 = __Pyx_PyObject_GetAttrStr(__pyx_t_6, __pyx_n_s_update); if (unlikely(!__pyx_t_7)) __PYX_ERR(2, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_7);
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    if (unlikely(__pyx_v___pyx_state == Py_None)) {
      PyErr_SetString(PyExc_TypeError, "'NoneType' object is not subscriptable");
      __PYX_ERR(2, 14, __pyx_L1_error)
    }
    __pyx_t_6 = __Pyx_GetItemInt_Tuple(__pyx_v___pyx_state, 1, long, 1, __Pyx_PyInt_From_long, 0, 0, 1); if (unlikely(!__pyx_t_6)) __PYX_ERR(2, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_6);
    __pyx_t_8 = NULL;
    if (CYTHON_UNPACK_METHODS && likely(PyMethod_Check(__pyx_t_7))) {
      __pyx_t_8 = PyMethod_GET_SELF(__pyx_t_7);
      if (likely(__pyx_t_8)) {
        PyObject* function = PyMethod_GET_FUNCTION(__pyx_t_7);
        __Pyx_INCREF(__pyx_t_8);
        __Pyx_INCREF(function);
        __Pyx_DECREF_SET(__pyx_t_7, function);
      }
    }
    __pyx_t_1 = (__pyx_t_8) ? __Pyx_PyObject_Call2Args(__pyx_t_7, __pyx_t_8, __pyx_t_6) : __Pyx_PyObject_CallOneArg(__pyx_t_7, __pyx_t_6);
    __Pyx_XDECREF(__pyx_t_8); __pyx_t_8 = 0;
    __Pyx_DECREF(__pyx_t_6); __pyx_t_6 = 0;
    if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 14, __pyx_L1_error)
    __Pyx_GOTREF(__pyx_t_1);
    __Pyx_DECREF(__pyx_t_7); __pyx_t_7 = 0;
    __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

    /* "(tree fragment)":13
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):
 *     __pyx_result.name = __pyx_state[0]
 *     if len(__pyx_state) > 1 and hasattr(__pyx_result, '__dict__'):             # <<<<<<<<<<<<<<
 *         __pyx_result.__dict__.update(__pyx_state[1])
 */
  }

  /* "(tree fragment)":11
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)
 *     return __pyx_result
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):             # <<<<<<<<<<<<<<
 *     __pyx_result.name = __pyx_state[0]
 *     if len(__pyx_state) > 1 and hasattr(__pyx_result, '__dict__'):
 */

  /* function exit code */
  __pyx_r = Py_None; __Pyx_INCREF(Py_None);
  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_XDECREF(__pyx_t_6);
  __Pyx_XDECREF(__pyx_t_7);
  __Pyx_XDECREF(__pyx_t_8);
  __Pyx_AddTraceback("View.MemoryView.__pyx_unpickle_Enum__set_state", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = 0;
  __pyx_L0:;
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
static struct __pyx_vtabstruct_array __pyx_vtable_array;

static PyObject *__pyx_tp_new_array(PyTypeObject *t, PyObject *a, PyObject *k) {
  struct __pyx_array_obj *p;
  PyObject *o;
  if (likely((t->tp_flags & Py_TPFLAGS_IS_ABSTRACT) == 0)) {
    o = (*t->tp_alloc)(t, 0);
  } else {
    o = (PyObject *) PyBaseObject_Type.tp_new(t, __pyx_empty_tuple, 0);
  }
  if (unlikely(!o)) return 0;
  p = ((struct __pyx_array_obj *)o);
  p->__pyx_vtab = __pyx_vtabptr_array;
  p->mode = ((PyObject*)Py_None); Py_INCREF(Py_None);
  p->_format = ((PyObject*)Py_None); Py_INCREF(Py_None);
  if (unlikely(__pyx_array___cinit__(o, a, k) < 0)) goto bad;
  return o;
  bad:
  Py_DECREF(o); o = 0;
  return NULL;
}

static void __pyx_tp_dealloc_array(PyObject *o) {
  struct __pyx_array_obj *p = (struct __pyx_array_obj *)o;
  #if CYTHON_USE_TP_FINALIZE
  if (unlikely(PyType_HasFeature(Py_TYPE(o), Py_TPFLAGS_HAVE_FINALIZE) && Py_TYPE(o)->tp_finalize) && (!PyType_IS_GC(Py_TYPE(o)) || !_PyGC_FINALIZED(o))) {
    if (PyObject_CallFinalizerFromDealloc(o)) return;
  }
  #endif
  {
    PyObject *etype, *eval, *etb;
    PyErr_Fetch(&etype, &eval, &etb);
    __Pyx_SET_REFCNT(o, Py_REFCNT(o) + 1);
    __pyx_array___dealloc__(o);
    __Pyx_SET_REFCNT(o, Py_REFCNT(o) - 1);
    PyErr_Restore(etype, eval, etb);
  }
  Py_CLEAR(p->mode);
  Py_CLEAR(p->_format);
  (*Py_TYPE(o)->tp_free)(o);
}
static PyObject *__pyx_sq_item_array(PyObject *o, Py_ssize_t i) {
  PyObject *r;
  PyObject *x = PyInt_FromSsize_t(i); if(!x) return 0;
  r = Py_TYPE(o)->tp_as_mapping->mp_subscript(o, x);
  Py_DECREF(x);
  return r;
}

static int __pyx_mp_ass_subscript_array(PyObject *o, PyObject *i, PyObject *v) {
  if (v) {
    return __pyx_array___setitem__(o, i, v);
  }
  else {
    PyErr_Format(PyExc_NotImplementedError,
      "Subscript deletion not supported by %.200s", Py_TYPE(o)->tp_name);
    return -1;
  }
}

static PyObject *__pyx_tp_getattro_array(PyObject *o, PyObject *n) {
  PyObject *v = __Pyx_PyObject_GenericGetAttr(o, n);
  if (!v && PyErr_ExceptionMatches(PyExc_AttributeError)) {
    PyErr_Clear();
    v = __pyx_array___getattr__(o, n);
  }
  return v;
}

static PyObject *__pyx_getprop___pyx_array_memview(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_5array_7memview_1__get__(o);
}

static PyMethodDef __pyx_methods_array[] = {
  {"__getattr__", (PyCFunction)__pyx_array___getattr__, METH_O|METH_COEXIST, 0},
  {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_array_1__reduce_cython__, METH_NOARGS, 0},
  {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_array_3__setstate_cython__, METH_O, 0},
  {0, 0, 0, 0}
};

static struct PyGetSetDef __pyx_getsets_array[] = {
  {(char *)"memview", __pyx_getprop___pyx_array_memview, 0, (char *)0, 0},
  {0, 0, 0, 0, 0}
};

static PySequenceMethods __pyx_tp_as_sequence_array = {
  __pyx_array___len__, /*sq_length*/
  0, /*sq_concat*/
  0, /*sq_repeat*/
  __pyx_sq_item_array, /*sq_item*/
  0, /*sq_slice*/
  0, /*sq_ass_item*/
  0, /*sq_ass_slice*/
  0, /*sq_contains*/
  0, /*sq_inplace_concat*/
  0, /*sq_inplace_repeat*/
};

static PyMappingMethods __pyx_tp_as_mapping_array = {
  __pyx_array___len__, /*mp_length*/
  __pyx_array___getitem__, /*mp_subscript*/
  __pyx_mp_ass_subscript_array, /*mp_ass_subscript*/
};

static PyBufferProcs __pyx_tp_as_buffer_array = {
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getreadbuffer*/
  #endif
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getwritebuffer*/
  #endif
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getsegcount*/
  #endif
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getcharbuffer*/
  #endif
  __pyx_array_getbuffer, /*bf_getbuffer*/
  0, /*bf_releasebuffer*/
};

static PyTypeObject __pyx_type___pyx_array = {
  PyVarObject_HEAD_INIT(0, 0)
  "TTS.tts.utils.monotonic_align.core.array", /*tp_name*/
  sizeof(struct __pyx_array_obj), /*tp_basicsize*/
  0, /*tp_itemsize*/
  __pyx_tp_dealloc_array, /*tp_dealloc*/
  #if PY_VERSION_HEX < 0x030800b4
  0, /*tp_print*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4
  0, /*tp_vectorcall_offset*/
  #endif
  0, /*tp_getattr*/
  0, /*tp_setattr*/
  #if PY_MAJOR_VERSION < 3
  0, /*tp_compare*/
  #endif
  #if PY_MAJOR_VERSION >= 3
  0, /*tp_as_async*/
  #endif
  0, /*tp_repr*/
  0, /*tp_as_number*/
  &__pyx_tp_as_sequence_array, /*tp_as_sequence*/
  &__pyx_tp_as_mapping_array, /*tp_as_mapping*/
  0, /*tp_hash*/
  0, /*tp_call*/
  0, /*tp_str*/
  __pyx_tp_getattro_array, /*tp_getattro*/
  0, /*tp_setattro*/
  &__pyx_tp_as_buffer_array, /*tp_as_buffer*/
  Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_VERSION_TAG|Py_TPFLAGS_CHECKTYPES|Py_TPFLAGS_HAVE_NEWBUFFER|Py_TPFLAGS_BASETYPE, /*tp_flags*/
  0, /*tp_doc*/
  0, /*tp_traverse*/
  0, /*tp_clear*/
  0, /*tp_richcompare*/
  0, /*tp_weaklistoffset*/
  0, /*tp_iter*/
  0, /*tp_iternext*/
  __pyx_methods_array, /*tp_methods*/
  0, /*tp_members*/
  __pyx_getsets_array, /*tp_getset*/
  0, /*tp_base*/
  0, /*tp_dict*/
  0, /*tp_descr_get*/
  0, /*tp_descr_set*/
  0, /*tp_dictoffset*/
  0, /*tp_init*/
  0, /*tp_alloc*/
  __pyx_tp_new_array, /*tp_new*/
  0, /*tp_free*/
  0, /*tp_is_gc*/
  0, /*tp_bases*/
  0, /*tp_mro*/
  0, /*tp_cache*/
  0, /*tp_subclasses*/
  0, /*tp_weaklist*/
  0, /*tp_del*/
  0, /*tp_version_tag*/
  #if PY_VERSION_HEX >= 0x030400a1
  0, /*tp_finalize*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b1 && (!CYTHON_COMPILING_IN_PYPY || PYPY_VERSION_NUM >= 0x07030800)
  0, /*tp_vectorcall*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4 && PY_VERSION_HEX < 0x03090000
  0, /*tp_print*/
  #endif
  #if CYTHON_COMPILING_IN_PYPY && PY_VERSION_HEX >= 0x03090000
  0, /*tp_pypy_flags*/
  #endif
};

static PyObject *__pyx_tp_new_Enum(PyTypeObject *t, CYTHON_UNUSED PyObject *a, CYTHON_UNUSED PyObject *k) {
  struct __pyx_MemviewEnum_obj *p;
  PyObject *o;
  if (likely((t->tp_flags & Py_TPFLAGS_IS_ABSTRACT) == 0)) {
    o = (*t->tp_alloc)(t, 0);
  } else {
    o = (PyObject *) PyBaseObject_Type.tp_new(t, __pyx_empty_tuple, 0);
  }
  if (unlikely(!o)) return 0;
  p = ((struct __pyx_MemviewEnum_obj *)o);
  p->name = Py_None; Py_INCREF(Py_None);
  return o;
}

static void __pyx_tp_dealloc_Enum(PyObject *o) {
  struct __pyx_MemviewEnum_obj *p = (struct __pyx_MemviewEnum_obj *)o;
  #if CYTHON_USE_TP_FINALIZE
  if (unlikely(PyType_HasFeature(Py_TYPE(o), Py_TPFLAGS_HAVE_FINALIZE) && Py_TYPE(o)->tp_finalize) && !_PyGC_FINALIZED(o)) {
    if (PyObject_CallFinalizerFromDealloc(o)) return;
  }
  #endif
  PyObject_GC_UnTrack(o);
  Py_CLEAR(p->name);
  (*Py_TYPE(o)->tp_free)(o);
}

static int __pyx_tp_traverse_Enum(PyObject *o, visitproc v, void *a) {
  int e;
  struct __pyx_MemviewEnum_obj *p = (struct __pyx_MemviewEnum_obj *)o;
  if (p->name) {
    e = (*v)(p->name, a); if (e) return e;
  }
  return 0;
}

static int __pyx_tp_clear_Enum(PyObject *o) {
  PyObject* tmp;
  struct __pyx_MemviewEnum_obj *p = (struct __pyx_MemviewEnum_obj *)o;
  tmp = ((PyObject*)p->name);
  p->name = Py_None; Py_INCREF(Py_None);
  Py_XDECREF(tmp);
  return 0;
}

static PyMethodDef __pyx_methods_Enum[] = {
  {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_MemviewEnum_1__reduce_cython__, METH_NOARGS, 0},
  {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_MemviewEnum_3__setstate_cython__, METH_O, 0},
  {0, 0, 0, 0}
};

static PyTypeObject __pyx_type___pyx_MemviewEnum = {
  PyVarObject_HEAD_INIT(0, 0)
  "TTS.tts.utils.monotonic_align.core.Enum", /*tp_name*/
  sizeof(struct __pyx_MemviewEnum_obj), /*tp_basicsize*/
  0, /*tp_itemsize*/
  __pyx_tp_dealloc_Enum, /*tp_dealloc*/
  #if PY_VERSION_HEX < 0x030800b4
  0, /*tp_print*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4
  0, /*tp_vectorcall_offset*/
  #endif
  0, /*tp_getattr*/
  0, /*tp_setattr*/
  #if PY_MAJOR_VERSION < 3
  0, /*tp_compare*/
  #endif
  #if PY_MAJOR_VERSION >= 3
  0, /*tp_as_async*/
  #endif
  __pyx_MemviewEnum___repr__, /*tp_repr*/
  0, /*tp_as_number*/
  0, /*tp_as_sequence*/
  0, /*tp_as_mapping*/
  0, /*tp_hash*/
  0, /*tp_call*/
  0, /*tp_str*/
  0, /*tp_getattro*/
  0, /*tp_setattro*/
  0, /*tp_as_buffer*/
  Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_VERSION_TAG|Py_TPFLAGS_CHECKTYPES|Py_TPFLAGS_HAVE_NEWBUFFER|Py_TPFLAGS_BASETYPE|Py_TPFLAGS_HAVE_GC, /*tp_flags*/
  0, /*tp_doc*/
  __pyx_tp_traverse_Enum, /*tp_traverse*/
  __pyx_tp_clear_Enum, /*tp_clear*/
  0, /*tp_richcompare*/
  0, /*tp_weaklistoffset*/
  0, /*tp_iter*/
  0, /*tp_iternext*/
  __pyx_methods_Enum, /*tp_methods*/
  0, /*tp_members*/
  0, /*tp_getset*/
  0, /*tp_base*/
  0, /*tp_dict*/
  0, /*tp_descr_get*/
  0, /*tp_descr_set*/
  0, /*tp_dictoffset*/
  __pyx_MemviewEnum___init__, /*tp_init*/
  0, /*tp_alloc*/
  __pyx_tp_new_Enum, /*tp_new*/
  0, /*tp_free*/
  0, /*tp_is_gc*/
  0, /*tp_bases*/
  0, /*tp_mro*/
  0, /*tp_cache*/
  0, /*tp_subclasses*/
  0, /*tp_weaklist*/
  0, /*tp_del*/
  0, /*tp_version_tag*/
  #if PY_VERSION_HEX >= 0x030400a1
  0, /*tp_finalize*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b1 && (!CYTHON_COMPILING_IN_PYPY || PYPY_VERSION_NUM >= 0x07030800)
  0, /*tp_vectorcall*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4 && PY_VERSION_HEX < 0x03090000
  0, /*tp_print*/
  #endif
  #if CYTHON_COMPILING_IN_PYPY && PY_VERSION_HEX >= 0x03090000
  0, /*tp_pypy_flags*/
  #endif
};
static struct __pyx_vtabstruct_memoryview __pyx_vtable_memoryview;

static PyObject *__pyx_tp_new_memoryview(PyTypeObject *t, PyObject *a, PyObject *k) {
  struct __pyx_memoryview_obj *p;
  PyObject *o;
  if (likely((t->tp_flags & Py_TPFLAGS_IS_ABSTRACT) == 0)) {
    o = (*t->tp_alloc)(t, 0);
  } else {
    o = (PyObject *) PyBaseObject_Type.tp_new(t, __pyx_empty_tuple, 0);
  }
  if (unlikely(!o)) return 0;
  p = ((struct __pyx_memoryview_obj *)o);
  p->__pyx_vtab = __pyx_vtabptr_memoryview;
  p->obj = Py_None; Py_INCREF(Py_None);
  p->_size = Py_None; Py_INCREF(Py_None);
  p->_array_interface = Py_None; Py_INCREF(Py_None);
  p->view.obj = NULL;
  if (unlikely(__pyx_memoryview___cinit__(o, a, k) < 0)) goto bad;
  return o;
  bad:
  Py_DECREF(o); o = 0;
  return NULL;
}

static void __pyx_tp_dealloc_memoryview(PyObject *o) {
  struct __pyx_memoryview_obj *p = (struct __pyx_memoryview_obj *)o;
  #if CYTHON_USE_TP_FINALIZE
  if (unlikely(PyType_HasFeature(Py_TYPE(o), Py_TPFLAGS_HAVE_FINALIZE) && Py_TYPE(o)->tp_finalize) && !_PyGC_FINALIZED(o)) {
    if (PyObject_CallFinalizerFromDealloc(o)) return;
  }
  #endif
  PyObject_GC_UnTrack(o);
  {
    PyObject *etype, *eval, *etb;
    PyErr_Fetch(&etype, &eval, &etb);
    __Pyx_SET_REFCNT(o, Py_REFCNT(o) + 1);
    __pyx_memoryview___dealloc__(o);
    __Pyx_SET_REFCNT(o, Py_REFCNT(o) - 1);
    PyErr_Restore(etype, eval, etb);
  }
  Py_CLEAR(p->obj);
  Py_CLEAR(p->_size);
  Py_CLEAR(p->_array_interface);
  (*Py_TYPE(o)->tp_free)(o);
}

static int __pyx_tp_traverse_memoryview(PyObject *o, visitproc v, void *a) {
  int e;
  struct __pyx_memoryview_obj *p = (struct __pyx_memoryview_obj *)o;
  if (p->obj) {
    e = (*v)(p->obj, a); if (e) return e;
  }
  if (p->_size) {
    e = (*v)(p->_size, a); if (e) return e;
  }
  if (p->_array_interface) {
    e = (*v)(p->_array_interface, a); if (e) return e;
  }
  if (p->view.obj) {
    e = (*v)(p->view.obj, a); if (e) return e;
  }
  return 0;
}

static int __pyx_tp_clear_memoryview(PyObject *o) {
  PyObject* tmp;
  struct __pyx_memoryview_obj *p = (struct __pyx_memoryview_obj *)o;
  tmp = ((PyObject*)p->obj);
  p->obj = Py_None; Py_INCREF(Py_None);
  Py_XDECREF(tmp);
  tmp = ((PyObject*)p->_size);
  p->_size = Py_None; Py_INCREF(Py_None);
  Py_XDECREF(tmp);
  tmp = ((PyObject*)p->_array_interface);
  p->_array_interface = Py_None; Py_INCREF(Py_None);
  Py_XDECREF(tmp);
  Py_CLEAR(p->view.obj);
  return 0;
}
static PyObject *__pyx_sq_item_memoryview(PyObject *o, Py_ssize_t i) {
  PyObject *r;
  PyObject *x = PyInt_FromSsize_t(i); if(!x) return 0;
  r = Py_TYPE(o)->tp_as_mapping->mp_subscript(o, x);
  Py_DECREF(x);
  return r;
}

static int __pyx_mp_ass_subscript_memoryview(PyObject *o, PyObject *i, PyObject *v) {
  if (v) {
    return __pyx_memoryview___setitem__(o, i, v);
  }
  else {
    PyErr_Format(PyExc_NotImplementedError,
      "Subscript deletion not supported by %.200s", Py_TYPE(o)->tp_name);
    return -1;
  }
}

static PyObject *__pyx_getprop___pyx_memoryview_T(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_1T_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_base(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_4base_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_shape(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_5shape_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_strides(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_7strides_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_suboffsets(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_10suboffsets_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_ndim(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_4ndim_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_itemsize(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_8itemsize_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_nbytes(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_6nbytes_1__get__(o);
}

static PyObject *__pyx_getprop___pyx_memoryview_size(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_10memoryview_4size_1__get__(o);
}

static PyMethodDef __pyx_methods_memoryview[] = {
  {"is_c_contig", (PyCFunction)__pyx_memoryview_is_c_contig, METH_NOARGS, 0},
  {"is_f_contig", (PyCFunction)__pyx_memoryview_is_f_contig, METH_NOARGS, 0},
  {"copy", (PyCFunction)__pyx_memoryview_copy, METH_NOARGS, 0},
  {"copy_fortran", (PyCFunction)__pyx_memoryview_copy_fortran, METH_NOARGS, 0},
  {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_memoryview_1__reduce_cython__, METH_NOARGS, 0},
  {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_memoryview_3__setstate_cython__, METH_O, 0},
  {0, 0, 0, 0}
};

static struct PyGetSetDef __pyx_getsets_memoryview[] = {
  {(char *)"T", __pyx_getprop___pyx_memoryview_T, 0, (char *)0, 0},
  {(char *)"base", __pyx_getprop___pyx_memoryview_base, 0, (char *)0, 0},
  {(char *)"shape", __pyx_getprop___pyx_memoryview_shape, 0, (char *)0, 0},
  {(char *)"strides", __pyx_getprop___pyx_memoryview_strides, 0, (char *)0, 0},
  {(char *)"suboffsets", __pyx_getprop___pyx_memoryview_suboffsets, 0, (char *)0, 0},
  {(char *)"ndim", __pyx_getprop___pyx_memoryview_ndim, 0, (char *)0, 0},
  {(char *)"itemsize", __pyx_getprop___pyx_memoryview_itemsize, 0, (char *)0, 0},
  {(char *)"nbytes", __pyx_getprop___pyx_memoryview_nbytes, 0, (char *)0, 0},
  {(char *)"size", __pyx_getprop___pyx_memoryview_size, 0, (char *)0, 0},
  {0, 0, 0, 0, 0}
};

static PySequenceMethods __pyx_tp_as_sequence_memoryview = {
  __pyx_memoryview___len__, /*sq_length*/
  0, /*sq_concat*/
  0, /*sq_repeat*/
  __pyx_sq_item_memoryview, /*sq_item*/
  0, /*sq_slice*/
  0, /*sq_ass_item*/
  0, /*sq_ass_slice*/
  0, /*sq_contains*/
  0, /*sq_inplace_concat*/
  0, /*sq_inplace_repeat*/
};

static PyMappingMethods __pyx_tp_as_mapping_memoryview = {
  __pyx_memoryview___len__, /*mp_length*/
  __pyx_memoryview___getitem__, /*mp_subscript*/
  __pyx_mp_ass_subscript_memoryview, /*mp_ass_subscript*/
};

static PyBufferProcs __pyx_tp_as_buffer_memoryview = {
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getreadbuffer*/
  #endif
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getwritebuffer*/
  #endif
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getsegcount*/
  #endif
  #if PY_MAJOR_VERSION < 3
  0, /*bf_getcharbuffer*/
  #endif
  __pyx_memoryview_getbuffer, /*bf_getbuffer*/
  0, /*bf_releasebuffer*/
};

static PyTypeObject __pyx_type___pyx_memoryview = {
  PyVarObject_HEAD_INIT(0, 0)
  "TTS.tts.utils.monotonic_align.core.memoryview", /*tp_name*/
  sizeof(struct __pyx_memoryview_obj), /*tp_basicsize*/
  0, /*tp_itemsize*/
  __pyx_tp_dealloc_memoryview, /*tp_dealloc*/
  #if PY_VERSION_HEX < 0x030800b4
  0, /*tp_print*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4
  0, /*tp_vectorcall_offset*/
  #endif
  0, /*tp_getattr*/
  0, /*tp_setattr*/
  #if PY_MAJOR_VERSION < 3
  0, /*tp_compare*/
  #endif
  #if PY_MAJOR_VERSION >= 3
  0, /*tp_as_async*/
  #endif
  __pyx_memoryview___repr__, /*tp_repr*/
  0, /*tp_as_number*/
  &__pyx_tp_as_sequence_memoryview, /*tp_as_sequence*/
  &__pyx_tp_as_mapping_memoryview, /*tp_as_mapping*/
  0, /*tp_hash*/
  0, /*tp_call*/
  __pyx_memoryview___str__, /*tp_str*/
  0, /*tp_getattro*/
  0, /*tp_setattro*/
  &__pyx_tp_as_buffer_memoryview, /*tp_as_buffer*/
  Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_VERSION_TAG|Py_TPFLAGS_CHECKTYPES|Py_TPFLAGS_HAVE_NEWBUFFER|Py_TPFLAGS_BASETYPE|Py_TPFLAGS_HAVE_GC, /*tp_flags*/
  0, /*tp_doc*/
  __pyx_tp_traverse_memoryview, /*tp_traverse*/
  __pyx_tp_clear_memoryview, /*tp_clear*/
  0, /*tp_richcompare*/
  0, /*tp_weaklistoffset*/
  0, /*tp_iter*/
  0, /*tp_iternext*/
  __pyx_methods_memoryview, /*tp_methods*/
  0, /*tp_members*/
  __pyx_getsets_memoryview, /*tp_getset*/
  0, /*tp_base*/
  0, /*tp_dict*/
  0, /*tp_descr_get*/
  0, /*tp_descr_set*/
  0, /*tp_dictoffset*/
  0, /*tp_init*/
  0, /*tp_alloc*/
  __pyx_tp_new_memoryview, /*tp_new*/
  0, /*tp_free*/
  0, /*tp_is_gc*/
  0, /*tp_bases*/
  0, /*tp_mro*/
  0, /*tp_cache*/
  0, /*tp_subclasses*/
  0, /*tp_weaklist*/
  0, /*tp_del*/
  0, /*tp_version_tag*/
  #if PY_VERSION_HEX >= 0x030400a1
  0, /*tp_finalize*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b1 && (!CYTHON_COMPILING_IN_PYPY || PYPY_VERSION_NUM >= 0x07030800)
  0, /*tp_vectorcall*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4 && PY_VERSION_HEX < 0x03090000
  0, /*tp_print*/
  #endif
  #if CYTHON_COMPILING_IN_PYPY && PY_VERSION_HEX >= 0x03090000
  0, /*tp_pypy_flags*/
  #endif
};
static struct __pyx_vtabstruct__memoryviewslice __pyx_vtable__memoryviewslice;

static PyObject *__pyx_tp_new__memoryviewslice(PyTypeObject *t, PyObject *a, PyObject *k) {
  struct __pyx_memoryviewslice_obj *p;
  PyObject *o = __pyx_tp_new_memoryview(t, a, k);
  if (unlikely(!o)) return 0;
  p = ((struct __pyx_memoryviewslice_obj *)o);
  p->__pyx_base.__pyx_vtab = (struct __pyx_vtabstruct_memoryview*)__pyx_vtabptr__memoryviewslice;
  p->from_object = Py_None; Py_INCREF(Py_None);
  p->from_slice.memview = NULL;
  return o;
}

static void __pyx_tp_dealloc__memoryviewslice(PyObject *o) {
  struct __pyx_memoryviewslice_obj *p = (struct __pyx_memoryviewslice_obj *)o;
  #if CYTHON_USE_TP_FINALIZE
  if (unlikely(PyType_HasFeature(Py_TYPE(o), Py_TPFLAGS_HAVE_FINALIZE) && Py_TYPE(o)->tp_finalize) && !_PyGC_FINALIZED(o)) {
    if (PyObject_CallFinalizerFromDealloc(o)) return;
  }
  #endif
  PyObject_GC_UnTrack(o);
  {
    PyObject *etype, *eval, *etb;
    PyErr_Fetch(&etype, &eval, &etb);
    __Pyx_SET_REFCNT(o, Py_REFCNT(o) + 1);
    __pyx_memoryviewslice___dealloc__(o);
    __Pyx_SET_REFCNT(o, Py_REFCNT(o) - 1);
    PyErr_Restore(etype, eval, etb);
  }
  Py_CLEAR(p->from_object);
  PyObject_GC_Track(o);
  __pyx_tp_dealloc_memoryview(o);
}

static int __pyx_tp_traverse__memoryviewslice(PyObject *o, visitproc v, void *a) {
  int e;
  struct __pyx_memoryviewslice_obj *p = (struct __pyx_memoryviewslice_obj *)o;
  e = __pyx_tp_traverse_memoryview(o, v, a); if (e) return e;
  if (p->from_object) {
    e = (*v)(p->from_object, a); if (e) return e;
  }
  return 0;
}

static int __pyx_tp_clear__memoryviewslice(PyObject *o) {
  PyObject* tmp;
  struct __pyx_memoryviewslice_obj *p = (struct __pyx_memoryviewslice_obj *)o;
  __pyx_tp_clear_memoryview(o);
  tmp = ((PyObject*)p->from_object);
  p->from_object = Py_None; Py_INCREF(Py_None);
  Py_XDECREF(tmp);
  __PYX_XDEC_MEMVIEW(&p->from_slice, 1);
  return 0;
}

static PyObject *__pyx_getprop___pyx_memoryviewslice_base(PyObject *o, CYTHON_UNUSED void *x) {
  return __pyx_pw_15View_dot_MemoryView_16_memoryviewslice_4base_1__get__(o);
}

static PyMethodDef __pyx_methods__memoryviewslice[] = {
  {"__reduce_cython__", (PyCFunction)__pyx_pw___pyx_memoryviewslice_1__reduce_cython__, METH_NOARGS, 0},
  {"__setstate_cython__", (PyCFunction)__pyx_pw___pyx_memoryviewslice_3__setstate_cython__, METH_O, 0},
  {0, 0, 0, 0}
};

static struct PyGetSetDef __pyx_getsets__memoryviewslice[] = {
  {(char *)"base", __pyx_getprop___pyx_memoryviewslice_base, 0, (char *)0, 0},
  {0, 0, 0, 0, 0}
};

static PyTypeObject __pyx_type___pyx_memoryviewslice = {
  PyVarObject_HEAD_INIT(0, 0)
  "TTS.tts.utils.monotonic_align.core._memoryviewslice", /*tp_name*/
  sizeof(struct __pyx_memoryviewslice_obj), /*tp_basicsize*/
  0, /*tp_itemsize*/
  __pyx_tp_dealloc__memoryviewslice, /*tp_dealloc*/
  #if PY_VERSION_HEX < 0x030800b4
  0, /*tp_print*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4
  0, /*tp_vectorcall_offset*/
  #endif
  0, /*tp_getattr*/
  0, /*tp_setattr*/
  #if PY_MAJOR_VERSION < 3
  0, /*tp_compare*/
  #endif
  #if PY_MAJOR_VERSION >= 3
  0, /*tp_as_async*/
  #endif
  #if CYTHON_COMPILING_IN_PYPY
  __pyx_memoryview___repr__, /*tp_repr*/
  #else
  0, /*tp_repr*/
  #endif
  0, /*tp_as_number*/
  0, /*tp_as_sequence*/
  0, /*tp_as_mapping*/
  0, /*tp_hash*/
  0, /*tp_call*/
  #if CYTHON_COMPILING_IN_PYPY
  __pyx_memoryview___str__, /*tp_str*/
  #else
  0, /*tp_str*/
  #endif
  0, /*tp_getattro*/
  0, /*tp_setattro*/
  0, /*tp_as_buffer*/
  Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_VERSION_TAG|Py_TPFLAGS_CHECKTYPES|Py_TPFLAGS_HAVE_NEWBUFFER|Py_TPFLAGS_BASETYPE|Py_TPFLAGS_HAVE_GC, /*tp_flags*/
  "Internal class for passing memoryview slices to Python", /*tp_doc*/
  __pyx_tp_traverse__memoryviewslice, /*tp_traverse*/
  __pyx_tp_clear__memoryviewslice, /*tp_clear*/
  0, /*tp_richcompare*/
  0, /*tp_weaklistoffset*/
  0, /*tp_iter*/
  0, /*tp_iternext*/
  __pyx_methods__memoryviewslice, /*tp_methods*/
  0, /*tp_members*/
  __pyx_getsets__memoryviewslice, /*tp_getset*/
  0, /*tp_base*/
  0, /*tp_dict*/
  0, /*tp_descr_get*/
  0, /*tp_descr_set*/
  0, /*tp_dictoffset*/
  0, /*tp_init*/
  0, /*tp_alloc*/
  __pyx_tp_new__memoryviewslice, /*tp_new*/
  0, /*tp_free*/
  0, /*tp_is_gc*/
  0, /*tp_bases*/
  0, /*tp_mro*/
  0, /*tp_cache*/
  0, /*tp_subclasses*/
  0, /*tp_weaklist*/
  0, /*tp_del*/
  0, /*tp_version_tag*/
  #if PY_VERSION_HEX >= 0x030400a1
  0, /*tp_finalize*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b1 && (!CYTHON_COMPILING_IN_PYPY || PYPY_VERSION_NUM >= 0x07030800)
  0, /*tp_vectorcall*/
  #endif
  #if PY_VERSION_HEX >= 0x030800b4 && PY_VERSION_HEX < 0x03090000
  0, /*tp_print*/
  #endif
  #if CYTHON_COMPILING_IN_PYPY && PY_VERSION_HEX >= 0x03090000
  0, /*tp_pypy_flags*/
  #endif
};

static PyMethodDef __pyx_methods[] = {
  {"maximum_path_c", (PyCFunction)(void*)(PyCFunctionWithKeywords)__pyx_pw_3TTS_3tts_5utils_15monotonic_align_4core_1maximum_path_c, METH_VARARGS|METH_KEYWORDS, 0},
  {0, 0, 0, 0}
};

#if PY_MAJOR_VERSION >= 3
#if CYTHON_PEP489_MULTI_PHASE_INIT
static PyObject* __pyx_pymod_create(PyObject *spec, PyModuleDef *def); /*proto*/
static int __pyx_pymod_exec_core(PyObject* module); /*proto*/
static PyModuleDef_Slot __pyx_moduledef_slots[] = {
  {Py_mod_create, (void*)__pyx_pymod_create},
  {Py_mod_exec, (void*)__pyx_pymod_exec_core},
  {0, NULL}
};
#endif

static struct PyModuleDef __pyx_moduledef = {
    PyModuleDef_HEAD_INIT,
    "core",
    0, /* m_doc */
  #if CYTHON_PEP489_MULTI_PHASE_INIT
    0, /* m_size */
  #else
    -1, /* m_size */
  #endif
    __pyx_methods /* m_methods */,
  #if CYTHON_PEP489_MULTI_PHASE_INIT
    __pyx_moduledef_slots, /* m_slots */
  #else
    NULL, /* m_reload */
  #endif
    NULL, /* m_traverse */
    NULL, /* m_clear */
    NULL /* m_free */
};
#endif
#ifndef CYTHON_SMALL_CODE
#if defined(__clang__)
    #define CYTHON_SMALL_CODE
#elif defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
    #define CYTHON_SMALL_CODE __attribute__((cold))
#else
    #define CYTHON_SMALL_CODE
#endif
#endif

static __Pyx_StringTabEntry __pyx_string_tab[] = {
  {&__pyx_n_s_ASCII, __pyx_k_ASCII, sizeof(__pyx_k_ASCII), 0, 0, 1, 1},
  {&__pyx_kp_s_Buffer_view_does_not_expose_stri, __pyx_k_Buffer_view_does_not_expose_stri, sizeof(__pyx_k_Buffer_view_does_not_expose_stri), 0, 0, 1, 0},
  {&__pyx_kp_s_Can_only_create_a_buffer_that_is, __pyx_k_Can_only_create_a_buffer_that_is, sizeof(__pyx_k_Can_only_create_a_buffer_that_is), 0, 0, 1, 0},
  {&__pyx_kp_s_Cannot_assign_to_read_only_memor, __pyx_k_Cannot_assign_to_read_only_memor, sizeof(__pyx_k_Cannot_assign_to_read_only_memor), 0, 0, 1, 0},
  {&__pyx_kp_s_Cannot_create_writable_memory_vi, __pyx_k_Cannot_create_writable_memory_vi, sizeof(__pyx_k_Cannot_create_writable_memory_vi), 0, 0, 1, 0},
  {&__pyx_kp_s_Cannot_index_with_type_s, __pyx_k_Cannot_index_with_type_s, sizeof(__pyx_k_Cannot_index_with_type_s), 0, 0, 1, 0},
  {&__pyx_n_s_Ellipsis, __pyx_k_Ellipsis, sizeof(__pyx_k_Ellipsis), 0, 0, 1, 1},
  {&__pyx_kp_s_Empty_shape_tuple_for_cython_arr, __pyx_k_Empty_shape_tuple_for_cython_arr, sizeof(__pyx_k_Empty_shape_tuple_for_cython_arr), 0, 0, 1, 0},
  {&__pyx_n_s_ImportError, __pyx_k_ImportError, sizeof(__pyx_k_ImportError), 0, 0, 1, 1},
  {&__pyx_kp_s_Incompatible_checksums_s_vs_0xb0, __pyx_k_Incompatible_checksums_s_vs_0xb0, sizeof(__pyx_k_Incompatible_checksums_s_vs_0xb0), 0, 0, 1, 0},
  {&__pyx_n_s_IndexError, __pyx_k_IndexError, sizeof(__pyx_k_IndexError), 0, 0, 1, 1},
  {&__pyx_kp_s_Indirect_dimensions_not_supporte, __pyx_k_Indirect_dimensions_not_supporte, sizeof(__pyx_k_Indirect_dimensions_not_supporte), 0, 0, 1, 0},
  {&__pyx_kp_s_Invalid_mode_expected_c_or_fortr, __pyx_k_Invalid_mode_expected_c_or_fortr, sizeof(__pyx_k_Invalid_mode_expected_c_or_fortr), 0, 0, 1, 0},
  {&__pyx_kp_s_Invalid_shape_in_axis_d_d, __pyx_k_Invalid_shape_in_axis_d_d, sizeof(__pyx_k_Invalid_shape_in_axis_d_d), 0, 0, 1, 0},
  {&__pyx_n_s_MemoryError, __pyx_k_MemoryError, sizeof(__pyx_k_MemoryError), 0, 0, 1, 1},
  {&__pyx_kp_s_MemoryView_of_r_at_0x_x, __pyx_k_MemoryView_of_r_at_0x_x, sizeof(__pyx_k_MemoryView_of_r_at_0x_x), 0, 0, 1, 0},
  {&__pyx_kp_s_MemoryView_of_r_object, __pyx_k_MemoryView_of_r_object, sizeof(__pyx_k_MemoryView_of_r_object), 0, 0, 1, 0},
  {&__pyx_n_b_O, __pyx_k_O, sizeof(__pyx_k_O), 0, 0, 0, 1},
  {&__pyx_kp_s_Out_of_bounds_on_buffer_access_a, __pyx_k_Out_of_bounds_on_buffer_access_a, sizeof(__pyx_k_Out_of_bounds_on_buffer_access_a), 0, 0, 1, 0},
  {&__pyx_n_s_PickleError, __pyx_k_PickleError, sizeof(__pyx_k_PickleError), 0, 0, 1, 1},
  {&__pyx_n_s_TypeError, __pyx_k_TypeError, sizeof(__pyx_k_TypeError), 0, 0, 1, 1},
  {&__pyx_kp_s_Unable_to_convert_item_to_object, __pyx_k_Unable_to_convert_item_to_object, sizeof(__pyx_k_Unable_to_convert_item_to_object), 0, 0, 1, 0},
  {&__pyx_n_s_ValueError, __pyx_k_ValueError, sizeof(__pyx_k_ValueError), 0, 0, 1, 1},
  {&__pyx_n_s_View_MemoryView, __pyx_k_View_MemoryView, sizeof(__pyx_k_View_MemoryView), 0, 0, 1, 1},
  {&__pyx_n_s_allocate_buffer, __pyx_k_allocate_buffer, sizeof(__pyx_k_allocate_buffer), 0, 0, 1, 1},
  {&__pyx_n_s_base, __pyx_k_base, sizeof(__pyx_k_base), 0, 0, 1, 1},
  {&__pyx_n_s_c, __pyx_k_c, sizeof(__pyx_k_c), 0, 0, 1, 1},
  {&__pyx_n_u_c, __pyx_k_c, sizeof(__pyx_k_c), 0, 1, 0, 1},
  {&__pyx_n_s_class, __pyx_k_class, sizeof(__pyx_k_class), 0, 0, 1, 1},
  {&__pyx_n_s_cline_in_traceback, __pyx_k_cline_in_traceback, sizeof(__pyx_k_cline_in_traceback), 0, 0, 1, 1},
  {&__pyx_kp_s_contiguous_and_direct, __pyx_k_contiguous_and_direct, sizeof(__pyx_k_contiguous_and_direct), 0, 0, 1, 0},
  {&__pyx_kp_s_contiguous_and_indirect, __pyx_k_contiguous_and_indirect, sizeof(__pyx_k_contiguous_and_indirect), 0, 0, 1, 0},
  {&__pyx_n_s_dict, __pyx_k_dict, sizeof(__pyx_k_dict), 0, 0, 1, 1},
  {&__pyx_n_s_dtype_is_object, __pyx_k_dtype_is_object, sizeof(__pyx_k_dtype_is_object), 0, 0, 1, 1},
  {&__pyx_n_s_encode, __pyx_k_encode, sizeof(__pyx_k_encode), 0, 0, 1, 1},
  {&__pyx_n_s_enumerate, __pyx_k_enumerate, sizeof(__pyx_k_enumerate), 0, 0, 1, 1},
  {&__pyx_n_s_error, __pyx_k_error, sizeof(__pyx_k_error), 0, 0, 1, 1},
  {&__pyx_n_s_flags, __pyx_k_flags, sizeof(__pyx_k_flags), 0, 0, 1, 1},
  {&__pyx_n_s_format, __pyx_k_format, sizeof(__pyx_k_format), 0, 0, 1, 1},
  {&__pyx_n_s_fortran, __pyx_k_fortran, sizeof(__pyx_k_fortran), 0, 0, 1, 1},
  {&__pyx_n_u_fortran, __pyx_k_fortran, sizeof(__pyx_k_fortran), 0, 1, 0, 1},
  {&__pyx_n_s_getstate, __pyx_k_getstate, sizeof(__pyx_k_getstate), 0, 0, 1, 1},
  {&__pyx_kp_s_got_differing_extents_in_dimensi, __pyx_k_got_differing_extents_in_dimensi, sizeof(__pyx_k_got_differing_extents_in_dimensi), 0, 0, 1, 0},
  {&__pyx_n_s_id, __pyx_k_id, sizeof(__pyx_k_id), 0, 0, 1, 1},
  {&__pyx_n_s_import, __pyx_k_import, sizeof(__pyx_k_import), 0, 0, 1, 1},
  {&__pyx_n_s_itemsize, __pyx_k_itemsize, sizeof(__pyx_k_itemsize), 0, 0, 1, 1},
  {&__pyx_kp_s_itemsize_0_for_cython_array, __pyx_k_itemsize_0_for_cython_array, sizeof(__pyx_k_itemsize_0_for_cython_array), 0, 0, 1, 0},
  {&__pyx_n_s_main, __pyx_k_main, sizeof(__pyx_k_main), 0, 0, 1, 1},
  {&__pyx_n_s_max_neg_val, __pyx_k_max_neg_val, sizeof(__pyx_k_max_neg_val), 0, 0, 1, 1},
  {&__pyx_n_s_memview, __pyx_k_memview, sizeof(__pyx_k_memview), 0, 0, 1, 1},
  {&__pyx_n_s_mode, __pyx_k_mode, sizeof(__pyx_k_mode), 0, 0, 1, 1},
  {&__pyx_n_s_name, __pyx_k_name, sizeof(__pyx_k_name), 0, 0, 1, 1},
  {&__pyx_n_s_name_2, __pyx_k_name_2, sizeof(__pyx_k_name_2), 0, 0, 1, 1},
  {&__pyx_n_s_ndim, __pyx_k_ndim, sizeof(__pyx_k_ndim), 0, 0, 1, 1},
  {&__pyx_n_s_new, __pyx_k_new, sizeof(__pyx_k_new), 0, 0, 1, 1},
  {&__pyx_kp_s_no_default___reduce___due_to_non, __pyx_k_no_default___reduce___due_to_non, sizeof(__pyx_k_no_default___reduce___due_to_non), 0, 0, 1, 0},
  {&__pyx_n_s_np, __pyx_k_np, sizeof(__pyx_k_np), 0, 0, 1, 1},
  {&__pyx_n_s_numpy, __pyx_k_numpy, sizeof(__pyx_k_numpy), 0, 0, 1, 1},
  {&__pyx_kp_u_numpy_core_multiarray_failed_to, __pyx_k_numpy_core_multiarray_failed_to, sizeof(__pyx_k_numpy_core_multiarray_failed_to), 0, 1, 0, 0},
  {&__pyx_kp_u_numpy_core_umath_failed_to_impor, __pyx_k_numpy_core_umath_failed_to_impor, sizeof(__pyx_k_numpy_core_umath_failed_to_impor), 0, 1, 0, 0},
  {&__pyx_n_s_obj, __pyx_k_obj, sizeof(__pyx_k_obj), 0, 0, 1, 1},
  {&__pyx_n_s_pack, __pyx_k_pack, sizeof(__pyx_k_pack), 0, 0, 1, 1},
  {&__pyx_n_s_paths, __pyx_k_paths, sizeof(__pyx_k_paths), 0, 0, 1, 1},
  {&__pyx_n_s_pickle, __pyx_k_pickle, sizeof(__pyx_k_pickle), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_PickleError, __pyx_k_pyx_PickleError, sizeof(__pyx_k_pyx_PickleError), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_checksum, __pyx_k_pyx_checksum, sizeof(__pyx_k_pyx_checksum), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_getbuffer, __pyx_k_pyx_getbuffer, sizeof(__pyx_k_pyx_getbuffer), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_result, __pyx_k_pyx_result, sizeof(__pyx_k_pyx_result), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_state, __pyx_k_pyx_state, sizeof(__pyx_k_pyx_state), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_type, __pyx_k_pyx_type, sizeof(__pyx_k_pyx_type), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_unpickle_Enum, __pyx_k_pyx_unpickle_Enum, sizeof(__pyx_k_pyx_unpickle_Enum), 0, 0, 1, 1},
  {&__pyx_n_s_pyx_vtable, __pyx_k_pyx_vtable, sizeof(__pyx_k_pyx_vtable), 0, 0, 1, 1},
  {&__pyx_n_s_range, __pyx_k_range, sizeof(__pyx_k_range), 0, 0, 1, 1},
  {&__pyx_n_s_reduce, __pyx_k_reduce, sizeof(__pyx_k_reduce), 0, 0, 1, 1},
  {&__pyx_n_s_reduce_cython, __pyx_k_reduce_cython, sizeof(__pyx_k_reduce_cython), 0, 0, 1, 1},
  {&__pyx_n_s_reduce_ex, __pyx_k_reduce_ex, sizeof(__pyx_k_reduce_ex), 0, 0, 1, 1},
  {&__pyx_n_s_setstate, __pyx_k_setstate, sizeof(__pyx_k_setstate), 0, 0, 1, 1},
  {&__pyx_n_s_setstate_cython, __pyx_k_setstate_cython, sizeof(__pyx_k_setstate_cython), 0, 0, 1, 1},
  {&__pyx_n_s_shape, __pyx_k_shape, sizeof(__pyx_k_shape), 0, 0, 1, 1},
  {&__pyx_n_s_size, __pyx_k_size, sizeof(__pyx_k_size), 0, 0, 1, 1},
  {&__pyx_n_s_start, __pyx_k_start, sizeof(__pyx_k_start), 0, 0, 1, 1},
  {&__pyx_n_s_step, __pyx_k_step, sizeof(__pyx_k_step), 0, 0, 1, 1},
  {&__pyx_n_s_stop, __pyx_k_stop, sizeof(__pyx_k_stop), 0, 0, 1, 1},
  {&__pyx_kp_s_strided_and_direct, __pyx_k_strided_and_direct, sizeof(__pyx_k_strided_and_direct), 0, 0, 1, 0},
  {&__pyx_kp_s_strided_and_direct_or_indirect, __pyx_k_strided_and_direct_or_indirect, sizeof(__pyx_k_strided_and_direct_or_indirect), 0, 0, 1, 0},
  {&__pyx_kp_s_strided_and_indirect, __pyx_k_strided_and_indirect, sizeof(__pyx_k_strided_and_indirect), 0, 0, 1, 0},
  {&__pyx_kp_s_stringsource, __pyx_k_stringsource, sizeof(__pyx_k_stringsource), 0, 0, 1, 0},
  {&__pyx_n_s_struct, __pyx_k_struct, sizeof(__pyx_k_struct), 0, 0, 1, 1},
  {&__pyx_n_s_t_xs, __pyx_k_t_xs, sizeof(__pyx_k_t_xs), 0, 0, 1, 1},
  {&__pyx_n_s_t_ys, __pyx_k_t_ys, sizeof(__pyx_k_t_ys), 0, 0, 1, 1},
  {&__pyx_n_s_test, __pyx_k_test, sizeof(__pyx_k_test), 0, 0, 1, 1},
  {&__pyx_kp_s_unable_to_allocate_array_data, __pyx_k_unable_to_allocate_array_data, sizeof(__pyx_k_unable_to_allocate_array_data), 0, 0, 1, 0},
  {&__pyx_kp_s_unable_to_allocate_shape_and_str, __pyx_k_unable_to_allocate_shape_and_str, sizeof(__pyx_k_unable_to_allocate_shape_and_str), 0, 0, 1, 0},
  {&__pyx_n_s_unpack, __pyx_k_unpack, sizeof(__pyx_k_unpack), 0, 0, 1, 1},
  {&__pyx_n_s_update, __pyx_k_update, sizeof(__pyx_k_update), 0, 0, 1, 1},
  {&__pyx_n_s_values, __pyx_k_values, sizeof(__pyx_k_values), 0, 0, 1, 1},
  {0, 0, 0, 0, 0, 0, 0}
};
static CYTHON_SMALL_CODE int __Pyx_InitCachedBuiltins(void) {
  __pyx_builtin_range = __Pyx_GetBuiltinName(__pyx_n_s_range); if (!__pyx_builtin_range) __PYX_ERR(0, 19, __pyx_L1_error)
  __pyx_builtin_ImportError = __Pyx_GetBuiltinName(__pyx_n_s_ImportError); if (!__pyx_builtin_ImportError) __PYX_ERR(1, 945, __pyx_L1_error)
  __pyx_builtin_ValueError = __Pyx_GetBuiltinName(__pyx_n_s_ValueError); if (!__pyx_builtin_ValueError) __PYX_ERR(2, 133, __pyx_L1_error)
  __pyx_builtin_MemoryError = __Pyx_GetBuiltinName(__pyx_n_s_MemoryError); if (!__pyx_builtin_MemoryError) __PYX_ERR(2, 148, __pyx_L1_error)
  __pyx_builtin_enumerate = __Pyx_GetBuiltinName(__pyx_n_s_enumerate); if (!__pyx_builtin_enumerate) __PYX_ERR(2, 151, __pyx_L1_error)
  __pyx_builtin_TypeError = __Pyx_GetBuiltinName(__pyx_n_s_TypeError); if (!__pyx_builtin_TypeError) __PYX_ERR(2, 2, __pyx_L1_error)
  __pyx_builtin_Ellipsis = __Pyx_GetBuiltinName(__pyx_n_s_Ellipsis); if (!__pyx_builtin_Ellipsis) __PYX_ERR(2, 404, __pyx_L1_error)
  __pyx_builtin_id = __Pyx_GetBuiltinName(__pyx_n_s_id); if (!__pyx_builtin_id) __PYX_ERR(2, 613, __pyx_L1_error)
  __pyx_builtin_IndexError = __Pyx_GetBuiltinName(__pyx_n_s_IndexError); if (!__pyx_builtin_IndexError) __PYX_ERR(2, 832, __pyx_L1_error)
  return 0;
  __pyx_L1_error:;
  return -1;
}

static CYTHON_SMALL_CODE int __Pyx_InitCachedConstants(void) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__Pyx_InitCachedConstants", 0);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":945
 *         __pyx_import_array()
 *     except Exception:
 *         raise ImportError("numpy.core.multiarray failed to import")             # <<<<<<<<<<<<<<
 * 
 * cdef inline int import_umath() except -1:
 */
  __pyx_tuple__2 = PyTuple_Pack(1, __pyx_kp_u_numpy_core_multiarray_failed_to); if (unlikely(!__pyx_tuple__2)) __PYX_ERR(1, 945, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__2);
  __Pyx_GIVEREF(__pyx_tuple__2);

  /* "C:/Users/GIANMA~1/AppData/Local/Temp/pip-build-env-eetufkp7/overlay/Lib/site-packages/numpy/__init__.pxd":951
 *         _import_umath()
 *     except Exception:
 *         raise ImportError("numpy.core.umath failed to import")             # <<<<<<<<<<<<<<
 * 
 * cdef inline int import_ufunc() except -1:
 */
  __pyx_tuple__3 = PyTuple_Pack(1, __pyx_kp_u_numpy_core_umath_failed_to_impor); if (unlikely(!__pyx_tuple__3)) __PYX_ERR(1, 951, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__3);
  __Pyx_GIVEREF(__pyx_tuple__3);

  /* "View.MemoryView":133
 * 
 *         if not self.ndim:
 *             raise ValueError("Empty shape tuple for cython.array")             # <<<<<<<<<<<<<<
 * 
 *         if itemsize <= 0:
 */
  __pyx_tuple__4 = PyTuple_Pack(1, __pyx_kp_s_Empty_shape_tuple_for_cython_arr); if (unlikely(!__pyx_tuple__4)) __PYX_ERR(2, 133, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__4);
  __Pyx_GIVEREF(__pyx_tuple__4);

  /* "View.MemoryView":136
 * 
 *         if itemsize <= 0:
 *             raise ValueError("itemsize <= 0 for cython.array")             # <<<<<<<<<<<<<<
 * 
 *         if not isinstance(format, bytes):
 */
  __pyx_tuple__5 = PyTuple_Pack(1, __pyx_kp_s_itemsize_0_for_cython_array); if (unlikely(!__pyx_tuple__5)) __PYX_ERR(2, 136, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__5);
  __Pyx_GIVEREF(__pyx_tuple__5);

  /* "View.MemoryView":148
 * 
 *         if not self._shape:
 *             raise MemoryError("unable to allocate shape and strides.")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_tuple__6 = PyTuple_Pack(1, __pyx_kp_s_unable_to_allocate_shape_and_str); if (unlikely(!__pyx_tuple__6)) __PYX_ERR(2, 148, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__6);
  __Pyx_GIVEREF(__pyx_tuple__6);

  /* "View.MemoryView":176
 *             self.data = <char *>malloc(self.len)
 *             if not self.data:
 *                 raise MemoryError("unable to allocate array data.")             # <<<<<<<<<<<<<<
 * 
 *             if self.dtype_is_object:
 */
  __pyx_tuple__7 = PyTuple_Pack(1, __pyx_kp_s_unable_to_allocate_array_data); if (unlikely(!__pyx_tuple__7)) __PYX_ERR(2, 176, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__7);
  __Pyx_GIVEREF(__pyx_tuple__7);

  /* "View.MemoryView":192
 *             bufmode = PyBUF_F_CONTIGUOUS | PyBUF_ANY_CONTIGUOUS
 *         if not (flags & bufmode):
 *             raise ValueError("Can only create a buffer that is contiguous in memory.")             # <<<<<<<<<<<<<<
 *         info.buf = self.data
 *         info.len = self.len
 */
  __pyx_tuple__8 = PyTuple_Pack(1, __pyx_kp_s_Can_only_create_a_buffer_that_is); if (unlikely(!__pyx_tuple__8)) __PYX_ERR(2, 192, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__8);
  __Pyx_GIVEREF(__pyx_tuple__8);

  /* "(tree fragment)":2
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */
  __pyx_tuple__9 = PyTuple_Pack(1, __pyx_kp_s_no_default___reduce___due_to_non); if (unlikely(!__pyx_tuple__9)) __PYX_ERR(2, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__9);
  __Pyx_GIVEREF(__pyx_tuple__9);

  /* "(tree fragment)":4
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 */
  __pyx_tuple__10 = PyTuple_Pack(1, __pyx_kp_s_no_default___reduce___due_to_non); if (unlikely(!__pyx_tuple__10)) __PYX_ERR(2, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__10);
  __Pyx_GIVEREF(__pyx_tuple__10);

  /* "View.MemoryView":418
 *     def __setitem__(memoryview self, object index, object value):
 *         if self.view.readonly:
 *             raise TypeError("Cannot assign to read-only memoryview")             # <<<<<<<<<<<<<<
 * 
 *         have_slices, index = _unellipsify(index, self.view.ndim)
 */
  __pyx_tuple__11 = PyTuple_Pack(1, __pyx_kp_s_Cannot_assign_to_read_only_memor); if (unlikely(!__pyx_tuple__11)) __PYX_ERR(2, 418, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__11);
  __Pyx_GIVEREF(__pyx_tuple__11);

  /* "View.MemoryView":495
 *             result = struct.unpack(self.view.format, bytesitem)
 *         except struct.error:
 *             raise ValueError("Unable to convert item to object")             # <<<<<<<<<<<<<<
 *         else:
 *             if len(self.view.format) == 1:
 */
  __pyx_tuple__12 = PyTuple_Pack(1, __pyx_kp_s_Unable_to_convert_item_to_object); if (unlikely(!__pyx_tuple__12)) __PYX_ERR(2, 495, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__12);
  __Pyx_GIVEREF(__pyx_tuple__12);

  /* "View.MemoryView":520
 *     def __getbuffer__(self, Py_buffer *info, int flags):
 *         if flags & PyBUF_WRITABLE and self.view.readonly:
 *             raise ValueError("Cannot create writable memory view from read-only memoryview")             # <<<<<<<<<<<<<<
 * 
 *         if flags & PyBUF_ND:
 */
  __pyx_tuple__13 = PyTuple_Pack(1, __pyx_kp_s_Cannot_create_writable_memory_vi); if (unlikely(!__pyx_tuple__13)) __PYX_ERR(2, 520, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__13);
  __Pyx_GIVEREF(__pyx_tuple__13);

  /* "View.MemoryView":570
 *         if self.view.strides == NULL:
 * 
 *             raise ValueError("Buffer view does not expose strides")             # <<<<<<<<<<<<<<
 * 
 *         return tuple([stride for stride in self.view.strides[:self.view.ndim]])
 */
  __pyx_tuple__14 = PyTuple_Pack(1, __pyx_kp_s_Buffer_view_does_not_expose_stri); if (unlikely(!__pyx_tuple__14)) __PYX_ERR(2, 570, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__14);
  __Pyx_GIVEREF(__pyx_tuple__14);

  /* "View.MemoryView":577
 *     def suboffsets(self):
 *         if self.view.suboffsets == NULL:
 *             return (-1,) * self.view.ndim             # <<<<<<<<<<<<<<
 * 
 *         return tuple([suboffset for suboffset in self.view.suboffsets[:self.view.ndim]])
 */
  __pyx_tuple__15 = PyTuple_New(1); if (unlikely(!__pyx_tuple__15)) __PYX_ERR(2, 577, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__15);
  __Pyx_INCREF(__pyx_int_neg_1);
  __Pyx_GIVEREF(__pyx_int_neg_1);
  PyTuple_SET_ITEM(__pyx_tuple__15, 0, __pyx_int_neg_1);
  __Pyx_GIVEREF(__pyx_tuple__15);

  /* "(tree fragment)":2
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */
  __pyx_tuple__16 = PyTuple_Pack(1, __pyx_kp_s_no_default___reduce___due_to_non); if (unlikely(!__pyx_tuple__16)) __PYX_ERR(2, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__16);
  __Pyx_GIVEREF(__pyx_tuple__16);

  /* "(tree fragment)":4
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 */
  __pyx_tuple__17 = PyTuple_Pack(1, __pyx_kp_s_no_default___reduce___due_to_non); if (unlikely(!__pyx_tuple__17)) __PYX_ERR(2, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__17);
  __Pyx_GIVEREF(__pyx_tuple__17);

  /* "View.MemoryView":682
 *         if item is Ellipsis:
 *             if not seen_ellipsis:
 *                 result.extend([slice(None)] * (ndim - len(tup) + 1))             # <<<<<<<<<<<<<<
 *                 seen_ellipsis = True
 *             else:
 */
  __pyx_slice__18 = PySlice_New(Py_None, Py_None, Py_None); if (unlikely(!__pyx_slice__18)) __PYX_ERR(2, 682, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_slice__18);
  __Pyx_GIVEREF(__pyx_slice__18);

  /* "View.MemoryView":703
 *     for suboffset in suboffsets[:ndim]:
 *         if suboffset >= 0:
 *             raise ValueError("Indirect dimensions not supported")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_tuple__19 = PyTuple_Pack(1, __pyx_kp_s_Indirect_dimensions_not_supporte); if (unlikely(!__pyx_tuple__19)) __PYX_ERR(2, 703, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__19);
  __Pyx_GIVEREF(__pyx_tuple__19);

  /* "(tree fragment)":2
 * def __reduce_cython__(self):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 */
  __pyx_tuple__20 = PyTuple_Pack(1, __pyx_kp_s_no_default___reduce___due_to_non); if (unlikely(!__pyx_tuple__20)) __PYX_ERR(2, 2, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__20);
  __Pyx_GIVEREF(__pyx_tuple__20);

  /* "(tree fragment)":4
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")
 * def __setstate_cython__(self, __pyx_state):
 *     raise TypeError("no default __reduce__ due to non-trivial __cinit__")             # <<<<<<<<<<<<<<
 */
  __pyx_tuple__21 = PyTuple_Pack(1, __pyx_kp_s_no_default___reduce___due_to_non); if (unlikely(!__pyx_tuple__21)) __PYX_ERR(2, 4, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__21);
  __Pyx_GIVEREF(__pyx_tuple__21);

  /* "View.MemoryView":286
 *         return self.name
 * 
 * cdef generic = Enum("<strided and direct or indirect>")             # <<<<<<<<<<<<<<
 * cdef strided = Enum("<strided and direct>") # default
 * cdef indirect = Enum("<strided and indirect>")
 */
  __pyx_tuple__22 = PyTuple_Pack(1, __pyx_kp_s_strided_and_direct_or_indirect); if (unlikely(!__pyx_tuple__22)) __PYX_ERR(2, 286, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__22);
  __Pyx_GIVEREF(__pyx_tuple__22);

  /* "View.MemoryView":287
 * 
 * cdef generic = Enum("<strided and direct or indirect>")
 * cdef strided = Enum("<strided and direct>") # default             # <<<<<<<<<<<<<<
 * cdef indirect = Enum("<strided and indirect>")
 * 
 */
  __pyx_tuple__23 = PyTuple_Pack(1, __pyx_kp_s_strided_and_direct); if (unlikely(!__pyx_tuple__23)) __PYX_ERR(2, 287, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__23);
  __Pyx_GIVEREF(__pyx_tuple__23);

  /* "View.MemoryView":288
 * cdef generic = Enum("<strided and direct or indirect>")
 * cdef strided = Enum("<strided and direct>") # default
 * cdef indirect = Enum("<strided and indirect>")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_tuple__24 = PyTuple_Pack(1, __pyx_kp_s_strided_and_indirect); if (unlikely(!__pyx_tuple__24)) __PYX_ERR(2, 288, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__24);
  __Pyx_GIVEREF(__pyx_tuple__24);

  /* "View.MemoryView":291
 * 
 * 
 * cdef contiguous = Enum("<contiguous and direct>")             # <<<<<<<<<<<<<<
 * cdef indirect_contiguous = Enum("<contiguous and indirect>")
 * 
 */
  __pyx_tuple__25 = PyTuple_Pack(1, __pyx_kp_s_contiguous_and_direct); if (unlikely(!__pyx_tuple__25)) __PYX_ERR(2, 291, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__25);
  __Pyx_GIVEREF(__pyx_tuple__25);

  /* "View.MemoryView":292
 * 
 * cdef contiguous = Enum("<contiguous and direct>")
 * cdef indirect_contiguous = Enum("<contiguous and indirect>")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_tuple__26 = PyTuple_Pack(1, __pyx_kp_s_contiguous_and_indirect); if (unlikely(!__pyx_tuple__26)) __PYX_ERR(2, 292, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__26);
  __Pyx_GIVEREF(__pyx_tuple__26);

  /* "(tree fragment)":1
 * def __pyx_unpickle_Enum(__pyx_type, long __pyx_checksum, __pyx_state):             # <<<<<<<<<<<<<<
 *     cdef object __pyx_PickleError
 *     cdef object __pyx_result
 */
  __pyx_tuple__27 = PyTuple_Pack(5, __pyx_n_s_pyx_type, __pyx_n_s_pyx_checksum, __pyx_n_s_pyx_state, __pyx_n_s_pyx_PickleError, __pyx_n_s_pyx_result); if (unlikely(!__pyx_tuple__27)) __PYX_ERR(2, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__27);
  __Pyx_GIVEREF(__pyx_tuple__27);
  __pyx_codeobj__28 = (PyObject*)__Pyx_PyCode_New(3, 0, 5, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__27, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_stringsource, __pyx_n_s_pyx_unpickle_Enum, 1, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__28)) __PYX_ERR(2, 1, __pyx_L1_error)
  __Pyx_RefNannyFinishContext();
  return 0;
  __pyx_L1_error:;
  __Pyx_RefNannyFinishContext();
  return -1;
}

static CYTHON_SMALL_CODE int __Pyx_InitGlobals(void) {
  /* InitThreads.init */
  #if defined(WITH_THREAD) && PY_VERSION_HEX < 0x030700F0
PyEval_InitThreads();
#endif

if (unlikely(PyErr_Occurred())) __PYX_ERR(0, 1, __pyx_L1_error)

  if (__Pyx_InitStrings(__pyx_string_tab) < 0) __PYX_ERR(0, 1, __pyx_L1_error);
  __pyx_int_0 = PyInt_FromLong(0); if (unlikely(!__pyx_int_0)) __PYX_ERR(0, 1, __pyx_L1_error)
  __pyx_int_1 = PyInt_FromLong(1); if (unlikely(!__pyx_int_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __pyx_int_184977713 = PyInt_FromLong(184977713L); if (unlikely(!__pyx_int_184977713)) __PYX_ERR(0, 1, __pyx_L1_error)
  __pyx_int_neg_1 = PyInt_FromLong(-1); if (unlikely(!__pyx_int_neg_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  return 0;
  __pyx_L1_error:;
  return -1;
}

static CYTHON_SMALL_CODE int __Pyx_modinit_global_init_code(void); /*proto*/
static CYTHON_SMALL_CODE int __Pyx_modinit_variable_export_code(void); /*proto*/
static CYTHON_SMALL_CODE int __Pyx_modinit_function_export_code(void); /*proto*/
static CYTHON_SMALL_CODE int __Pyx_modinit_type_init_code(void); /*proto*/
static CYTHON_SMALL_CODE int __Pyx_modinit_type_import_code(void); /*proto*/
static CYTHON_SMALL_CODE int __Pyx_modinit_variable_import_code(void); /*proto*/
static CYTHON_SMALL_CODE int __Pyx_modinit_function_import_code(void); /*proto*/

static int __Pyx_modinit_global_init_code(void) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__Pyx_modinit_global_init_code", 0);
  /*--- Global init code ---*/
  generic = Py_None; Py_INCREF(Py_None);
  strided = Py_None; Py_INCREF(Py_None);
  indirect = Py_None; Py_INCREF(Py_None);
  contiguous = Py_None; Py_INCREF(Py_None);
  indirect_contiguous = Py_None; Py_INCREF(Py_None);
  __Pyx_RefNannyFinishContext();
  return 0;
}

static int __Pyx_modinit_variable_export_code(void) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__Pyx_modinit_variable_export_code", 0);
  /*--- Variable export code ---*/
  __Pyx_RefNannyFinishContext();
  return 0;
}

static int __Pyx_modinit_function_export_code(void) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__Pyx_modinit_function_export_code", 0);
  /*--- Function export code ---*/
  __Pyx_RefNannyFinishContext();
  return 0;
}

static int __Pyx_modinit_type_init_code(void) {
  __Pyx_RefNannyDeclarations
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__Pyx_modinit_type_init_code", 0);
  /*--- Type init code ---*/
  __pyx_vtabptr_array = &__pyx_vtable_array;
  __pyx_vtable_array.get_memview = (PyObject *(*)(struct __pyx_array_obj *))__pyx_array_get_memview;
  if (PyType_Ready(&__pyx_type___pyx_array) < 0) __PYX_ERR(2, 105, __pyx_L1_error)
  #if PY_VERSION_HEX < 0x030800B1
  __pyx_type___pyx_array.tp_print = 0;
  #endif
  if (__Pyx_SetVtable(__pyx_type___pyx_array.tp_dict, __pyx_vtabptr_array) < 0) __PYX_ERR(2, 105, __pyx_L1_error)
  if (__Pyx_setup_reduce((PyObject*)&__pyx_type___pyx_array) < 0) __PYX_ERR(2, 105, __pyx_L1_error)
  __pyx_array_type = &__pyx_type___pyx_array;
  if (PyType_Ready(&__pyx_type___pyx_MemviewEnum) < 0) __PYX_ERR(2, 279, __pyx_L1_error)
  #if PY_VERSION_HEX < 0x030800B1
  __pyx_type___pyx_MemviewEnum.tp_print = 0;
  #endif
  if ((CYTHON_USE_TYPE_SLOTS && CYTHON_USE_PYTYPE_LOOKUP) && likely(!__pyx_type___pyx_MemviewEnum.tp_dictoffset && __pyx_type___pyx_MemviewEnum.tp_getattro == PyObject_GenericGetAttr)) {
    __pyx_type___pyx_MemviewEnum.tp_getattro = __Pyx_PyObject_GenericGetAttr;
  }
  if (__Pyx_setup_reduce((PyObject*)&__pyx_type___pyx_MemviewEnum) < 0) __PYX_ERR(2, 279, __pyx_L1_error)
  __pyx_MemviewEnum_type = &__pyx_type___pyx_MemviewEnum;
  __pyx_vtabptr_memoryview = &__pyx_vtable_memoryview;
  __pyx_vtable_memoryview.get_item_pointer = (char *(*)(struct __pyx_memoryview_obj *, PyObject *))__pyx_memoryview_get_item_pointer;
  __pyx_vtable_memoryview.is_slice = (PyObject *(*)(struct __pyx_memoryview_obj *, PyObject *))__pyx_memoryview_is_slice;
  __pyx_vtable_memoryview.setitem_slice_assignment = (PyObject *(*)(struct __pyx_memoryview_obj *, PyObject *, PyObject *))__pyx_memoryview_setitem_slice_assignment;
  __pyx_vtable_memoryview.setitem_slice_assign_scalar = (PyObject *(*)(struct __pyx_memoryview_obj *, struct __pyx_memoryview_obj *, PyObject *))__pyx_memoryview_setitem_slice_assign_scalar;
  __pyx_vtable_memoryview.setitem_indexed = (PyObject *(*)(struct __pyx_memoryview_obj *, PyObject *, PyObject *))__pyx_memoryview_setitem_indexed;
  __pyx_vtable_memoryview.convert_item_to_object = (PyObject *(*)(struct __pyx_memoryview_obj *, char *))__pyx_memoryview_convert_item_to_object;
  __pyx_vtable_memoryview.assign_item_from_object = (PyObject *(*)(struct __pyx_memoryview_obj *, char *, PyObject *))__pyx_memoryview_assign_item_from_object;
  if (PyType_Ready(&__pyx_type___pyx_memoryview) < 0) __PYX_ERR(2, 330, __pyx_L1_error)
  #if PY_VERSION_HEX < 0x030800B1
  __pyx_type___pyx_memoryview.tp_print = 0;
  #endif
  if ((CYTHON_USE_TYPE_SLOTS && CYTHON_USE_PYTYPE_LOOKUP) && likely(!__pyx_type___pyx_memoryview.tp_dictoffset && __pyx_type___pyx_memoryview.tp_getattro == PyObject_GenericGetAttr)) {
    __pyx_type___pyx_memoryview.tp_getattro = __Pyx_PyObject_GenericGetAttr;
  }
  if (__Pyx_SetVtable(__pyx_type___pyx_memoryview.tp_dict, __pyx_vtabptr_memoryview) < 0) __PYX_ERR(2, 330, __pyx_L1_error)
  if (__Pyx_setup_reduce((PyObject*)&__pyx_type___pyx_memoryview) < 0) __PYX_ERR(2, 330, __pyx_L1_error)
  __pyx_memoryview_type = &__pyx_type___pyx_memoryview;
  __pyx_vtabptr__memoryviewslice = &__pyx_vtable__memoryviewslice;
  __pyx_vtable__memoryviewslice.__pyx_base = *__pyx_vtabptr_memoryview;
  __pyx_vtable__memoryviewslice.__pyx_base.convert_item_to_object = (PyObject *(*)(struct __pyx_memoryview_obj *, char *))__pyx_memoryviewslice_convert_item_to_object;
  __pyx_vtable__memoryviewslice.__pyx_base.assign_item_from_object = (PyObject *(*)(struct __pyx_memoryview_obj *, char *, PyObject *))__pyx_memoryviewslice_assign_item_from_object;
  __pyx_type___pyx_memoryviewslice.tp_base = __pyx_memoryview_type;
  if (PyType_Ready(&__pyx_type___pyx_memoryviewslice) < 0) __PYX_ERR(2, 965, __pyx_L1_error)
  #if PY_VERSION_HEX < 0x030800B1
  __pyx_type___pyx_memoryviewslice.tp_print = 0;
  #endif
  if ((CYTHON_USE_TYPE_SLOTS && CYTHON_USE_PYTYPE_LOOKUP) && likely(!__pyx_type___pyx_memoryviewslice.tp_dictoffset && __pyx_type___pyx_memoryviewslice.tp_getattro == PyObject_GenericGetAttr)) {
    __pyx_type___pyx_memoryviewslice.tp_getattro = __Pyx_PyObject_GenericGetAttr;
  }
  if (__Pyx_SetVtable(__pyx_type___pyx_memoryviewslice.tp_dict, __pyx_vtabptr__memoryviewslice) < 0) __PYX_ERR(2, 965, __pyx_L1_error)
  if (__Pyx_setup_reduce((PyObject*)&__pyx_type___pyx_memoryviewslice) < 0) __PYX_ERR(2, 965, __pyx_L1_error)
  __pyx_memoryviewslice_type = &__pyx_type___pyx_memoryviewslice;
  __Pyx_RefNannyFinishContext();
  return 0;
  __pyx_L1_error:;
  __Pyx_RefNannyFinishContext();
  return -1;
}

static int __Pyx_modinit_type_import_code(void) {
  __Pyx_RefNannyDeclarations
  PyObject *__pyx_t_1 = NULL;
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannySetupContext("__Pyx_modinit_type_import_code", 0);
  /*--- Type import code ---*/
  __pyx_t_1 = PyImport_ImportModule(__Pyx_BUILTIN_MODULE_NAME); if (unlikely(!__pyx_t_1)) __PYX_ERR(3, 9, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_ptype_7cpython_4type_type = __Pyx_ImportType(__pyx_t_1, __Pyx_BUILTIN_MODULE_NAME, "type", 
  #if defined(PYPY_VERSION_NUM) && PYPY_VERSION_NUM < 0x050B0000
  sizeof(PyTypeObject),
  #else
  sizeof(PyHeapTypeObject),
  #endif
  __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_7cpython_4type_type) __PYX_ERR(3, 9, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_t_1 = PyImport_ImportModule("numpy"); if (unlikely(!__pyx_t_1)) __PYX_ERR(1, 200, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __pyx_ptype_5numpy_dtype = __Pyx_ImportType(__pyx_t_1, "numpy", "dtype", sizeof(PyArray_Descr), __Pyx_ImportType_CheckSize_Ignore);
   if (!__pyx_ptype_5numpy_dtype) __PYX_ERR(1, 200, __pyx_L1_error)
  __pyx_ptype_5numpy_flatiter = __Pyx_ImportType(__pyx_t_1, "numpy", "flatiter", sizeof(PyArrayIterObject), __Pyx_ImportType_CheckSize_Ignore);
   if (!__pyx_ptype_5numpy_flatiter) __PYX_ERR(1, 223, __pyx_L1_error)
  __pyx_ptype_5numpy_broadcast = __Pyx_ImportType(__pyx_t_1, "numpy", "broadcast", sizeof(PyArrayMultiIterObject), __Pyx_ImportType_CheckSize_Ignore);
   if (!__pyx_ptype_5numpy_broadcast) __PYX_ERR(1, 227, __pyx_L1_error)
  __pyx_ptype_5numpy_ndarray = __Pyx_ImportType(__pyx_t_1, "numpy", "ndarray", sizeof(PyArrayObject), __Pyx_ImportType_CheckSize_Ignore);
   if (!__pyx_ptype_5numpy_ndarray) __PYX_ERR(1, 239, __pyx_L1_error)
  __pyx_ptype_5numpy_generic = __Pyx_ImportType(__pyx_t_1, "numpy", "generic", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_generic) __PYX_ERR(1, 771, __pyx_L1_error)
  __pyx_ptype_5numpy_number = __Pyx_ImportType(__pyx_t_1, "numpy", "number", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_number) __PYX_ERR(1, 773, __pyx_L1_error)
  __pyx_ptype_5numpy_integer = __Pyx_ImportType(__pyx_t_1, "numpy", "integer", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_integer) __PYX_ERR(1, 775, __pyx_L1_error)
  __pyx_ptype_5numpy_signedinteger = __Pyx_ImportType(__pyx_t_1, "numpy", "signedinteger", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_signedinteger) __PYX_ERR(1, 777, __pyx_L1_error)
  __pyx_ptype_5numpy_unsignedinteger = __Pyx_ImportType(__pyx_t_1, "numpy", "unsignedinteger", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_unsignedinteger) __PYX_ERR(1, 779, __pyx_L1_error)
  __pyx_ptype_5numpy_inexact = __Pyx_ImportType(__pyx_t_1, "numpy", "inexact", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_inexact) __PYX_ERR(1, 781, __pyx_L1_error)
  __pyx_ptype_5numpy_floating = __Pyx_ImportType(__pyx_t_1, "numpy", "floating", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_floating) __PYX_ERR(1, 783, __pyx_L1_error)
  __pyx_ptype_5numpy_complexfloating = __Pyx_ImportType(__pyx_t_1, "numpy", "complexfloating", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_complexfloating) __PYX_ERR(1, 785, __pyx_L1_error)
  __pyx_ptype_5numpy_flexible = __Pyx_ImportType(__pyx_t_1, "numpy", "flexible", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_flexible) __PYX_ERR(1, 787, __pyx_L1_error)
  __pyx_ptype_5numpy_character = __Pyx_ImportType(__pyx_t_1, "numpy", "character", sizeof(PyObject), __Pyx_ImportType_CheckSize_Warn);
   if (!__pyx_ptype_5numpy_character) __PYX_ERR(1, 789, __pyx_L1_error)
  __pyx_ptype_5numpy_ufunc = __Pyx_ImportType(__pyx_t_1, "numpy", "ufunc", sizeof(PyUFuncObject), __Pyx_ImportType_CheckSize_Ignore);
   if (!__pyx_ptype_5numpy_ufunc) __PYX_ERR(1, 827, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __Pyx_RefNannyFinishContext();
  return 0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  __Pyx_RefNannyFinishContext();
  return -1;
}

static int __Pyx_modinit_variable_import_code(void) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__Pyx_modinit_variable_import_code", 0);
  /*--- Variable import code ---*/
  __Pyx_RefNannyFinishContext();
  return 0;
}

static int __Pyx_modinit_function_import_code(void) {
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("__Pyx_modinit_function_import_code", 0);
  /*--- Function import code ---*/
  __Pyx_RefNannyFinishContext();
  return 0;
}


#ifndef CYTHON_NO_PYINIT_EXPORT
#define __Pyx_PyMODINIT_FUNC PyMODINIT_FUNC
#elif PY_MAJOR_VERSION < 3
#ifdef __cplusplus
#define __Pyx_PyMODINIT_FUNC extern "C" void
#else
#define __Pyx_PyMODINIT_FUNC void
#endif
#else
#ifdef __cplusplus
#define __Pyx_PyMODINIT_FUNC extern "C" PyObject *
#else
#define __Pyx_PyMODINIT_FUNC PyObject *
#endif
#endif


#if PY_MAJOR_VERSION < 3
__Pyx_PyMODINIT_FUNC initcore(void) CYTHON_SMALL_CODE; /*proto*/
__Pyx_PyMODINIT_FUNC initcore(void)
#else
__Pyx_PyMODINIT_FUNC PyInit_core(void) CYTHON_SMALL_CODE; /*proto*/
__Pyx_PyMODINIT_FUNC PyInit_core(void)
#if CYTHON_PEP489_MULTI_PHASE_INIT
{
  return PyModuleDef_Init(&__pyx_moduledef);
}
static CYTHON_SMALL_CODE int __Pyx_check_single_interpreter(void) {
    #if PY_VERSION_HEX >= 0x030700A1
    static PY_INT64_T main_interpreter_id = -1;
    PY_INT64_T current_id = PyInterpreterState_GetID(PyThreadState_Get()->interp);
    if (main_interpreter_id == -1) {
        main_interpreter_id = current_id;
        return (unlikely(current_id == -1)) ? -1 : 0;
    } else if (unlikely(main_interpreter_id != current_id))
    #else
    static PyInterpreterState *main_interpreter = NULL;
    PyInterpreterState *current_interpreter = PyThreadState_Get()->interp;
    if (!main_interpreter) {
        main_interpreter = current_interpreter;
    } else if (unlikely(main_interpreter != current_interpreter))
    #endif
    {
        PyErr_SetString(
            PyExc_ImportError,
            "Interpreter change detected - this module can only be loaded into one interpreter per process.");
        return -1;
    }
    return 0;
}
static CYTHON_SMALL_CODE int __Pyx_copy_spec_to_module(PyObject *spec, PyObject *moddict, const char* from_name, const char* to_name, int allow_none) {
    PyObject *value = PyObject_GetAttrString(spec, from_name);
    int result = 0;
    if (likely(value)) {
        if (allow_none || value != Py_None) {
            result = PyDict_SetItemString(moddict, to_name, value);
        }
        Py_DECREF(value);
    } else if (PyErr_ExceptionMatches(PyExc_AttributeError)) {
        PyErr_Clear();
    } else {
        result = -1;
    }
    return result;
}
static CYTHON_SMALL_CODE PyObject* __pyx_pymod_create(PyObject *spec, CYTHON_UNUSED PyModuleDef *def) {
    PyObject *module = NULL, *moddict, *modname;
    if (__Pyx_check_single_interpreter())
        return NULL;
    if (__pyx_m)
        return __Pyx_NewRef(__pyx_m);
    modname = PyObject_GetAttrString(spec, "name");
    if (unlikely(!modname)) goto bad;
    module = PyModule_NewObject(modname);
    Py_DECREF(modname);
    if (unlikely(!module)) goto bad;
    moddict = PyModule_GetDict(module);
    if (unlikely(!moddict)) goto bad;
    if (unlikely(__Pyx_copy_spec_to_module(spec, moddict, "loader", "__loader__", 1) < 0)) goto bad;
    if (unlikely(__Pyx_copy_spec_to_module(spec, moddict, "origin", "__file__", 1) < 0)) goto bad;
    if (unlikely(__Pyx_copy_spec_to_module(spec, moddict, "parent", "__package__", 1) < 0)) goto bad;
    if (unlikely(__Pyx_copy_spec_to_module(spec, moddict, "submodule_search_locations", "__path__", 0) < 0)) goto bad;
    return module;
bad:
    Py_XDECREF(module);
    return NULL;
}


static CYTHON_SMALL_CODE int __pyx_pymod_exec_core(PyObject *__pyx_pyinit_module)
#endif
#endif
{
  PyObject *__pyx_t_1 = NULL;
  static PyThread_type_lock __pyx_t_2[8];
  int __pyx_lineno = 0;
  const char *__pyx_filename = NULL;
  int __pyx_clineno = 0;
  __Pyx_RefNannyDeclarations
  #if CYTHON_PEP489_MULTI_PHASE_INIT
  if (__pyx_m) {
    if (__pyx_m == __pyx_pyinit_module) return 0;
    PyErr_SetString(PyExc_RuntimeError, "Module 'core' has already been imported. Re-initialisation is not supported.");
    return -1;
  }
  #elif PY_MAJOR_VERSION >= 3
  if (__pyx_m) return __Pyx_NewRef(__pyx_m);
  #endif
  #if CYTHON_REFNANNY
__Pyx_RefNanny = __Pyx_RefNannyImportAPI("refnanny");
if (!__Pyx_RefNanny) {
  PyErr_Clear();
  __Pyx_RefNanny = __Pyx_RefNannyImportAPI("Cython.Runtime.refnanny");
  if (!__Pyx_RefNanny)
      Py_FatalError("failed to import 'refnanny' module");
}
#endif
  __Pyx_RefNannySetupContext("__Pyx_PyMODINIT_FUNC PyInit_core(void)", 0);
  if (__Pyx_check_binary_version() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #ifdef __Pxy_PyFrame_Initialize_Offsets
  __Pxy_PyFrame_Initialize_Offsets();
  #endif
  __pyx_empty_tuple = PyTuple_New(0); if (unlikely(!__pyx_empty_tuple)) __PYX_ERR(0, 1, __pyx_L1_error)
  __pyx_empty_bytes = PyBytes_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_bytes)) __PYX_ERR(0, 1, __pyx_L1_error)
  __pyx_empty_unicode = PyUnicode_FromStringAndSize("", 0); if (unlikely(!__pyx_empty_unicode)) __PYX_ERR(0, 1, __pyx_L1_error)
  #ifdef __Pyx_CyFunction_USED
  if (__pyx_CyFunction_init() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  #ifdef __Pyx_FusedFunction_USED
  if (__pyx_FusedFunction_init() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  #ifdef __Pyx_Coroutine_USED
  if (__pyx_Coroutine_init() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  #ifdef __Pyx_Generator_USED
  if (__pyx_Generator_init() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  #ifdef __Pyx_AsyncGen_USED
  if (__pyx_AsyncGen_init() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  #ifdef __Pyx_StopAsyncIteration_USED
  if (__pyx_StopAsyncIteration_init() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  /*--- Library function declarations ---*/
  /*--- Threads initialization code ---*/
  #if defined(WITH_THREAD) && PY_VERSION_HEX < 0x030700F0 && defined(__PYX_FORCE_INIT_THREADS) && __PYX_FORCE_INIT_THREADS
  PyEval_InitThreads();
  #endif
  /*--- Module creation code ---*/
  #if CYTHON_PEP489_MULTI_PHASE_INIT
  __pyx_m = __pyx_pyinit_module;
  Py_INCREF(__pyx_m);
  #else
  #if PY_MAJOR_VERSION < 3
  __pyx_m = Py_InitModule4("core", __pyx_methods, 0, 0, PYTHON_API_VERSION); Py_XINCREF(__pyx_m);
  #else
  __pyx_m = PyModule_Create(&__pyx_moduledef);
  #endif
  if (unlikely(!__pyx_m)) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  __pyx_d = PyModule_GetDict(__pyx_m); if (unlikely(!__pyx_d)) __PYX_ERR(0, 1, __pyx_L1_error)
  Py_INCREF(__pyx_d);
  __pyx_b = PyImport_AddModule(__Pyx_BUILTIN_MODULE_NAME); if (unlikely(!__pyx_b)) __PYX_ERR(0, 1, __pyx_L1_error)
  Py_INCREF(__pyx_b);
  __pyx_cython_runtime = PyImport_AddModule((char *) "cython_runtime"); if (unlikely(!__pyx_cython_runtime)) __PYX_ERR(0, 1, __pyx_L1_error)
  Py_INCREF(__pyx_cython_runtime);
  if (PyObject_SetAttrString(__pyx_m, "__builtins__", __pyx_b) < 0) __PYX_ERR(0, 1, __pyx_L1_error);
  /*--- Initialize various global constants etc. ---*/
  if (__Pyx_InitGlobals() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #if PY_MAJOR_VERSION < 3 && (__PYX_DEFAULT_STRING_ENCODING_IS_ASCII || __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT)
  if (__Pyx_init_sys_getdefaultencoding_params() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif
  if (__pyx_module_is_main_TTS__tts__utils__monotonic_align__core) {
    if (PyObject_SetAttr(__pyx_m, __pyx_n_s_name_2, __pyx_n_s_main) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  }
  #if PY_MAJOR_VERSION >= 3
  {
    PyObject *modules = PyImport_GetModuleDict(); if (unlikely(!modules)) __PYX_ERR(0, 1, __pyx_L1_error)
    if (!PyDict_GetItemString(modules, "TTS.tts.utils.monotonic_align.core")) {
      if (unlikely(PyDict_SetItemString(modules, "TTS.tts.utils.monotonic_align.core", __pyx_m) < 0)) __PYX_ERR(0, 1, __pyx_L1_error)
    }
  }
  #endif
  /*--- Builtin init code ---*/
  if (__Pyx_InitCachedBuiltins() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  /*--- Constants init code ---*/
  if (__Pyx_InitCachedConstants() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  /*--- Global type/function init code ---*/
  (void)__Pyx_modinit_global_init_code();
  (void)__Pyx_modinit_variable_export_code();
  (void)__Pyx_modinit_function_export_code();
  if (unlikely(__Pyx_modinit_type_init_code() < 0)) __PYX_ERR(0, 1, __pyx_L1_error)
  if (unlikely(__Pyx_modinit_type_import_code() < 0)) __PYX_ERR(0, 1, __pyx_L1_error)
  (void)__Pyx_modinit_variable_import_code();
  (void)__Pyx_modinit_function_import_code();
  /*--- Execution code ---*/
  #if defined(__Pyx_Generator_USED) || defined(__Pyx_Coroutine_USED)
  if (__Pyx_patch_abc() < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  #endif

  /* "TTS/tts/utils/monotonic_align/core.pyx":1
 * import numpy as np             # <<<<<<<<<<<<<<
 * 
 * cimport cython
 */
  __pyx_t_1 = __Pyx_Import(__pyx_n_s_numpy, 0, 0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_np, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

  /* "TTS/tts/utils/monotonic_align/core.pyx":42
 * @cython.boundscheck(False)
 * @cython.wraparound(False)
 * cpdef void maximum_path_c(int[:,:,::1] paths, float[:,:,::1] values, int[::1] t_xs, int[::1] t_ys, float max_neg_val=-1e9) nogil:             # <<<<<<<<<<<<<<
 *   cdef int b = values.shape[0]
 * 
 */
  __pyx_k_ = (-1e9);
  __pyx_k_ = (-1e9);

  /* "TTS/tts/utils/monotonic_align/core.pyx":1
 * import numpy as np             # <<<<<<<<<<<<<<
 * 
 * cimport cython
 */
  __pyx_t_1 = __Pyx_PyDict_NewPresized(0); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

  /* "View.MemoryView":209
 *         info.obj = self
 * 
 *     __pyx_getbuffer = capsule(<void *> &__pyx_array_getbuffer, "getbuffer(obj, view, flags)")             # <<<<<<<<<<<<<<
 * 
 *     def __dealloc__(array self):
 */
  __pyx_t_1 = __pyx_capsule_create(((void *)(&__pyx_array_getbuffer)), ((char *)"getbuffer(obj, view, flags)")); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 209, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem((PyObject *)__pyx_array_type->tp_dict, __pyx_n_s_pyx_getbuffer, __pyx_t_1) < 0) __PYX_ERR(2, 209, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  PyType_Modified(__pyx_array_type);

  /* "View.MemoryView":286
 *         return self.name
 * 
 * cdef generic = Enum("<strided and direct or indirect>")             # <<<<<<<<<<<<<<
 * cdef strided = Enum("<strided and direct>") # default
 * cdef indirect = Enum("<strided and indirect>")
 */
  __pyx_t_1 = __Pyx_PyObject_Call(((PyObject *)__pyx_MemviewEnum_type), __pyx_tuple__22, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 286, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_XGOTREF(generic);
  __Pyx_DECREF_SET(generic, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_1);
  __pyx_t_1 = 0;

  /* "View.MemoryView":287
 * 
 * cdef generic = Enum("<strided and direct or indirect>")
 * cdef strided = Enum("<strided and direct>") # default             # <<<<<<<<<<<<<<
 * cdef indirect = Enum("<strided and indirect>")
 * 
 */
  __pyx_t_1 = __Pyx_PyObject_Call(((PyObject *)__pyx_MemviewEnum_type), __pyx_tuple__23, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 287, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_XGOTREF(strided);
  __Pyx_DECREF_SET(strided, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_1);
  __pyx_t_1 = 0;

  /* "View.MemoryView":288
 * cdef generic = Enum("<strided and direct or indirect>")
 * cdef strided = Enum("<strided and direct>") # default
 * cdef indirect = Enum("<strided and indirect>")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_1 = __Pyx_PyObject_Call(((PyObject *)__pyx_MemviewEnum_type), __pyx_tuple__24, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 288, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_XGOTREF(indirect);
  __Pyx_DECREF_SET(indirect, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_1);
  __pyx_t_1 = 0;

  /* "View.MemoryView":291
 * 
 * 
 * cdef contiguous = Enum("<contiguous and direct>")             # <<<<<<<<<<<<<<
 * cdef indirect_contiguous = Enum("<contiguous and indirect>")
 * 
 */
  __pyx_t_1 = __Pyx_PyObject_Call(((PyObject *)__pyx_MemviewEnum_type), __pyx_tuple__25, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 291, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_XGOTREF(contiguous);
  __Pyx_DECREF_SET(contiguous, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_1);
  __pyx_t_1 = 0;

  /* "View.MemoryView":292
 * 
 * cdef contiguous = Enum("<contiguous and direct>")
 * cdef indirect_contiguous = Enum("<contiguous and indirect>")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_1 = __Pyx_PyObject_Call(((PyObject *)__pyx_MemviewEnum_type), __pyx_tuple__26, NULL); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 292, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  __Pyx_XGOTREF(indirect_contiguous);
  __Pyx_DECREF_SET(indirect_contiguous, __pyx_t_1);
  __Pyx_GIVEREF(__pyx_t_1);
  __pyx_t_1 = 0;

  /* "View.MemoryView":316
 * 
 * DEF THREAD_LOCKS_PREALLOCATED = 8
 * cdef int __pyx_memoryview_thread_locks_used = 0             # <<<<<<<<<<<<<<
 * cdef PyThread_type_lock[THREAD_LOCKS_PREALLOCATED] __pyx_memoryview_thread_locks = [
 *     PyThread_allocate_lock(),
 */
  __pyx_memoryview_thread_locks_used = 0;

  /* "View.MemoryView":317
 * DEF THREAD_LOCKS_PREALLOCATED = 8
 * cdef int __pyx_memoryview_thread_locks_used = 0
 * cdef PyThread_type_lock[THREAD_LOCKS_PREALLOCATED] __pyx_memoryview_thread_locks = [             # <<<<<<<<<<<<<<
 *     PyThread_allocate_lock(),
 *     PyThread_allocate_lock(),
 */
  __pyx_t_2[0] = PyThread_allocate_lock();
  __pyx_t_2[1] = PyThread_allocate_lock();
  __pyx_t_2[2] = PyThread_allocate_lock();
  __pyx_t_2[3] = PyThread_allocate_lock();
  __pyx_t_2[4] = PyThread_allocate_lock();
  __pyx_t_2[5] = PyThread_allocate_lock();
  __pyx_t_2[6] = PyThread_allocate_lock();
  __pyx_t_2[7] = PyThread_allocate_lock();
  memcpy(&(__pyx_memoryview_thread_locks[0]), __pyx_t_2, sizeof(__pyx_memoryview_thread_locks[0]) * (8));

  /* "View.MemoryView":549
 *         info.obj = self
 * 
 *     __pyx_getbuffer = capsule(<void *> &__pyx_memoryview_getbuffer, "getbuffer(obj, view, flags)")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_1 = __pyx_capsule_create(((void *)(&__pyx_memoryview_getbuffer)), ((char *)"getbuffer(obj, view, flags)")); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 549, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem((PyObject *)__pyx_memoryview_type->tp_dict, __pyx_n_s_pyx_getbuffer, __pyx_t_1) < 0) __PYX_ERR(2, 549, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  PyType_Modified(__pyx_memoryview_type);

  /* "View.MemoryView":995
 *         return self.from_object
 * 
 *     __pyx_getbuffer = capsule(<void *> &__pyx_memoryview_getbuffer, "getbuffer(obj, view, flags)")             # <<<<<<<<<<<<<<
 * 
 * 
 */
  __pyx_t_1 = __pyx_capsule_create(((void *)(&__pyx_memoryview_getbuffer)), ((char *)"getbuffer(obj, view, flags)")); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 995, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem((PyObject *)__pyx_memoryviewslice_type->tp_dict, __pyx_n_s_pyx_getbuffer, __pyx_t_1) < 0) __PYX_ERR(2, 995, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  PyType_Modified(__pyx_memoryviewslice_type);

  /* "(tree fragment)":1
 * def __pyx_unpickle_Enum(__pyx_type, long __pyx_checksum, __pyx_state):             # <<<<<<<<<<<<<<
 *     cdef object __pyx_PickleError
 *     cdef object __pyx_result
 */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_15View_dot_MemoryView_1__pyx_unpickle_Enum, NULL, __pyx_n_s_View_MemoryView); if (unlikely(!__pyx_t_1)) __PYX_ERR(2, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_pyx_unpickle_Enum, __pyx_t_1) < 0) __PYX_ERR(2, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;

  /* "(tree fragment)":11
 *         __pyx_unpickle_Enum__set_state(<Enum> __pyx_result, __pyx_state)
 *     return __pyx_result
 * cdef __pyx_unpickle_Enum__set_state(Enum __pyx_result, tuple __pyx_state):             # <<<<<<<<<<<<<<
 *     __pyx_result.name = __pyx_state[0]
 *     if len(__pyx_state) > 1 and hasattr(__pyx_result, '__dict__'):
 */

  /*--- Wrapped vars code ---*/

  goto __pyx_L0;
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_1);
  if (__pyx_m) {
    if (__pyx_d) {
      __Pyx_AddTraceback("init TTS.tts.utils.monotonic_align.core", __pyx_clineno, __pyx_lineno, __pyx_filename);
    }
    Py_CLEAR(__pyx_m);
  } else if (!PyErr_Occurred()) {
    PyErr_SetString(PyExc_ImportError, "init TTS.tts.utils.monotonic_align.core");
  }
  __pyx_L0:;
  __Pyx_RefNannyFinishContext();
  #if CYTHON_PEP489_MULTI_PHASE_INIT
  return (__pyx_m != NULL) ? 0 : -1;
  #elif PY_MAJOR_VERSION >= 3
  return __pyx_m;
  #else
  return;
  #endif
}

/* --- Runtime support code --- */
/* Refnanny */
#if CYTHON_REFNANNY
static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modname) {
    PyObject *m = NULL, *p = NULL;
    void *r = NULL;
    m = PyImport_ImportModule(modname);
    if (!m) goto end;
    p = PyObject_GetAttrString(m, "RefNannyAPI");
    if (!p) goto end;
    r = PyLong_AsVoidPtr(p);
end:
    Py_XDECREF(p);
    Py_XDECREF(m);
    return (__Pyx_RefNannyAPIStruct *)r;
}
#endif

/* PyObjectGetAttrStr */
#if CYTHON_USE_TYPE_SLOTS
static CYTHON_INLINE PyObject* __Pyx_PyObject_GetAttrStr(PyObject* obj, PyObject* attr_name) {
    PyTypeObject* tp = Py_TYPE(obj);
    if (likely(tp->tp_getattro))
        return tp->tp_getattro(obj, attr_name);
#if PY_MAJOR_VERSION < 3
    if (likely(tp->tp_getattr))
        return tp->tp_getattr(obj, PyString_AS_STRING(attr_name));
#endif
    return PyObject_GetAttr(obj, attr_name);
}
#endif

/* GetBuiltinName */
static PyObject *__Pyx_GetBuiltinName(PyObject *name) {
    PyObject* result = __Pyx_PyObject_GetAttrStr(__pyx_b, name);
    if (unlikely(!result)) {
        PyErr_Format(PyExc_NameError,
#if PY_MAJOR_VERSION >= 3
            "name '%U' is not defined", name);
#else
            "name '%.200s' is not defined", PyString_AS_STRING(name));
#endif
    }
    return result;
}

/* MemviewSliceInit */
static int
__Pyx_init_memviewslice(struct __pyx_memoryview_obj *memview,
                        int ndim,
                        __Pyx_memviewslice *memviewslice,
                        int memview_is_new_reference)
{
    __Pyx_RefNannyDeclarations
    int i, retval=-1;
    Py_buffer *buf = &memview->view;
    __Pyx_RefNannySetupContext("init_memviewslice", 0);
    if (unlikely(memviewslice->memview || memviewslice->data)) {
        PyErr_SetString(PyExc_ValueError,
            "memviewslice is already initialized!");
        goto fail;
    }
    if (buf->strides) {
        for (i = 0; i < ndim; i++) {
            memviewslice->strides[i] = buf->strides[i];
        }
    } else {
        Py_ssize_t stride = buf->itemsize;
        for (i = ndim - 1; i >= 0; i--) {
            memviewslice->strides[i] = stride;
            stride *= buf->shape[i];
        }
    }
    for (i = 0; i < ndim; i++) {
        memviewslice->shape[i]   = buf->shape[i];
        if (buf->suboffsets) {
            memviewslice->suboffsets[i] = buf->suboffsets[i];
        } else {
            memviewslice->suboffsets[i] = -1;
        }
    }
    memviewslice->memview = memview;
    memviewslice->data = (char *)buf->buf;
    if (__pyx_add_acquisition_count(memview) == 0 && !memview_is_new_reference) {
        Py_INCREF(memview);
    }
    retval = 0;
    goto no_fail;
fail:
    memviewslice->memview = 0;
    memviewslice->data = 0;
    retval = -1;
no_fail:
    __Pyx_RefNannyFinishContext();
    return retval;
}
#ifndef Py_NO_RETURN
#define Py_NO_RETURN
#endif
static void __pyx_fatalerror(const char *fmt, ...) Py_NO_RETURN {
    va_list vargs;
    char msg[200];
#ifdef HAVE_STDARG_PROTOTYPES
    va_start(vargs, fmt);
#else
    va_start(vargs);
#endif
    vsnprintf(msg, 200, fmt, vargs);
    va_end(vargs);
    Py_FatalError(msg);
}
static CYTHON_INLINE int
__pyx_add_acquisition_count_locked(__pyx_atomic_int *acquisition_count,
                                   PyThread_type_lock lock)
{
    int result;
    PyThread_acquire_lock(lock, 1);
    result = (*acquisition_count)++;
    PyThread_release_lock(lock);
    return result;
}
static CYTHON_INLINE int
__pyx_sub_acquisition_count_locked(__pyx_atomic_int *acquisition_count,
                                   PyThread_type_lock lock)
{
    int result;
    PyThread_acquire_lock(lock, 1);
    result = (*acquisition_count)--;
    PyThread_release_lock(lock);
    return result;
}
static CYTHON_INLINE void
__Pyx_INC_MEMVIEW(__Pyx_memviewslice *memslice, int have_gil, int lineno)
{
    int first_time;
    struct __pyx_memoryview_obj *memview = memslice->memview;
    if (unlikely(!memview || (PyObject *) memview == Py_None))
        return;
    if (unlikely(__pyx_get_slice_count(memview) < 0))
        __pyx_fatalerror("Acquisition count is %d (line %d)",
                         __pyx_get_slice_count(memview), lineno);
    first_time = __pyx_add_acquisition_count(memview) == 0;
    if (unlikely(first_time)) {
        if (have_gil) {
            Py_INCREF((PyObject *) memview);
        } else {
            PyGILState_STATE _gilstate = PyGILState_Ensure();
            Py_INCREF((PyObject *) memview);
            PyGILState_Release(_gilstate);
        }
    }
}
static CYTHON_INLINE void __Pyx_XDEC_MEMVIEW(__Pyx_memviewslice *memslice,
                                             int have_gil, int lineno) {
    int last_time;
    struct __pyx_memoryview_obj *memview = memslice->memview;
    if (unlikely(!memview || (PyObject *) memview == Py_None)) {
        memslice->memview = NULL;
        return;
    }
    if (unlikely(__pyx_get_slice_count(memview) <= 0))
        __pyx_fatalerror("Acquisition count is %d (line %d)",
                         __pyx_get_slice_count(memview), lineno);
    last_time = __pyx_sub_acquisition_count(memview) == 1;
    memslice->data = NULL;
    if (unlikely(last_time)) {
        if (have_gil) {
            Py_CLEAR(memslice->memview);
        } else {
            PyGILState_STATE _gilstate = PyGILState_Ensure();
            Py_CLEAR(memslice->memview);
            PyGILState_Release(_gilstate);
        }
    } else {
        memslice->memview = NULL;
    }
}

/* RaiseArgTupleInvalid */
static void __Pyx_RaiseArgtupleInvalid(
    const char* func_name,
    int exact,
    Py_ssize_t num_min,
    Py_ssize_t num_max,
    Py_ssize_t num_found)
{
    Py_ssize_t num_expected;
    const char *more_or_less;
    if (num_found < num_min) {
        num_expected = num_min;
        more_or_less = "at least";
    } else {
        num_expected = num_max;
        more_or_less = "at most";
    }
    if (exact) {
        more_or_less = "exactly";
    }
    PyErr_Format(PyExc_TypeError,
                 "%.200s() takes %.8s %" CYTHON_FORMAT_SSIZE_T "d positional argument%.1s (%" CYTHON_FORMAT_SSIZE_T "d given)",
                 func_name, more_or_less, num_expected,
                 (num_expected == 1) ? "" : "s", num_found);
}

/* RaiseDoubleKeywords */
static void __Pyx_RaiseDoubleKeywordsError(
    const char* func_name,
    PyObject* kw_name)
{
    PyErr_Format(PyExc_TypeError,
        #if PY_MAJOR_VERSION >= 3
        "%s() got multiple values for keyword argument '%U'", func_name, kw_name);
        #else
        "%s() got multiple values for keyword argument '%s'", func_name,
        PyString_AsString(kw_name));
        #endif
}

/* ParseKeywords */
static int __Pyx_ParseOptionalKeywords(
    PyObject *kwds,
    PyObject **argnames[],
    PyObject *kwds2,
    PyObject *values[],
    Py_ssize_t num_pos_args,
    const char* function_name)
{
    PyObject *key = 0, *value = 0;
    Py_ssize_t pos = 0;
    PyObject*** name;
    PyObject*** first_kw_arg = argnames + num_pos_args;
    while (PyDict_Next(kwds, &pos, &key, &value)) {
        name = first_kw_arg;
        while (*name && (**name != key)) name++;
        if (*name) {
            values[name-argnames] = value;
            continue;
        }
        name = first_kw_arg;
        #if PY_MAJOR_VERSION < 3
        if (likely(PyString_Check(key))) {
            while (*name) {
                if ((CYTHON_COMPILING_IN_PYPY || PyString_GET_SIZE(**name) == PyString_GET_SIZE(key))
                        && _PyString_Eq(**name, key)) {
                    values[name-argnames] = value;
                    break;
                }
                name++;
            }
            if (*name) continue;
            else {
                PyObject*** argname = argnames;
                while (argname != first_kw_arg) {
                    if ((**argname == key) || (
                            (CYTHON_COMPILING_IN_PYPY || PyString_GET_SIZE(**argname) == PyString_GET_SIZE(key))
                             && _PyString_Eq(**argname, key))) {
                        goto arg_passed_twice;
                    }
                    argname++;
                }
            }
        } else
        #endif
        if (likely(PyUnicode_Check(key))) {
            while (*name) {
                int cmp = (**name == key) ? 0 :
                #if !CYTHON_COMPILING_IN_PYPY && PY_MAJOR_VERSION >= 3
                    (__Pyx_PyUnicode_GET_LENGTH(**name) != __Pyx_PyUnicode_GET_LENGTH(key)) ? 1 :
                #endif
                    PyUnicode_Compare(**name, key);
                if (cmp < 0 && unlikely(PyErr_Occurred())) goto bad;
                if (cmp == 0) {
                    values[name-argnames] = value;
                    break;
                }
                name++;
            }
            if (*name) continue;
            else {
                PyObject*** argname = argnames;
                while (argname != first_kw_arg) {
                    int cmp = (**argname == key) ? 0 :
                    #if !CYTHON_COMPILING_IN_PYPY && PY_MAJOR_VERSION >= 3
                        (__Pyx_PyUnicode_GET_LENGTH(**argname) != __Pyx_PyUnicode_GET_LENGTH(key)) ? 1 :
                    #endif
                        PyUnicode_Compare(**argname, key);
                    if (cmp < 0 && unlikely(PyErr_Occurred())) goto bad;
                    if (cmp == 0) goto arg_passed_twice;
                    argname++;
                }
            }
        } else
            goto invalid_keyword_type;
        if (kwds2) {
            if (unlikely(PyDict_SetItem(kwds2, key, value))) goto bad;
        } else {
            goto invalid_keyword;
        }
    }
    return 0;
arg_passed_twice:
    __Pyx_RaiseDoubleKeywordsError(function_name, key);
    goto bad;
invalid_keyword_type:
    PyErr_Format(PyExc_TypeError,
        "%.200s() keywords must be strings", function_name);
    goto bad;
invalid_keyword:
    PyErr_Format(PyExc_TypeError,
    #if PY_MAJOR_VERSION < 3
        "%.200s() got an unexpected keyword argument '%.200s'",
        function_name, PyString_AsString(key));
    #else
        "%s() got an unexpected keyword argument '%U'",
        function_name, key);
    #endif
bad:
    return -1;
}

/* None */
static CYTHON_INLINE void __Pyx_RaiseUnboundLocalError(const char *varname) {
    PyErr_Format(PyExc_UnboundLocalError, "local variable '%s' referenced before assignment", varname);
}

/* GetTopmostException */
#if CYTHON_USE_EXC_INFO_STACK
static _PyErr_StackItem *
__Pyx_PyErr_GetTopmostException(PyThreadState *tstate)
{
    _PyErr_StackItem *exc_info = tstate->exc_info;
    while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
           exc_info->previous_item != NULL)
    {
        exc_info = exc_info->previous_item;
    }
    return exc_info;
}
#endif

/* SaveResetException */
#if CYTHON_FAST_THREAD_STATE
static CYTHON_INLINE void __Pyx__ExceptionSave(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb) {
    #if CYTHON_USE_EXC_INFO_STACK
    _PyErr_StackItem *exc_info = __Pyx_PyErr_GetTopmostException(tstate);
    *type = exc_info->exc_type;
    *value = exc_info->exc_value;
    *tb = exc_info->exc_traceback;
    #else
    *type = tstate->exc_type;
    *value = tstate->exc_value;
    *tb = tstate->exc_traceback;
    #endif
    Py_XINCREF(*type);
    Py_XINCREF(*value);
    Py_XINCREF(*tb);
}
static CYTHON_INLINE void __Pyx__ExceptionReset(PyThreadState *tstate, PyObject *type, PyObject *value, PyObject *tb) {
    PyObject *tmp_type, *tmp_value, *tmp_tb;
    #if CYTHON_USE_EXC_INFO_STACK
    _PyErr_StackItem *exc_info = tstate->exc_info;
    tmp_type = exc_info->exc_type;
    tmp_value = exc_info->exc_value;
    tmp_tb = exc_info->exc_traceback;
    exc_info->exc_type = type;
    exc_info->exc_value = value;
    exc_info->exc_traceback = tb;
    #else
    tmp_type = tstate->exc_type;
    tmp_value = tstate->exc_value;
    tmp_tb = tstate->exc_traceback;
    tstate->exc_type = type;
    tstate->exc_value = value;
    tstate->exc_traceback = tb;
    #endif
    Py_XDECREF(tmp_type);
    Py_XDECREF(tmp_value);
    Py_XDECREF(tmp_tb);
}
#endif

/* PyErrExceptionMatches */
#if CYTHON_FAST_THREAD_STATE
static int __Pyx_PyErr_ExceptionMatchesTuple(PyObject *exc_type, PyObject *tuple) {
    Py_ssize_t i, n;
    n = PyTuple_GET_SIZE(tuple);
#if PY_MAJOR_VERSION >= 3
    for (i=0; i<n; i++) {
        if (exc_type == PyTuple_GET_ITEM(tuple, i)) return 1;
    }
#endif
    for (i=0; i<n; i++) {
        if (__Pyx_PyErr_GivenExceptionMatches(exc_type, PyTuple_GET_ITEM(tuple, i))) return 1;
    }
    return 0;
}
static CYTHON_INLINE int __Pyx_PyErr_ExceptionMatchesInState(PyThreadState* tstate, PyObject* err) {
    PyObject *exc_type = tstate->curexc_type;
    if (exc_type == err) return 1;
    if (unlikely(!exc_type)) return 0;
    if (unlikely(PyTuple_Check(err)))
        return __Pyx_PyErr_ExceptionMatchesTuple(exc_type, err);
    return __Pyx_PyErr_GivenExceptionMatches(exc_type, err);
}
#endif

/* GetException */
#if CYTHON_FAST_THREAD_STATE
static int __Pyx__GetException(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb)
#else
static int __Pyx_GetException(PyObject **type, PyObject **value, PyObject **tb)
#endif
{
    PyObject *local_type, *local_value, *local_tb;
#if CYTHON_FAST_THREAD_STATE
    PyObject *tmp_type, *tmp_value, *tmp_tb;
    local_type = tstate->curexc_type;
    local_value = tstate->curexc_value;
    local_tb = tstate->curexc_traceback;
    tstate->curexc_type = 0;
    tstate->curexc_value = 0;
    tstate->curexc_traceback = 0;
#else
    PyErr_Fetch(&local_type, &local_value, &local_tb);
#endif
    PyErr_NormalizeException(&local_type, &local_value, &local_tb);
#if CYTHON_FAST_THREAD_STATE
    if (unlikely(tstate->curexc_type))
#else
    if (unlikely(PyErr_Occurred()))
#endif
        goto bad;
    #if PY_MAJOR_VERSION >= 3
    if (local_tb) {
        if (unlikely(PyException_SetTraceback(local_value, local_tb) < 0))
            goto bad;
    }
    #endif
    Py_XINCREF(local_tb);
    Py_XINCREF(local_type);
    Py_XINCREF(local_value);
    *type = local_type;
    *value = local_value;
    *tb = local_tb;
#if CYTHON_FAST_THREAD_STATE
    #if CYTHON_USE_EXC_INFO_STACK
    {
        _PyErr_StackItem *exc_info = tstate->exc_info;
        tmp_type = exc_info->exc_type;
        tmp_value = exc_info->exc_value;
        tmp_tb = exc_info->exc_traceback;
        exc_info->exc_type = local_type;
        exc_info->exc_value = local_value;
        exc_info->exc_traceback = local_tb;
    }
    #else
    tmp_type = tstate->exc_type;
    tmp_value = tstate->exc_value;
    tmp_tb = tstate->exc_traceback;
    tstate->exc_type = local_type;
    tstate->exc_value = local_value;
    tstate->exc_traceback = local_tb;
    #endif
    Py_XDECREF(tmp_type);
    Py_XDECREF(tmp_value);
    Py_XDECREF(tmp_tb);
#else
    PyErr_SetExcInfo(local_type, local_value, local_tb);
#endif
    return 0;
bad:
    *type = 0;
    *value = 0;
    *tb = 0;
    Py_XDECREF(local_type);
    Py_XDECREF(local_value);
    Py_XDECREF(local_tb);
    return -1;
}

/* PyObjectCall */
#if CYTHON_COMPILING_IN_CPYTHON
static CYTHON_INLINE PyObject* __Pyx_PyObject_Call(PyObject *func, PyObject *arg, PyObject *kw) {
    PyObject *result;
    ternaryfunc call = Py_TYPE(func)->tp_call;
    if (unlikely(!call))
        return PyObject_Call(func, arg, kw);
    if (unlikely(Py_EnterRecursiveCall((char*)" while calling a Python object")))
        return NULL;
    result = (*call)(func, arg, kw);
    Py_LeaveRecursiveCall();
    if (unlikely(!result) && unlikely(!PyErr_Occurred())) {
        PyErr_SetString(
            PyExc_SystemError,
            "NULL result without error in PyObject_Call");
    }
    return result;
}
#endif

/* PyErrFetchRestore */
#if CYTHON_FAST_THREAD_STATE
static CYTHON_INLINE void __Pyx_ErrRestoreInState(PyThreadState *tstate, PyObject *type, PyObject *value, PyObject *tb) {
    PyObject *tmp_type, *tmp_value, *tmp_tb;
    tmp_type = tstate->curexc_type;
    tmp_value = tstate->curexc_value;
    tmp_tb = tstate->curexc_traceback;
    tstate->curexc_type = type;
    tstate->curexc_value = value;
    tstate->curexc_traceback = tb;
    Py_XDECREF(tmp_type);
    Py_XDECREF(tmp_value);
    Py_XDECREF(tmp_tb);
}
static CYTHON_INLINE void __Pyx_ErrFetchInState(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb) {
    *type = tstate->curexc_type;
    *value = tstate->curexc_value;
    *tb = tstate->curexc_traceback;
    tstate->curexc_type = 0;
    tstate->curexc_value = 0;
    tstate->curexc_traceback = 0;
}
#endif

/* RaiseException */
#if PY_MAJOR_VERSION < 3
static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb,
                        CYTHON_UNUSED PyObject *cause) {
    __Pyx_PyThreadState_declare
    Py_XINCREF(type);
    if (!value || value == Py_None)
        value = NULL;
    else
        Py_INCREF(value);
    if (!tb || tb == Py_None)
        tb = NULL;
    else {
        Py_INCREF(tb);
        if (!PyTraceBack_Check(tb)) {
            PyErr_SetString(PyExc_TypeError,
                "raise: arg 3 must be a traceback or None");
            goto raise_error;
        }
    }
    if (PyType_Check(type)) {
#if CYTHON_COMPILING_IN_PYPY
        if (!value) {
            Py_INCREF(Py_None);
            value = Py_None;
        }
#endif
        PyErr_NormalizeException(&type, &value, &tb);
    } else {
        if (value) {
            PyErr_SetString(PyExc_TypeError,
                "instance exception may not have a separate value");
            goto raise_error;
        }
        value = type;
        type = (PyObject*) Py_TYPE(type);
        Py_INCREF(type);
        if (!PyType_IsSubtype((PyTypeObject *)type, (PyTypeObject *)PyExc_BaseException)) {
            PyErr_SetString(PyExc_TypeError,
                "raise: exception class must be a subclass of BaseException");
            goto raise_error;
        }
    }
    __Pyx_PyThreadState_assign
    __Pyx_ErrRestore(type, value, tb);
    return;
raise_error:
    Py_XDECREF(value);
    Py_XDECREF(type);
    Py_XDECREF(tb);
    return;
}
#else
static void __Pyx_Raise(PyObject *type, PyObject *value, PyObject *tb, PyObject *cause) {
    PyObject* owned_instance = NULL;
    if (tb == Py_None) {
        tb = 0;
    } else if (tb && !PyTraceBack_Check(tb)) {
        PyErr_SetString(PyExc_TypeError,
            "raise: arg 3 must be a traceback or None");
        goto bad;
    }
    if (value == Py_None)
        value = 0;
    if (PyExceptionInstance_Check(type)) {
        if (value) {
            PyErr_SetString(PyExc_TypeError,
                "instance exception may not have a separate value");
            goto bad;
        }
        value = type;
        type = (PyObject*) Py_TYPE(value);
    } else if (PyExceptionClass_Check(type)) {
        PyObject *instance_class = NULL;
        if (value && PyExceptionInstance_Check(value)) {
            instance_class = (PyObject*) Py_TYPE(value);
            if (instance_class != type) {
                int is_subclass = PyObject_IsSubclass(instance_class, type);
                if (!is_subclass) {
                    instance_class = NULL;
                } else if (unlikely(is_subclass == -1)) {
                    goto bad;
                } else {
                    type = instance_class;
                }
            }
        }
        if (!instance_class) {
            PyObject *args;
            if (!value)
                args = PyTuple_New(0);
            else if (PyTuple_Check(value)) {
                Py_INCREF(value);
                args = value;
            } else
                args = PyTuple_Pack(1, value);
            if (!args)
                goto bad;
            owned_instance = PyObject_Call(type, args, NULL);
            Py_DECREF(args);
            if (!owned_instance)
                goto bad;
            value = owned_instance;
            if (!PyExceptionInstance_Check(value)) {
                PyErr_Format(PyExc_TypeError,
                             "calling %R should have returned an instance of "
                             "BaseException, not %R",
                             type, Py_TYPE(value));
                goto bad;
            }
        }
    } else {
        PyErr_SetString(PyExc_TypeError,
            "raise: exception class must be a subclass of BaseException");
        goto bad;
    }
    if (cause) {
        PyObject *fixed_cause;
        if (cause == Py_None) {
            fixed_cause = NULL;
        } else if (PyExceptionClass_Check(cause)) {
            fixed_cause = PyObject_CallObject(cause, NULL);
            if (fixed_cause == NULL)
                goto bad;
        } else if (PyExceptionInstance_Check(cause)) {
            fixed_cause = cause;
            Py_INCREF(fixed_cause);
        } else {
            PyErr_SetString(PyExc_TypeError,
                            "exception causes must derive from "
                            "BaseException");
            goto bad;
        }
        PyException_SetCause(value, fixed_cause);
    }
    PyErr_SetObject(type, value);
    if (tb) {
#if CYTHON_COMPILING_IN_PYPY
        PyObject *tmp_type, *tmp_value, *tmp_tb;
        PyErr_Fetch(&tmp_type, &tmp_value, &tmp_tb);
        Py_INCREF(tb);
        PyErr_Restore(tmp_type, tmp_value, tb);
        Py_XDECREF(tmp_tb);
#else
        PyThreadState *tstate = __Pyx_PyThreadState_Current;
        PyObject* tmp_tb = tstate->curexc_traceback;
        if (tb != tmp_tb) {
            Py_INCREF(tb);
            tstate->curexc_traceback = tb;
            Py_XDECREF(tmp_tb);
        }
#endif
    }
bad:
    Py_XDECREF(owned_instance);
    return;
}
#endif

/* ArgTypeTest */
static int __Pyx__ArgTypeTest(PyObject *obj, PyTypeObject *type, const char *name, int exact)
{
    if (unlikely(!type)) {
        PyErr_SetString(PyExc_SystemError, "Missing type object");
        return 0;
    }
    else if (exact) {
        #if PY_MAJOR_VERSION == 2
        if ((type == &PyBaseString_Type) && likely(__Pyx_PyBaseString_CheckExact(obj))) return 1;
        #endif
    }
    else {
        if (likely(__Pyx_TypeCheck(obj, type))) return 1;
    }
    PyErr_Format(PyExc_TypeError,
        "Argument '%.200s' has incorrect type (expected %.200s, got %.200s)",
        name, type->tp_name, Py_TYPE(obj)->tp_name);
    return 0;
}

/* PyCFunctionFastCall */
#if CYTHON_FAST_PYCCALL
static CYTHON_INLINE PyObject * __Pyx_PyCFunction_FastCall(PyObject *func_obj, PyObject **args, Py_ssize_t nargs) {
    PyCFunctionObject *func = (PyCFunctionObject*)func_obj;
    PyCFunction meth = PyCFunction_GET_FUNCTION(func);
    PyObject *self = PyCFunction_GET_SELF(func);
    int flags = PyCFunction_GET_FLAGS(func);
    assert(PyCFunction_Check(func));
    assert(METH_FASTCALL == (flags & ~(METH_CLASS | METH_STATIC | METH_COEXIST | METH_KEYWORDS | METH_STACKLESS)));
    assert(nargs >= 0);
    assert(nargs == 0 || args != NULL);
    /* _PyCFunction_FastCallDict() must not be called with an exception set,
       because it may clear it (directly or indirectly) and so the
       caller loses its exception */
    assert(!PyErr_Occurred());
    if ((PY_VERSION_HEX < 0x030700A0) || unlikely(flags & METH_KEYWORDS)) {
        return (*((__Pyx_PyCFunctionFastWithKeywords)(void*)meth)) (self, args, nargs, NULL);
    } else {
        return (*((__Pyx_PyCFunctionFast)(void*)meth)) (self, args, nargs);
    }
}
#endif

/* PyFunctionFastCall */
#if CYTHON_FAST_PYCALL
static PyObject* __Pyx_PyFunction_FastCallNoKw(PyCodeObject *co, PyObject **args, Py_ssize_t na,
                                               PyObject *globals) {
    PyFrameObject *f;
    PyThreadState *tstate = __Pyx_PyThreadState_Current;
    PyObject **fastlocals;
    Py_ssize_t i;
    PyObject *result;
    assert(globals != NULL);
    /* XXX Perhaps we should create a specialized
       PyFrame_New() that doesn't take locals, but does
       take builtins without sanity checking them.
       */
    assert(tstate != NULL);
    f = PyFrame_New(tstate, co, globals, NULL);
    if (f == NULL) {
        return NULL;
    }
    fastlocals = __Pyx_PyFrame_GetLocalsplus(f);
    for (i = 0; i < na; i++) {
        Py_INCREF(*args);
        fastlocals[i] = *args++;
    }
    result = PyEval_EvalFrameEx(f,0);
    ++tstate->recursion_depth;
    Py_DECREF(f);
    --tstate->recursion_depth;
    return result;
}
#if 1 || PY_VERSION_HEX < 0x030600B1
static PyObject *__Pyx_PyFunction_FastCallDict(PyObject *func, PyObject **args, Py_ssize_t nargs, PyObject *kwargs) {
    PyCodeObject *co = (PyCodeObject *)PyFunction_GET_CODE(func);
    PyObject *globals = PyFunction_GET_GLOBALS(func);
    PyObject *argdefs = PyFunction_GET_DEFAULTS(func);
    PyObject *closure;
#if PY_MAJOR_VERSION >= 3
    PyObject *kwdefs;
#endif
    PyObject *kwtuple, **k;
    PyObject **d;
    Py_ssize_t nd;
    Py_ssize_t nk;
    PyObject *result;
    assert(kwargs == NULL || PyDict_Check(kwargs));
    nk = kwargs ? PyDict_Size(kwargs) : 0;
    if (Py_EnterRecursiveCall((char*)" while calling a Python object")) {
        return NULL;
    }
    if (
#if PY_MAJOR_VERSION >= 3
            co->co_kwonlyargcount == 0 &&
#endif
            likely(kwargs == NULL || nk == 0) &&
            co->co_flags == (CO_OPTIMIZED | CO_NEWLOCALS | CO_NOFREE)) {
        if (argdefs == NULL && co->co_argcount == nargs) {
            result = __Pyx_PyFunction_FastCallNoKw(co, args, nargs, globals);
            goto done;
        }
        else if (nargs == 0 && argdefs != NULL
                 && co->co_argcount == Py_SIZE(argdefs)) {
            /* function called with no arguments, but all parameters have
               a default value: use default values as arguments .*/
            args = &PyTuple_GET_ITEM(argdefs, 0);
            result =__Pyx_PyFunction_FastCallNoKw(co, args, Py_SIZE(argdefs), globals);
            goto done;
        }
    }
    if (kwargs != NULL) {
        Py_ssize_t pos, i;
        kwtuple = PyTuple_New(2 * nk);
        if (kwtuple == NULL) {
            result = NULL;
            goto done;
        }
        k = &PyTuple_GET_ITEM(kwtuple, 0);
        pos = i = 0;
        while (PyDict_Next(kwargs, &pos, &k[i], &k[i+1])) {
            Py_INCREF(k[i]);
            Py_INCREF(k[i+1]);
            i += 2;
        }
        nk = i / 2;
    }
    else {
        kwtuple = NULL;
        k = NULL;
    }
    closure = PyFunction_GET_CLOSURE(func);
#if PY_MAJOR_VERSION >= 3
    kwdefs = PyFunction_GET_KW_DEFAULTS(func);
#endif
    if (argdefs != NULL) {
        d = &PyTuple_GET_ITEM(argdefs, 0);
        nd = Py_SIZE(argdefs);
    }
    else {
        d = NULL;
        nd = 0;
    }
#if PY_MAJOR_VERSION >= 3
    result = PyEval_EvalCodeEx((PyObject*)co, globals, (PyObject *)NULL,
                               args, (int)nargs,
                               k, (int)nk,
                               d, (int)nd, kwdefs, closure);
#else
    result = PyEval_EvalCodeEx(co, globals, (PyObject *)NULL,
                               args, (int)nargs,
                               k, (int)nk,
                               d, (int)nd, closure);
#endif
    Py_XDECREF(kwtuple);
done:
    Py_LeaveRecursiveCall();
    return result;
}
#endif
#endif

/* PyObjectCall2Args */
static CYTHON_UNUSED PyObject* __Pyx_PyObject_Call2Args(PyObject* function, PyObject* arg1, PyObject* arg2) {
    PyObject *args, *result = NULL;
    #if CYTHON_FAST_PYCALL
    if (PyFunction_Check(function)) {
        PyObject *args[2] = {arg1, arg2};
        return __Pyx_PyFunction_FastCall(function, args, 2);
    }
    #endif
    #if CYTHON_FAST_PYCCALL
    if (__Pyx_PyFastCFunction_Check(function)) {
        PyObject *args[2] = {arg1, arg2};
        return __Pyx_PyCFunction_FastCall(function, args, 2);
    }
    #endif
    args = PyTuple_New(2);
    if (unlikely(!args)) goto done;
    Py_INCREF(arg1);
    PyTuple_SET_ITEM(args, 0, arg1);
    Py_INCREF(arg2);
    PyTuple_SET_ITEM(args, 1, arg2);
    Py_INCREF(function);
    result = __Pyx_PyObject_Call(function, args, NULL);
    Py_DECREF(args);
    Py_DECREF(function);
done:
    return result;
}

/* PyObjectCallMethO */
#if CYTHON_COMPILING_IN_CPYTHON
static CYTHON_INLINE PyObject* __Pyx_PyObject_CallMethO(PyObject *func, PyObject *arg) {
    PyObject *self, *result;
    PyCFunction cfunc;
    cfunc = PyCFunction_GET_FUNCTION(func);
    self = PyCFunction_GET_SELF(func);
    if (unlikely(Py_EnterRecursiveCall((char*)" while calling a Python object")))
        return NULL;
    result = cfunc(self, arg);
    Py_LeaveRecursiveCall();
    if (unlikely(!result) && unlikely(!PyErr_Occurred())) {
        PyErr_SetString(
            PyExc_SystemError,
            "NULL result without error in PyObject_Call");
    }
    return result;
}
#endif

/* PyObjectCallOneArg */
#if CYTHON_COMPILING_IN_CPYTHON
static PyObject* __Pyx__PyObject_CallOneArg(PyObject *func, PyObject *arg) {
    PyObject *result;
    PyObject *args = PyTuple_New(1);
    if (unlikely(!args)) return NULL;
    Py_INCREF(arg);
    PyTuple_SET_ITEM(args, 0, arg);
    result = __Pyx_PyObject_Call(func, args, NULL);
    Py_DECREF(args);
    return result;
}
static CYTHON_INLINE PyObject* __Pyx_PyObject_CallOneArg(PyObject *func, PyObject *arg) {
#if CYTHON_FAST_PYCALL
    if (PyFunction_Check(func)) {
        return __Pyx_PyFunction_FastCall(func, &arg, 1);
    }
#endif
    if (likely(PyCFunction_Check(func))) {
        if (likely(PyCFunction_GET_FLAGS(func) & METH_O)) {
            return __Pyx_PyObject_CallMethO(func, arg);
#if CYTHON_FAST_PYCCALL
        } else if (__Pyx_PyFastCFunction_Check(func)) {
            return __Pyx_PyCFunction_FastCall(func, &arg, 1);
#endif
        }
    }
    return __Pyx__PyObject_CallOneArg(func, arg);
}
#else
static CYTHON_INLINE PyObject* __Pyx_PyObject_CallOneArg(PyObject *func, PyObject *arg) {
    PyObject *result;
    PyObject *args = PyTuple_Pack(1, arg);
    if (unlikely(!args)) return NULL;
    result = __Pyx_PyObject_Call(func, args, NULL);
    Py_DECREF(args);
    return result;
}
#endif

/* BytesEquals */
static CYTHON_INLINE int __Pyx_PyBytes_Equals(PyObject* s1, PyObject* s2, int equals) {
#if CYTHON_COMPILING_IN_PYPY
    return PyObject_RichCompareBool(s1, s2, equals);
#else
    if (s1 == s2) {
        return (equals == Py_EQ);
    } else if (PyBytes_CheckExact(s1) & PyBytes_CheckExact(s2)) {
        const char *ps1, *ps2;
        Py_ssize_t length = PyBytes_GET_SIZE(s1);
        if (length != PyBytes_GET_SIZE(s2))
            return (equals == Py_NE);
        ps1 = PyBytes_AS_STRING(s1);
        ps2 = PyBytes_AS_STRING(s2);
        if (ps1[0] != ps2[0]) {
            return (equals == Py_NE);
        } else if (length == 1) {
            return (equals == Py_EQ);
        } else {
            int result;
#if CYTHON_USE_UNICODE_INTERNALS
            Py_hash_t hash1, hash2;
            hash1 = ((PyBytesObject*)s1)->ob_shash;
            hash2 = ((PyBytesObject*)s2)->ob_shash;
            if (hash1 != hash2 && hash1 != -1 && hash2 != -1) {
                return (equals == Py_NE);
            }
#endif
            result = memcmp(ps1, ps2, (size_t)length);
            return (equals == Py_EQ) ? (result == 0) : (result != 0);
        }
    } else if ((s1 == Py_None) & PyBytes_CheckExact(s2)) {
        return (equals == Py_NE);
    } else if ((s2 == Py_None) & PyBytes_CheckExact(s1)) {
        return (equals == Py_NE);
    } else {
        int result;
        PyObject* py_result = PyObject_RichCompare(s1, s2, equals);
        if (!py_result)
            return -1;
        result = __Pyx_PyObject_IsTrue(py_result);
        Py_DECREF(py_result);
        return result;
    }
#endif
}

/* UnicodeEquals */
static CYTHON_INLINE int __Pyx_PyUnicode_Equals(PyObject* s1, PyObject* s2, int equals) {
#if CYTHON_COMPILING_IN_PYPY
    return PyObject_RichCompareBool(s1, s2, equals);
#else
#if PY_MAJOR_VERSION < 3
    PyObject* owned_ref = NULL;
#endif
    int s1_is_unicode, s2_is_unicode;
    if (s1 == s2) {
        goto return_eq;
    }
    s1_is_unicode = PyUnicode_CheckExact(s1);
    s2_is_unicode = PyUnicode_CheckExact(s2);
#if PY_MAJOR_VERSION < 3
    if ((s1_is_unicode & (!s2_is_unicode)) && PyString_CheckExact(s2)) {
        owned_ref = PyUnicode_FromObject(s2);
        if (unlikely(!owned_ref))
            return -1;
        s2 = owned_ref;
        s2_is_unicode = 1;
    } else if ((s2_is_unicode & (!s1_is_unicode)) && PyString_CheckExact(s1)) {
        owned_ref = PyUnicode_FromObject(s1);
        if (unlikely(!owned_ref))
            return -1;
        s1 = owned_ref;
        s1_is_unicode = 1;
    } else if (((!s2_is_unicode) & (!s1_is_unicode))) {
        return __Pyx_PyBytes_Equals(s1, s2, equals);
    }
#endif
    if (s1_is_unicode & s2_is_unicode) {
        Py_ssize_t length;
        int kind;
        void *data1, *data2;
        if (unlikely(__Pyx_PyUnicode_READY(s1) < 0) || unlikely(__Pyx_PyUnicode_READY(s2) < 0))
            return -1;
        length = __Pyx_PyUnicode_GET_LENGTH(s1);
        if (length != __Pyx_PyUnicode_GET_LENGTH(s2)) {
            goto return_ne;
        }
#if CYTHON_USE_UNICODE_INTERNALS
        {
            Py_hash_t hash1, hash2;
        #if CYTHON_PEP393_ENABLED
            hash1 = ((PyASCIIObject*)s1)->hash;
            hash2 = ((PyASCIIObject*)s2)->hash;
        #else
            hash1 = ((PyUnicodeObject*)s1)->hash;
            hash2 = ((PyUnicodeObject*)s2)->hash;
        #endif
            if (hash1 != hash2 && hash1 != -1 && hash2 != -1) {
                goto return_ne;
            }
        }
#endif
        kind = __Pyx_PyUnicode_KIND(s1);
        if (kind != __Pyx_PyUnicode_KIND(s2)) {
            goto return_ne;
        }
        data1 = __Pyx_PyUnicode_DATA(s1);
        data2 = __Pyx_PyUnicode_DATA(s2);
        if (__Pyx_PyUnicode_READ(kind, data1, 0) != __Pyx_PyUnicode_READ(kind, data2, 0)) {
            goto return_ne;
        } else if (length == 1) {
            goto return_eq;
        } else {
            int result = memcmp(data1, data2, (size_t)(length * kind));
            #if PY_MAJOR_VERSION < 3
            Py_XDECREF(owned_ref);
            #endif
            return (equals == Py_EQ) ? (result == 0) : (result != 0);
        }
    } else if ((s1 == Py_None) & s2_is_unicode) {
        goto return_ne;
    } else if ((s2 == Py_None) & s1_is_unicode) {
        goto return_ne;
    } else {
        int result;
        PyObject* py_result = PyObject_RichCompare(s1, s2, equals);
        #if PY_MAJOR_VERSION < 3
        Py_XDECREF(owned_ref);
        #endif
        if (!py_result)
            return -1;
        result = __Pyx_PyObject_IsTrue(py_result);
        Py_DECREF(py_result);
        return result;
    }
return_eq:
    #if PY_MAJOR_VERSION < 3
    Py_XDECREF(owned_ref);
    #endif
    return (equals == Py_EQ);
return_ne:
    #if PY_MAJOR_VERSION < 3
    Py_XDECREF(owned_ref);
    #endif
    return (equals == Py_NE);
#endif
}

/* DivInt[Py_ssize_t] */
static CYTHON_INLINE Py_ssize_t __Pyx_div_Py_ssize_t(Py_ssize_t a, Py_ssize_t b) {
    Py_ssize_t q = a / b;
    Py_ssize_t r = a - q*b;
    q -= ((r != 0) & ((r ^ b) < 0));
    return q;
}

/* GetAttr */
static CYTHON_INLINE PyObject *__Pyx_GetAttr(PyObject *o, PyObject *n) {
#if CYTHON_USE_TYPE_SLOTS
#if PY_MAJOR_VERSION >= 3
    if (likely(PyUnicode_Check(n)))
#else
    if (likely(PyString_Check(n)))
#endif
        return __Pyx_PyObject_GetAttrStr(o, n);
#endif
    return PyObject_GetAttr(o, n);
}

/* GetItemInt */
static PyObject *__Pyx_GetItemInt_Generic(PyObject *o, PyObject* j) {
    PyObject *r;
    if (!j) return NULL;
    r = PyObject_GetItem(o, j);
    Py_DECREF(j);
    return r;
}
static CYTHON_INLINE PyObject *__Pyx_GetItemInt_List_Fast(PyObject *o, Py_ssize_t i,
                                                              CYTHON_NCP_UNUSED int wraparound,
                                                              CYTHON_NCP_UNUSED int boundscheck) {
#if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
    Py_ssize_t wrapped_i = i;
    if (wraparound & unlikely(i < 0)) {
        wrapped_i += PyList_GET_SIZE(o);
    }
    if ((!boundscheck) || likely(__Pyx_is_valid_index(wrapped_i, PyList_GET_SIZE(o)))) {
        PyObject *r = PyList_GET_ITEM(o, wrapped_i);
        Py_INCREF(r);
        return r;
    }
    return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i));
#else
    return PySequence_GetItem(o, i);
#endif
}
static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Tuple_Fast(PyObject *o, Py_ssize_t i,
                                                              CYTHON_NCP_UNUSED int wraparound,
                                                              CYTHON_NCP_UNUSED int boundscheck) {
#if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS
    Py_ssize_t wrapped_i = i;
    if (wraparound & unlikely(i < 0)) {
        wrapped_i += PyTuple_GET_SIZE(o);
    }
    if ((!boundscheck) || likely(__Pyx_is_valid_index(wrapped_i, PyTuple_GET_SIZE(o)))) {
        PyObject *r = PyTuple_GET_ITEM(o, wrapped_i);
        Py_INCREF(r);
        return r;
    }
    return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i));
#else
    return PySequence_GetItem(o, i);
#endif
}
static CYTHON_INLINE PyObject *__Pyx_GetItemInt_Fast(PyObject *o, Py_ssize_t i, int is_list,
                                                     CYTHON_NCP_UNUSED int wraparound,
                                                     CYTHON_NCP_UNUSED int boundscheck) {
#if CYTHON_ASSUME_SAFE_MACROS && !CYTHON_AVOID_BORROWED_REFS && CYTHON_USE_TYPE_SLOTS
    if (is_list || PyList_CheckExact(o)) {
        Py_ssize_t n = ((!wraparound) | likely(i >= 0)) ? i : i + PyList_GET_SIZE(o);
        if ((!boundscheck) || (likely(__Pyx_is_valid_index(n, PyList_GET_SIZE(o))))) {
            PyObject *r = PyList_GET_ITEM(o, n);
            Py_INCREF(r);
            return r;
        }
    }
    else if (PyTuple_CheckExact(o)) {
        Py_ssize_t n = ((!wraparound) | likely(i >= 0)) ? i : i + PyTuple_GET_SIZE(o);
        if ((!boundscheck) || likely(__Pyx_is_valid_index(n, PyTuple_GET_SIZE(o)))) {
            PyObject *r = PyTuple_GET_ITEM(o, n);
            Py_INCREF(r);
            return r;
        }
    } else {
        PySequenceMethods *m = Py_TYPE(o)->tp_as_sequence;
        if (likely(m && m->sq_item)) {
            if (wraparound && unlikely(i < 0) && likely(m->sq_length)) {
                Py_ssize_t l = m->sq_length(o);
                if (likely(l >= 0)) {
                    i += l;
                } else {
                    if (!PyErr_ExceptionMatches(PyExc_OverflowError))
                        return NULL;
                    PyErr_Clear();
                }
            }
            return m->sq_item(o, i);
        }
    }
#else
    if (is_list || PySequence_Check(o)) {
        return PySequence_GetItem(o, i);
    }
#endif
    return __Pyx_GetItemInt_Generic(o, PyInt_FromSsize_t(i));
}

/* ObjectGetItem */
#if CYTHON_USE_TYPE_SLOTS
static PyObject *__Pyx_PyObject_GetIndex(PyObject *obj, PyObject* index) {
    PyObject *runerr;
    Py_ssize_t key_value;
    PySequenceMethods *m = Py_TYPE(obj)->tp_as_sequence;
    if (unlikely(!(m && m->sq_item))) {
        PyErr_Format(PyExc_TypeError, "'%.200s' object is not subscriptable", Py_TYPE(obj)->tp_name);
        return NULL;
    }
    key_value = __Pyx_PyIndex_AsSsize_t(index);
    if (likely(key_value != -1 || !(runerr = PyErr_Occurred()))) {
        return __Pyx_GetItemInt_Fast(obj, key_value, 0, 1, 1);
    }
    if (PyErr_GivenExceptionMatches(runerr, PyExc_OverflowError)) {
        PyErr_Clear();
        PyErr_Format(PyExc_IndexError, "cannot fit '%.200s' into an index-sized integer", Py_TYPE(index)->tp_name);
    }
    return NULL;
}
static PyObject *__Pyx_PyObject_GetItem(PyObject *obj, PyObject* key) {
    PyMappingMethods *m = Py_TYPE(obj)->tp_as_mapping;
    if (likely(m && m->mp_subscript)) {
        return m->mp_subscript(obj, key);
    }
    return __Pyx_PyObject_GetIndex(obj, key);
}
#endif

/* decode_c_string */
static CYTHON_INLINE PyObject* __Pyx_decode_c_string(
         const char* cstring, Py_ssize_t start, Py_ssize_t stop,
         const char* encoding, const char* errors,
         PyObject* (*decode_func)(const char *s, Py_ssize_t size, const char *errors)) {
    Py_ssize_t length;
    if (unlikely((start < 0) | (stop < 0))) {
        size_t slen = strlen(cstring);
        if (unlikely(slen > (size_t) PY_SSIZE_T_MAX)) {
            PyErr_SetString(PyExc_OverflowError,
                            "c-string too long to convert to Python");
            return NULL;
        }
        length = (Py_ssize_t) slen;
        if (start < 0) {
            start += length;
            if (start < 0)
                start = 0;
        }
        if (stop < 0)
            stop += length;
    }
    if (unlikely(stop <= start))
        return __Pyx_NewRef(__pyx_empty_unicode);
    length = stop - start;
    cstring += start;
    if (decode_func) {
        return decode_func(cstring, length, errors);
    } else {
        return PyUnicode_Decode(cstring, length, encoding, errors);
    }
}

/* GetAttr3 */
static PyObject *__Pyx_GetAttr3Default(PyObject *d) {
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    if (unlikely(!__Pyx_PyErr_ExceptionMatches(PyExc_AttributeError)))
        return NULL;
    __Pyx_PyErr_Clear();
    Py_INCREF(d);
    return d;
}
static CYTHON_INLINE PyObject *__Pyx_GetAttr3(PyObject *o, PyObject *n, PyObject *d) {
    PyObject *r = __Pyx_GetAttr(o, n);
    return (likely(r)) ? r : __Pyx_GetAttr3Default(d);
}

/* PyDictVersioning */
#if CYTHON_USE_DICT_VERSIONS && CYTHON_USE_TYPE_SLOTS
static CYTHON_INLINE PY_UINT64_T __Pyx_get_tp_dict_version(PyObject *obj) {
    PyObject *dict = Py_TYPE(obj)->tp_dict;
    return likely(dict) ? __PYX_GET_DICT_VERSION(dict) : 0;
}
static CYTHON_INLINE PY_UINT64_T __Pyx_get_object_dict_version(PyObject *obj) {
    PyObject **dictptr = NULL;
    Py_ssize_t offset = Py_TYPE(obj)->tp_dictoffset;
    if (offset) {
#if CYTHON_COMPILING_IN_CPYTHON
        dictptr = (likely(offset > 0)) ? (PyObject **) ((char *)obj + offset) : _PyObject_GetDictPtr(obj);
#else
        dictptr = _PyObject_GetDictPtr(obj);
#endif
    }
    return (dictptr && *dictptr) ? __PYX_GET_DICT_VERSION(*dictptr) : 0;
}
static CYTHON_INLINE int __Pyx_object_dict_version_matches(PyObject* obj, PY_UINT64_T tp_dict_version, PY_UINT64_T obj_dict_version) {
    PyObject *dict = Py_TYPE(obj)->tp_dict;
    if (unlikely(!dict) || unlikely(tp_dict_version != __PYX_GET_DICT_VERSION(dict)))
        return 0;
    return obj_dict_version == __Pyx_get_object_dict_version(obj);
}
#endif

/* GetModuleGlobalName */
#if CYTHON_USE_DICT_VERSIONS
static PyObject *__Pyx__GetModuleGlobalName(PyObject *name, PY_UINT64_T *dict_version, PyObject **dict_cached_value)
#else
static CYTHON_INLINE PyObject *__Pyx__GetModuleGlobalName(PyObject *name)
#endif
{
    PyObject *result;
#if !CYTHON_AVOID_BORROWED_REFS
#if CYTHON_COMPILING_IN_CPYTHON && PY_VERSION_HEX >= 0x030500A1
    result = _PyDict_GetItem_KnownHash(__pyx_d, name, ((PyASCIIObject *) name)->hash);
    __PYX_UPDATE_DICT_CACHE(__pyx_d, result, *dict_cached_value, *dict_version)
    if (likely(result)) {
        return __Pyx_NewRef(result);
    } else if (unlikely(PyErr_Occurred())) {
        return NULL;
    }
#else
    result = PyDict_GetItem(__pyx_d, name);
    __PYX_UPDATE_DICT_CACHE(__pyx_d, result, *dict_cached_value, *dict_version)
    if (likely(result)) {
        return __Pyx_NewRef(result);
    }
#endif
#else
    result = PyObject_GetItem(__pyx_d, name);
    __PYX_UPDATE_DICT_CACHE(__pyx_d, result, *dict_cached_value, *dict_version)
    if (likely(result)) {
        return __Pyx_NewRef(result);
    }
    PyErr_Clear();
#endif
    return __Pyx_GetBuiltinName(name);
}

/* RaiseTooManyValuesToUnpack */
static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expected) {
    PyErr_Format(PyExc_ValueError,
                 "too many values to unpack (expected %" CYTHON_FORMAT_SSIZE_T "d)", expected);
}

/* RaiseNeedMoreValuesToUnpack */
static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t index) {
    PyErr_Format(PyExc_ValueError,
                 "need more than %" CYTHON_FORMAT_SSIZE_T "d value%.1s to unpack",
                 index, (index == 1) ? "" : "s");
}

/* RaiseNoneIterError */
static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) {
    PyErr_SetString(PyExc_TypeError, "'NoneType' object is not iterable");
}

/* ExtTypeTest */
static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *type) {
    if (unlikely(!type)) {
        PyErr_SetString(PyExc_SystemError, "Missing type object");
        return 0;
    }
    if (likely(__Pyx_TypeCheck(obj, type)))
        return 1;
    PyErr_Format(PyExc_TypeError, "Cannot convert %.200s to %.200s",
                 Py_TYPE(obj)->tp_name, type->tp_name);
    return 0;
}

/* SwapException */
#if CYTHON_FAST_THREAD_STATE
static CYTHON_INLINE void __Pyx__ExceptionSwap(PyThreadState *tstate, PyObject **type, PyObject **value, PyObject **tb) {
    PyObject *tmp_type, *tmp_value, *tmp_tb;
    #if CYTHON_USE_EXC_INFO_STACK
    _PyErr_StackItem *exc_info = tstate->exc_info;
    tmp_type = exc_info->exc_type;
    tmp_value = exc_info->exc_value;
    tmp_tb = exc_info->exc_traceback;
    exc_info->exc_type = *type;
    exc_info->exc_value = *value;
    exc_info->exc_traceback = *tb;
    #else
    tmp_type = tstate->exc_type;
    tmp_value = tstate->exc_value;
    tmp_tb = tstate->exc_traceback;
    tstate->exc_type = *type;
    tstate->exc_value = *value;
    tstate->exc_traceback = *tb;
    #endif
    *type = tmp_type;
    *value = tmp_value;
    *tb = tmp_tb;
}
#else
static CYTHON_INLINE void __Pyx_ExceptionSwap(PyObject **type, PyObject **value, PyObject **tb) {
    PyObject *tmp_type, *tmp_value, *tmp_tb;
    PyErr_GetExcInfo(&tmp_type, &tmp_value, &tmp_tb);
    PyErr_SetExcInfo(*type, *value, *tb);
    *type = tmp_type;
    *value = tmp_value;
    *tb = tmp_tb;
}
#endif

/* Import */
static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, int level) {
    PyObject *empty_list = 0;
    PyObject *module = 0;
    PyObject *global_dict = 0;
    PyObject *empty_dict = 0;
    PyObject *list;
    #if PY_MAJOR_VERSION < 3
    PyObject *py_import;
    py_import = __Pyx_PyObject_GetAttrStr(__pyx_b, __pyx_n_s_import);
    if (!py_import)
        goto bad;
    #endif
    if (from_list)
        list = from_list;
    else {
        empty_list = PyList_New(0);
        if (!empty_list)
            goto bad;
        list = empty_list;
    }
    global_dict = PyModule_GetDict(__pyx_m);
    if (!global_dict)
        goto bad;
    empty_dict = PyDict_New();
    if (!empty_dict)
        goto bad;
    {
        #if PY_MAJOR_VERSION >= 3
        if (level == -1) {
            if ((1) && (strchr(__Pyx_MODULE_NAME, '.'))) {
                module = PyImport_ImportModuleLevelObject(
                    name, global_dict, empty_dict, list, 1);
                if (!module) {
                    if (!PyErr_ExceptionMatches(PyExc_ImportError))
                        goto bad;
                    PyErr_Clear();
                }
            }
            level = 0;
        }
        #endif
        if (!module) {
            #if PY_MAJOR_VERSION < 3
            PyObject *py_level = PyInt_FromLong(level);
            if (!py_level)
                goto bad;
            module = PyObject_CallFunctionObjArgs(py_import,
                name, global_dict, empty_dict, list, py_level, (PyObject *)NULL);
            Py_DECREF(py_level);
            #else
            module = PyImport_ImportModuleLevelObject(
                name, global_dict, empty_dict, list, level);
            #endif
        }
    }
bad:
    #if PY_MAJOR_VERSION < 3
    Py_XDECREF(py_import);
    #endif
    Py_XDECREF(empty_list);
    Py_XDECREF(empty_dict);
    return module;
}

/* FastTypeChecks */
#if CYTHON_COMPILING_IN_CPYTHON
static int __Pyx_InBases(PyTypeObject *a, PyTypeObject *b) {
    while (a) {
        a = a->tp_base;
        if (a == b)
            return 1;
    }
    return b == &PyBaseObject_Type;
}
static CYTHON_INLINE int __Pyx_IsSubtype(PyTypeObject *a, PyTypeObject *b) {
    PyObject *mro;
    if (a == b) return 1;
    mro = a->tp_mro;
    if (likely(mro)) {
        Py_ssize_t i, n;
        n = PyTuple_GET_SIZE(mro);
        for (i = 0; i < n; i++) {
            if (PyTuple_GET_ITEM(mro, i) == (PyObject *)b)
                return 1;
        }
        return 0;
    }
    return __Pyx_InBases(a, b);
}
#if PY_MAJOR_VERSION == 2
static int __Pyx_inner_PyErr_GivenExceptionMatches2(PyObject *err, PyObject* exc_type1, PyObject* exc_type2) {
    PyObject *exception, *value, *tb;
    int res;
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    __Pyx_ErrFetch(&exception, &value, &tb);
    res = exc_type1 ? PyObject_IsSubclass(err, exc_type1) : 0;
    if (unlikely(res == -1)) {
        PyErr_WriteUnraisable(err);
        res = 0;
    }
    if (!res) {
        res = PyObject_IsSubclass(err, exc_type2);
        if (unlikely(res == -1)) {
            PyErr_WriteUnraisable(err);
            res = 0;
        }
    }
    __Pyx_ErrRestore(exception, value, tb);
    return res;
}
#else
static CYTHON_INLINE int __Pyx_inner_PyErr_GivenExceptionMatches2(PyObject *err, PyObject* exc_type1, PyObject *exc_type2) {
    int res = exc_type1 ? __Pyx_IsSubtype((PyTypeObject*)err, (PyTypeObject*)exc_type1) : 0;
    if (!res) {
        res = __Pyx_IsSubtype((PyTypeObject*)err, (PyTypeObject*)exc_type2);
    }
    return res;
}
#endif
static int __Pyx_PyErr_GivenExceptionMatchesTuple(PyObject *exc_type, PyObject *tuple) {
    Py_ssize_t i, n;
    assert(PyExceptionClass_Check(exc_type));
    n = PyTuple_GET_SIZE(tuple);
#if PY_MAJOR_VERSION >= 3
    for (i=0; i<n; i++) {
        if (exc_type == PyTuple_GET_ITEM(tuple, i)) return 1;
    }
#endif
    for (i=0; i<n; i++) {
        PyObject *t = PyTuple_GET_ITEM(tuple, i);
        #if PY_MAJOR_VERSION < 3
        if (likely(exc_type == t)) return 1;
        #endif
        if (likely(PyExceptionClass_Check(t))) {
            if (__Pyx_inner_PyErr_GivenExceptionMatches2(exc_type, NULL, t)) return 1;
        } else {
        }
    }
    return 0;
}
static CYTHON_INLINE int __Pyx_PyErr_GivenExceptionMatches(PyObject *err, PyObject* exc_type) {
    if (likely(err == exc_type)) return 1;
    if (likely(PyExceptionClass_Check(err))) {
        if (likely(PyExceptionClass_Check(exc_type))) {
            return __Pyx_inner_PyErr_GivenExceptionMatches2(err, NULL, exc_type);
        } else if (likely(PyTuple_Check(exc_type))) {
            return __Pyx_PyErr_GivenExceptionMatchesTuple(err, exc_type);
        } else {
        }
    }
    return PyErr_GivenExceptionMatches(err, exc_type);
}
static CYTHON_INLINE int __Pyx_PyErr_GivenExceptionMatches2(PyObject *err, PyObject *exc_type1, PyObject *exc_type2) {
    assert(PyExceptionClass_Check(exc_type1));
    assert(PyExceptionClass_Check(exc_type2));
    if (likely(err == exc_type1 || err == exc_type2)) return 1;
    if (likely(PyExceptionClass_Check(err))) {
        return __Pyx_inner_PyErr_GivenExceptionMatches2(err, exc_type1, exc_type2);
    }
    return (PyErr_GivenExceptionMatches(err, exc_type1) || PyErr_GivenExceptionMatches(err, exc_type2));
}
#endif

/* PyIntBinop */
#if !CYTHON_COMPILING_IN_PYPY
static PyObject* __Pyx_PyInt_AddObjC(PyObject *op1, PyObject *op2, CYTHON_UNUSED long intval, int inplace, int zerodivision_check) {
    (void)inplace;
    (void)zerodivision_check;
    #if PY_MAJOR_VERSION < 3
    if (likely(PyInt_CheckExact(op1))) {
        const long b = intval;
        long x;
        long a = PyInt_AS_LONG(op1);
            x = (long)((unsigned long)a + b);
            if (likely((x^a) >= 0 || (x^b) >= 0))
                return PyInt_FromLong(x);
            return PyLong_Type.tp_as_number->nb_add(op1, op2);
    }
    #endif
    #if CYTHON_USE_PYLONG_INTERNALS
    if (likely(PyLong_CheckExact(op1))) {
        const long b = intval;
        long a, x;
#ifdef HAVE_LONG_LONG
        const PY_LONG_LONG llb = intval;
        PY_LONG_LONG lla, llx;
#endif
        const digit* digits = ((PyLongObject*)op1)->ob_digit;
        const Py_ssize_t size = Py_SIZE(op1);
        if (likely(__Pyx_sst_abs(size) <= 1)) {
            a = likely(size) ? digits[0] : 0;
            if (size == -1) a = -a;
        } else {
            switch (size) {
                case -2:
                    if (8 * sizeof(long) - 1 > 2 * PyLong_SHIFT) {
                        a = -(long) (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0]));
                        break;
#ifdef HAVE_LONG_LONG
                    } else if (8 * sizeof(PY_LONG_LONG) - 1 > 2 * PyLong_SHIFT) {
                        lla = -(PY_LONG_LONG) (((((unsigned PY_LONG_LONG)digits[1]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[0]));
                        goto long_long;
#endif
                    }
                    CYTHON_FALLTHROUGH;
                case 2:
                    if (8 * sizeof(long) - 1 > 2 * PyLong_SHIFT) {
                        a = (long) (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0]));
                        break;
#ifdef HAVE_LONG_LONG
                    } else if (8 * sizeof(PY_LONG_LONG) - 1 > 2 * PyLong_SHIFT) {
                        lla = (PY_LONG_LONG) (((((unsigned PY_LONG_LONG)digits[1]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[0]));
                        goto long_long;
#endif
                    }
                    CYTHON_FALLTHROUGH;
                case -3:
                    if (8 * sizeof(long) - 1 > 3 * PyLong_SHIFT) {
                        a = -(long) (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0]));
                        break;
#ifdef HAVE_LONG_LONG
                    } else if (8 * sizeof(PY_LONG_LONG) - 1 > 3 * PyLong_SHIFT) {
                        lla = -(PY_LONG_LONG) (((((((unsigned PY_LONG_LONG)digits[2]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[1]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[0]));
                        goto long_long;
#endif
                    }
                    CYTHON_FALLTHROUGH;
                case 3:
                    if (8 * sizeof(long) - 1 > 3 * PyLong_SHIFT) {
                        a = (long) (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0]));
                        break;
#ifdef HAVE_LONG_LONG
                    } else if (8 * sizeof(PY_LONG_LONG) - 1 > 3 * PyLong_SHIFT) {
                        lla = (PY_LONG_LONG) (((((((unsigned PY_LONG_LONG)digits[2]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[1]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[0]));
                        goto long_long;
#endif
                    }
                    CYTHON_FALLTHROUGH;
                case -4:
                    if (8 * sizeof(long) - 1 > 4 * PyLong_SHIFT) {
                        a = -(long) (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0]));
                        break;
#ifdef HAVE_LONG_LONG
                    } else if (8 * sizeof(PY_LONG_LONG) - 1 > 4 * PyLong_SHIFT) {
                        lla = -(PY_LONG_LONG) (((((((((unsigned PY_LONG_LONG)digits[3]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[2]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[1]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[0]));
                        goto long_long;
#endif
                    }
                    CYTHON_FALLTHROUGH;
                case 4:
                    if (8 * sizeof(long) - 1 > 4 * PyLong_SHIFT) {
                        a = (long) (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0]));
                        break;
#ifdef HAVE_LONG_LONG
                    } else if (8 * sizeof(PY_LONG_LONG) - 1 > 4 * PyLong_SHIFT) {
                        lla = (PY_LONG_LONG) (((((((((unsigned PY_LONG_LONG)digits[3]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[2]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[1]) << PyLong_SHIFT) | (unsigned PY_LONG_LONG)digits[0]));
                        goto long_long;
#endif
                    }
                    CYTHON_FALLTHROUGH;
                default: return PyLong_Type.tp_as_number->nb_add(op1, op2);
            }
        }
                x = a + b;
            return PyLong_FromLong(x);
#ifdef HAVE_LONG_LONG
        long_long:
                llx = lla + llb;
            return PyLong_FromLongLong(llx);
#endif
        
        
    }
    #endif
    if (PyFloat_CheckExact(op1)) {
        const long b = intval;
        double a = PyFloat_AS_DOUBLE(op1);
            double result;
            PyFPE_START_PROTECT("add", return NULL)
            result = ((double)a) + (double)b;
            PyFPE_END_PROTECT(result)
            return PyFloat_FromDouble(result);
    }
    return (inplace ? PyNumber_InPlaceAdd : PyNumber_Add)(op1, op2);
}
#endif

/* DivInt[long] */
static CYTHON_INLINE long __Pyx_div_long(long a, long b) {
    long q = a / b;
    long r = a - q*b;
    q -= ((r != 0) & ((r ^ b) < 0));
    return q;
}

/* ImportFrom */
static PyObject* __Pyx_ImportFrom(PyObject* module, PyObject* name) {
    PyObject* value = __Pyx_PyObject_GetAttrStr(module, name);
    if (unlikely(!value) && PyErr_ExceptionMatches(PyExc_AttributeError)) {
        PyErr_Format(PyExc_ImportError,
        #if PY_MAJOR_VERSION < 3
            "cannot import name %.230s", PyString_AS_STRING(name));
        #else
            "cannot import name %S", name);
        #endif
    }
    return value;
}

/* HasAttr */
static CYTHON_INLINE int __Pyx_HasAttr(PyObject *o, PyObject *n) {
    PyObject *r;
    if (unlikely(!__Pyx_PyBaseString_Check(n))) {
        PyErr_SetString(PyExc_TypeError,
                        "hasattr(): attribute name must be string");
        return -1;
    }
    r = __Pyx_GetAttr(o, n);
    if (unlikely(!r)) {
        PyErr_Clear();
        return 0;
    } else {
        Py_DECREF(r);
        return 1;
    }
}

/* PyObject_GenericGetAttrNoDict */
#if CYTHON_USE_TYPE_SLOTS && CYTHON_USE_PYTYPE_LOOKUP && PY_VERSION_HEX < 0x03070000
static PyObject *__Pyx_RaiseGenericGetAttributeError(PyTypeObject *tp, PyObject *attr_name) {
    PyErr_Format(PyExc_AttributeError,
#if PY_MAJOR_VERSION >= 3
                 "'%.50s' object has no attribute '%U'",
                 tp->tp_name, attr_name);
#else
                 "'%.50s' object has no attribute '%.400s'",
                 tp->tp_name, PyString_AS_STRING(attr_name));
#endif
    return NULL;
}
static CYTHON_INLINE PyObject* __Pyx_PyObject_GenericGetAttrNoDict(PyObject* obj, PyObject* attr_name) {
    PyObject *descr;
    PyTypeObject *tp = Py_TYPE(obj);
    if (unlikely(!PyString_Check(attr_name))) {
        return PyObject_GenericGetAttr(obj, attr_name);
    }
    assert(!tp->tp_dictoffset);
    descr = _PyType_Lookup(tp, attr_name);
    if (unlikely(!descr)) {
        return __Pyx_RaiseGenericGetAttributeError(tp, attr_name);
    }
    Py_INCREF(descr);
    #if PY_MAJOR_VERSION < 3
    if (likely(PyType_HasFeature(Py_TYPE(descr), Py_TPFLAGS_HAVE_CLASS)))
    #endif
    {
        descrgetfunc f = Py_TYPE(descr)->tp_descr_get;
        if (unlikely(f)) {
            PyObject *res = f(descr, obj, (PyObject *)tp);
            Py_DECREF(descr);
            return res;
        }
    }
    return descr;
}
#endif

/* PyObject_GenericGetAttr */
#if CYTHON_USE_TYPE_SLOTS && CYTHON_USE_PYTYPE_LOOKUP && PY_VERSION_HEX < 0x03070000
static PyObject* __Pyx_PyObject_GenericGetAttr(PyObject* obj, PyObject* attr_name) {
    if (unlikely(Py_TYPE(obj)->tp_dictoffset)) {
        return PyObject_GenericGetAttr(obj, attr_name);
    }
    return __Pyx_PyObject_GenericGetAttrNoDict(obj, attr_name);
}
#endif

/* SetVTable */
static int __Pyx_SetVtable(PyObject *dict, void *vtable) {
#if PY_VERSION_HEX >= 0x02070000
    PyObject *ob = PyCapsule_New(vtable, 0, 0);
#else
    PyObject *ob = PyCObject_FromVoidPtr(vtable, 0);
#endif
    if (!ob)
        goto bad;
    if (PyDict_SetItem(dict, __pyx_n_s_pyx_vtable, ob) < 0)
        goto bad;
    Py_DECREF(ob);
    return 0;
bad:
    Py_XDECREF(ob);
    return -1;
}

/* PyObjectGetAttrStrNoError */
static void __Pyx_PyObject_GetAttrStr_ClearAttributeError(void) {
    __Pyx_PyThreadState_declare
    __Pyx_PyThreadState_assign
    if (likely(__Pyx_PyErr_ExceptionMatches(PyExc_AttributeError)))
        __Pyx_PyErr_Clear();
}
static CYTHON_INLINE PyObject* __Pyx_PyObject_GetAttrStrNoError(PyObject* obj, PyObject* attr_name) {
    PyObject *result;
#if CYTHON_COMPILING_IN_CPYTHON && CYTHON_USE_TYPE_SLOTS && PY_VERSION_HEX >= 0x030700B1
    PyTypeObject* tp = Py_TYPE(obj);
    if (likely(tp->tp_getattro == PyObject_GenericGetAttr)) {
        return _PyObject_GenericGetAttrWithDict(obj, attr_name, NULL, 1);
    }
#endif
    result = __Pyx_PyObject_GetAttrStr(obj, attr_name);
    if (unlikely(!result)) {
        __Pyx_PyObject_GetAttrStr_ClearAttributeError();
    }
    return result;
}

/* SetupReduce */
static int __Pyx_setup_reduce_is_named(PyObject* meth, PyObject* name) {
  int ret;
  PyObject *name_attr;
  name_attr = __Pyx_PyObject_GetAttrStr(meth, __pyx_n_s_name_2);
  if (likely(name_attr)) {
      ret = PyObject_RichCompareBool(name_attr, name, Py_EQ);
  } else {
      ret = -1;
  }
  if (unlikely(ret < 0)) {
      PyErr_Clear();
      ret = 0;
  }
  Py_XDECREF(name_attr);
  return ret;
}
static int __Pyx_setup_reduce(PyObject* type_obj) {
    int ret = 0;
    PyObject *object_reduce = NULL;
    PyObject *object_reduce_ex = NULL;
    PyObject *reduce = NULL;
    PyObject *reduce_ex = NULL;
    PyObject *reduce_cython = NULL;
    PyObject *setstate = NULL;
    PyObject *setstate_cython = NULL;
#if CYTHON_USE_PYTYPE_LOOKUP
    if (_PyType_Lookup((PyTypeObject*)type_obj, __pyx_n_s_getstate)) goto __PYX_GOOD;
#else
    if (PyObject_HasAttr(type_obj, __pyx_n_s_getstate)) goto __PYX_GOOD;
#endif
#if CYTHON_USE_PYTYPE_LOOKUP
    object_reduce_ex = _PyType_Lookup(&PyBaseObject_Type, __pyx_n_s_reduce_ex); if (!object_reduce_ex) goto __PYX_BAD;
#else
    object_reduce_ex = __Pyx_PyObject_GetAttrStr((PyObject*)&PyBaseObject_Type, __pyx_n_s_reduce_ex); if (!object_reduce_ex) goto __PYX_BAD;
#endif
    reduce_ex = __Pyx_PyObject_GetAttrStr(type_obj, __pyx_n_s_reduce_ex); if (unlikely(!reduce_ex)) goto __PYX_BAD;
    if (reduce_ex == object_reduce_ex) {
#if CYTHON_USE_PYTYPE_LOOKUP
        object_reduce = _PyType_Lookup(&PyBaseObject_Type, __pyx_n_s_reduce); if (!object_reduce) goto __PYX_BAD;
#else
        object_reduce = __Pyx_PyObject_GetAttrStr((PyObject*)&PyBaseObject_Type, __pyx_n_s_reduce); if (!object_reduce) goto __PYX_BAD;
#endif
        reduce = __Pyx_PyObject_GetAttrStr(type_obj, __pyx_n_s_reduce); if (unlikely(!reduce)) goto __PYX_BAD;
        if (reduce == object_reduce || __Pyx_setup_reduce_is_named(reduce, __pyx_n_s_reduce_cython)) {
            reduce_cython = __Pyx_PyObject_GetAttrStrNoError(type_obj, __pyx_n_s_reduce_cython);
            if (likely(reduce_cython)) {
                ret = PyDict_SetItem(((PyTypeObject*)type_obj)->tp_dict, __pyx_n_s_reduce, reduce_cython); if (unlikely(ret < 0)) goto __PYX_BAD;
                ret = PyDict_DelItem(((PyTypeObject*)type_obj)->tp_dict, __pyx_n_s_reduce_cython); if (unlikely(ret < 0)) goto __PYX_BAD;
            } else if (reduce == object_reduce || PyErr_Occurred()) {
                goto __PYX_BAD;
            }
            setstate = __Pyx_PyObject_GetAttrStr(type_obj, __pyx_n_s_setstate);
            if (!setstate) PyErr_Clear();
            if (!setstate || __Pyx_setup_reduce_is_named(setstate, __pyx_n_s_setstate_cython)) {
                setstate_cython = __Pyx_PyObject_GetAttrStrNoError(type_obj, __pyx_n_s_setstate_cython);
                if (likely(setstate_cython)) {
                    ret = PyDict_SetItem(((PyTypeObject*)type_obj)->tp_dict, __pyx_n_s_setstate, setstate_cython); if (unlikely(ret < 0)) goto __PYX_BAD;
                    ret = PyDict_DelItem(((PyTypeObject*)type_obj)->tp_dict, __pyx_n_s_setstate_cython); if (unlikely(ret < 0)) goto __PYX_BAD;
                } else if (!setstate || PyErr_Occurred()) {
                    goto __PYX_BAD;
                }
            }
            PyType_Modified((PyTypeObject*)type_obj);
        }
    }
    goto __PYX_GOOD;
__PYX_BAD:
    if (!PyErr_Occurred())
        PyErr_Format(PyExc_RuntimeError, "Unable to initialize pickling for %s", ((PyTypeObject*)type_obj)->tp_name);
    ret = -1;
__PYX_GOOD:
#if !CYTHON_USE_PYTYPE_LOOKUP
    Py_XDECREF(object_reduce);
    Py_XDECREF(object_reduce_ex);
#endif
    Py_XDECREF(reduce);
    Py_XDECREF(reduce_ex);
    Py_XDECREF(reduce_cython);
    Py_XDECREF(setstate);
    Py_XDECREF(setstate_cython);
    return ret;
}

/* TypeImport */
#ifndef __PYX_HAVE_RT_ImportType
#define __PYX_HAVE_RT_ImportType
static PyTypeObject *__Pyx_ImportType(PyObject *module, const char *module_name, const char *class_name,
    size_t size, enum __Pyx_ImportType_CheckSize check_size)
{
    PyObject *result = 0;
    char warning[200];
    Py_ssize_t basicsize;
#ifdef Py_LIMITED_API
    PyObject *py_basicsize;
#endif
    result = PyObject_GetAttrString(module, class_name);
    if (!result)
        goto bad;
    if (!PyType_Check(result)) {
        PyErr_Format(PyExc_TypeError,
            "%.200s.%.200s is not a type object",
            module_name, class_name);
        goto bad;
    }
#ifndef Py_LIMITED_API
    basicsize = ((PyTypeObject *)result)->tp_basicsize;
#else
    py_basicsize = PyObject_GetAttrString(result, "__basicsize__");
    if (!py_basicsize)
        goto bad;
    basicsize = PyLong_AsSsize_t(py_basicsize);
    Py_DECREF(py_basicsize);
    py_basicsize = 0;
    if (basicsize == (Py_ssize_t)-1 && PyErr_Occurred())
        goto bad;
#endif
    if ((size_t)basicsize < size) {
        PyErr_Format(PyExc_ValueError,
            "%.200s.%.200s size changed, may indicate binary incompatibility. "
            "Expected %zd from C header, got %zd from PyObject",
            module_name, class_name, size, basicsize);
        goto bad;
    }
    if (check_size == __Pyx_ImportType_CheckSize_Error && (size_t)basicsize != size) {
        PyErr_Format(PyExc_ValueError,
            "%.200s.%.200s size changed, may indicate binary incompatibility. "
            "Expected %zd from C header, got %zd from PyObject",
            module_name, class_name, size, basicsize);
        goto bad;
    }
    else if (check_size == __Pyx_ImportType_CheckSize_Warn && (size_t)basicsize > size) {
        PyOS_snprintf(warning, sizeof(warning),
            "%s.%s size changed, may indicate binary incompatibility. "
            "Expected %zd from C header, got %zd from PyObject",
            module_name, class_name, size, basicsize);
        if (PyErr_WarnEx(NULL, warning, 0) < 0) goto bad;
    }
    return (PyTypeObject *)result;
bad:
    Py_XDECREF(result);
    return NULL;
}
#endif

/* CLineInTraceback */
#ifndef CYTHON_CLINE_IN_TRACEBACK
static int __Pyx_CLineForTraceback(CYTHON_NCP_UNUSED PyThreadState *tstate, int c_line) {
    PyObject *use_cline;
    PyObject *ptype, *pvalue, *ptraceback;
#if CYTHON_COMPILING_IN_CPYTHON
    PyObject **cython_runtime_dict;
#endif
    if (unlikely(!__pyx_cython_runtime)) {
        return c_line;
    }
    __Pyx_ErrFetchInState(tstate, &ptype, &pvalue, &ptraceback);
#if CYTHON_COMPILING_IN_CPYTHON
    cython_runtime_dict = _PyObject_GetDictPtr(__pyx_cython_runtime);
    if (likely(cython_runtime_dict)) {
        __PYX_PY_DICT_LOOKUP_IF_MODIFIED(
            use_cline, *cython_runtime_dict,
            __Pyx_PyDict_GetItemStr(*cython_runtime_dict, __pyx_n_s_cline_in_traceback))
    } else
#endif
    {
      PyObject *use_cline_obj = __Pyx_PyObject_GetAttrStr(__pyx_cython_runtime, __pyx_n_s_cline_in_traceback);
      if (use_cline_obj) {
        use_cline = PyObject_Not(use_cline_obj) ? Py_False : Py_True;
        Py_DECREF(use_cline_obj);
      } else {
        PyErr_Clear();
        use_cline = NULL;
      }
    }
    if (!use_cline) {
        c_line = 0;
        (void) PyObject_SetAttr(__pyx_cython_runtime, __pyx_n_s_cline_in_traceback, Py_False);
    }
    else if (use_cline == Py_False || (use_cline != Py_True && PyObject_Not(use_cline) != 0)) {
        c_line = 0;
    }
    __Pyx_ErrRestoreInState(tstate, ptype, pvalue, ptraceback);
    return c_line;
}
#endif

/* CodeObjectCache */
static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries, int count, int code_line) {
    int start = 0, mid = 0, end = count - 1;
    if (end >= 0 && code_line > entries[end].code_line) {
        return count;
    }
    while (start < end) {
        mid = start + (end - start) / 2;
        if (code_line < entries[mid].code_line) {
            end = mid;
        } else if (code_line > entries[mid].code_line) {
             start = mid + 1;
        } else {
            return mid;
        }
    }
    if (code_line <= entries[mid].code_line) {
        return mid;
    } else {
        return mid + 1;
    }
}
static PyCodeObject *__pyx_find_code_object(int code_line) {
    PyCodeObject* code_object;
    int pos;
    if (unlikely(!code_line) || unlikely(!__pyx_code_cache.entries)) {
        return NULL;
    }
    pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line);
    if (unlikely(pos >= __pyx_code_cache.count) || unlikely(__pyx_code_cache.entries[pos].code_line != code_line)) {
        return NULL;
    }
    code_object = __pyx_code_cache.entries[pos].code_object;
    Py_INCREF(code_object);
    return code_object;
}
static void __pyx_insert_code_object(int code_line, PyCodeObject* code_object) {
    int pos, i;
    __Pyx_CodeObjectCacheEntry* entries = __pyx_code_cache.entries;
    if (unlikely(!code_line)) {
        return;
    }
    if (unlikely(!entries)) {
        entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Malloc(64*sizeof(__Pyx_CodeObjectCacheEntry));
        if (likely(entries)) {
            __pyx_code_cache.entries = entries;
            __pyx_code_cache.max_count = 64;
            __pyx_code_cache.count = 1;
            entries[0].code_line = code_line;
            entries[0].code_object = code_object;
            Py_INCREF(code_object);
        }
        return;
    }
    pos = __pyx_bisect_code_objects(__pyx_code_cache.entries, __pyx_code_cache.count, code_line);
    if ((pos < __pyx_code_cache.count) && unlikely(__pyx_code_cache.entries[pos].code_line == code_line)) {
        PyCodeObject* tmp = entries[pos].code_object;
        entries[pos].code_object = code_object;
        Py_DECREF(tmp);
        return;
    }
    if (__pyx_code_cache.count == __pyx_code_cache.max_count) {
        int new_max = __pyx_code_cache.max_count + 64;
        entries = (__Pyx_CodeObjectCacheEntry*)PyMem_Realloc(
            __pyx_code_cache.entries, ((size_t)new_max) * sizeof(__Pyx_CodeObjectCacheEntry));
        if (unlikely(!entries)) {
            return;
        }
        __pyx_code_cache.entries = entries;
        __pyx_code_cache.max_count = new_max;
    }
    for (i=__pyx_code_cache.count; i>pos; i--) {
        entries[i] = entries[i-1];
    }
    entries[pos].code_line = code_line;
    entries[pos].code_object = code_object;
    __pyx_code_cache.count++;
    Py_INCREF(code_object);
}

/* AddTraceback */
#include "compile.h"
#include "frameobject.h"
#include "traceback.h"
static PyCodeObject* __Pyx_CreateCodeObjectForTraceback(
            const char *funcname, int c_line,
            int py_line, const char *filename) {
    PyCodeObject *py_code = NULL;
    PyObject *py_funcname = NULL;
    #if PY_MAJOR_VERSION < 3
    PyObject *py_srcfile = NULL;
    py_srcfile = PyString_FromString(filename);
    if (!py_srcfile) goto bad;
    #endif
    if (c_line) {
        #if PY_MAJOR_VERSION < 3
        py_funcname = PyString_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line);
        if (!py_funcname) goto bad;
        #else
        py_funcname = PyUnicode_FromFormat( "%s (%s:%d)", funcname, __pyx_cfilenm, c_line);
        if (!py_funcname) goto bad;
        funcname = PyUnicode_AsUTF8(py_funcname);
        if (!funcname) goto bad;
        #endif
    }
    else {
        #if PY_MAJOR_VERSION < 3
        py_funcname = PyString_FromString(funcname);
        if (!py_funcname) goto bad;
        #endif
    }
    #if PY_MAJOR_VERSION < 3
    py_code = __Pyx_PyCode_New(
        0,
        0,
        0,
        0,
        0,
        __pyx_empty_bytes, /*PyObject *code,*/
        __pyx_empty_tuple, /*PyObject *consts,*/
        __pyx_empty_tuple, /*PyObject *names,*/
        __pyx_empty_tuple, /*PyObject *varnames,*/
        __pyx_empty_tuple, /*PyObject *freevars,*/
        __pyx_empty_tuple, /*PyObject *cellvars,*/
        py_srcfile,   /*PyObject *filename,*/
        py_funcname,  /*PyObject *name,*/
        py_line,
        __pyx_empty_bytes  /*PyObject *lnotab*/
    );
    Py_DECREF(py_srcfile);
    #else
    py_code = PyCode_NewEmpty(filename, funcname, py_line);
    #endif
    Py_XDECREF(py_funcname);  // XDECREF since it's only set on Py3 if cline
    return py_code;
bad:
    Py_XDECREF(py_funcname);
    #if PY_MAJOR_VERSION < 3
    Py_XDECREF(py_srcfile);
    #endif
    return NULL;
}
static void __Pyx_AddTraceback(const char *funcname, int c_line,
                               int py_line, const char *filename) {
    PyCodeObject *py_code = 0;
    PyFrameObject *py_frame = 0;
    PyThreadState *tstate = __Pyx_PyThreadState_Current;
    if (c_line) {
        c_line = __Pyx_CLineForTraceback(tstate, c_line);
    }
    py_code = __pyx_find_code_object(c_line ? -c_line : py_line);
    if (!py_code) {
        py_code = __Pyx_CreateCodeObjectForTraceback(
            funcname, c_line, py_line, filename);
        if (!py_code) goto bad;
        __pyx_insert_code_object(c_line ? -c_line : py_line, py_code);
    }
    py_frame = PyFrame_New(
        tstate,            /*PyThreadState *tstate,*/
        py_code,           /*PyCodeObject *code,*/
        __pyx_d,    /*PyObject *globals,*/
        0                  /*PyObject *locals*/
    );
    if (!py_frame) goto bad;
    __Pyx_PyFrame_SetLineNumber(py_frame, py_line);
    PyTraceBack_Here(py_frame);
bad:
    Py_XDECREF(py_code);
    Py_XDECREF(py_frame);
}

#if PY_MAJOR_VERSION < 3
static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) {
    if (PyObject_CheckBuffer(obj)) return PyObject_GetBuffer(obj, view, flags);
        if (__Pyx_TypeCheck(obj, __pyx_array_type)) return __pyx_array_getbuffer(obj, view, flags);
        if (__Pyx_TypeCheck(obj, __pyx_memoryview_type)) return __pyx_memoryview_getbuffer(obj, view, flags);
    PyErr_Format(PyExc_TypeError, "'%.200s' does not have the buffer interface", Py_TYPE(obj)->tp_name);
    return -1;
}
static void __Pyx_ReleaseBuffer(Py_buffer *view) {
    PyObject *obj = view->obj;
    if (!obj) return;
    if (PyObject_CheckBuffer(obj)) {
        PyBuffer_Release(view);
        return;
    }
    if ((0)) {}
    view->obj = NULL;
    Py_DECREF(obj);
}
#endif


/* MemviewSliceIsContig */
static int
__pyx_memviewslice_is_contig(const __Pyx_memviewslice mvs, char order, int ndim)
{
    int i, index, step, start;
    Py_ssize_t itemsize = mvs.memview->view.itemsize;
    if (order == 'F') {
        step = 1;
        start = 0;
    } else {
        step = -1;
        start = ndim - 1;
    }
    for (i = 0; i < ndim; i++) {
        index = start + step * i;
        if (mvs.suboffsets[index] >= 0 || mvs.strides[index] != itemsize)
            return 0;
        itemsize *= mvs.shape[index];
    }
    return 1;
}

/* OverlappingSlices */
static void
__pyx_get_array_memory_extents(__Pyx_memviewslice *slice,
                               void **out_start, void **out_end,
                               int ndim, size_t itemsize)
{
    char *start, *end;
    int i;
    start = end = slice->data;
    for (i = 0; i < ndim; i++) {
        Py_ssize_t stride = slice->strides[i];
        Py_ssize_t extent = slice->shape[i];
        if (extent == 0) {
            *out_start = *out_end = start;
            return;
        } else {
            if (stride > 0)
                end += stride * (extent - 1);
            else
                start += stride * (extent - 1);
        }
    }
    *out_start = start;
    *out_end = end + itemsize;
}
static int
__pyx_slices_overlap(__Pyx_memviewslice *slice1,
                     __Pyx_memviewslice *slice2,
                     int ndim, size_t itemsize)
{
    void *start1, *end1, *start2, *end2;
    __pyx_get_array_memory_extents(slice1, &start1, &end1, ndim, itemsize);
    __pyx_get_array_memory_extents(slice2, &start2, &end2, ndim, itemsize);
    return (start1 < end2) && (start2 < end1);
}

/* Capsule */
static CYTHON_INLINE PyObject *
__pyx_capsule_create(void *p, CYTHON_UNUSED const char *sig)
{
    PyObject *cobj;
#if PY_VERSION_HEX >= 0x02070000
    cobj = PyCapsule_New(p, sig, NULL);
#else
    cobj = PyCObject_FromVoidPtr(p, NULL);
#endif
    return cobj;
}

/* IsLittleEndian */
static CYTHON_INLINE int __Pyx_Is_Little_Endian(void)
{
  union {
    uint32_t u32;
    uint8_t u8[4];
  } S;
  S.u32 = 0x01020304;
  return S.u8[0] == 4;
}

/* BufferFormatCheck */
static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx,
                              __Pyx_BufFmt_StackElem* stack,
                              __Pyx_TypeInfo* type) {
  stack[0].field = &ctx->root;
  stack[0].parent_offset = 0;
  ctx->root.type = type;
  ctx->root.name = "buffer dtype";
  ctx->root.offset = 0;
  ctx->head = stack;
  ctx->head->field = &ctx->root;
  ctx->fmt_offset = 0;
  ctx->head->parent_offset = 0;
  ctx->new_packmode = '@';
  ctx->enc_packmode = '@';
  ctx->new_count = 1;
  ctx->enc_count = 0;
  ctx->enc_type = 0;
  ctx->is_complex = 0;
  ctx->is_valid_array = 0;
  ctx->struct_alignment = 0;
  while (type->typegroup == 'S') {
    ++ctx->head;
    ctx->head->field = type->fields;
    ctx->head->parent_offset = 0;
    type = type->fields->type;
  }
}
static int __Pyx_BufFmt_ParseNumber(const char** ts) {
    int count;
    const char* t = *ts;
    if (*t < '0' || *t > '9') {
      return -1;
    } else {
        count = *t++ - '0';
        while (*t >= '0' && *t <= '9') {
            count *= 10;
            count += *t++ - '0';
        }
    }
    *ts = t;
    return count;
}
static int __Pyx_BufFmt_ExpectNumber(const char **ts) {
    int number = __Pyx_BufFmt_ParseNumber(ts);
    if (number == -1)
        PyErr_Format(PyExc_ValueError,\
                     "Does not understand character buffer dtype format string ('%c')", **ts);
    return number;
}
static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) {
  PyErr_Format(PyExc_ValueError,
               "Unexpected format string character: '%c'", ch);
}
static const char* __Pyx_BufFmt_DescribeTypeChar(char ch, int is_complex) {
  switch (ch) {
    case '?': return "'bool'";
    case 'c': return "'char'";
    case 'b': return "'signed char'";
    case 'B': return "'unsigned char'";
    case 'h': return "'short'";
    case 'H': return "'unsigned short'";
    case 'i': return "'int'";
    case 'I': return "'unsigned int'";
    case 'l': return "'long'";
    case 'L': return "'unsigned long'";
    case 'q': return "'long long'";
    case 'Q': return "'unsigned long long'";
    case 'f': return (is_complex ? "'complex float'" : "'float'");
    case 'd': return (is_complex ? "'complex double'" : "'double'");
    case 'g': return (is_complex ? "'complex long double'" : "'long double'");
    case 'T': return "a struct";
    case 'O': return "Python object";
    case 'P': return "a pointer";
    case 's': case 'p': return "a string";
    case 0: return "end";
    default: return "unparseable format string";
  }
}
static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_complex) {
  switch (ch) {
    case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1;
    case 'h': case 'H': return 2;
    case 'i': case 'I': case 'l': case 'L': return 4;
    case 'q': case 'Q': return 8;
    case 'f': return (is_complex ? 8 : 4);
    case 'd': return (is_complex ? 16 : 8);
    case 'g': {
      PyErr_SetString(PyExc_ValueError, "Python does not define a standard format string size for long double ('g')..");
      return 0;
    }
    case 'O': case 'P': return sizeof(void*);
    default:
      __Pyx_BufFmt_RaiseUnexpectedChar(ch);
      return 0;
    }
}
static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) {
  switch (ch) {
    case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1;
    case 'h': case 'H': return sizeof(short);
    case 'i': case 'I': return sizeof(int);
    case 'l': case 'L': return sizeof(long);
    #ifdef HAVE_LONG_LONG
    case 'q': case 'Q': return sizeof(PY_LONG_LONG);
    #endif
    case 'f': return sizeof(float) * (is_complex ? 2 : 1);
    case 'd': return sizeof(double) * (is_complex ? 2 : 1);
    case 'g': return sizeof(long double) * (is_complex ? 2 : 1);
    case 'O': case 'P': return sizeof(void*);
    default: {
      __Pyx_BufFmt_RaiseUnexpectedChar(ch);
      return 0;
    }
  }
}
typedef struct { char c; short x; } __Pyx_st_short;
typedef struct { char c; int x; } __Pyx_st_int;
typedef struct { char c; long x; } __Pyx_st_long;
typedef struct { char c; float x; } __Pyx_st_float;
typedef struct { char c; double x; } __Pyx_st_double;
typedef struct { char c; long double x; } __Pyx_st_longdouble;
typedef struct { char c; void *x; } __Pyx_st_void_p;
#ifdef HAVE_LONG_LONG
typedef struct { char c; PY_LONG_LONG x; } __Pyx_st_longlong;
#endif
static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, CYTHON_UNUSED int is_complex) {
  switch (ch) {
    case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1;
    case 'h': case 'H': return sizeof(__Pyx_st_short) - sizeof(short);
    case 'i': case 'I': return sizeof(__Pyx_st_int) - sizeof(int);
    case 'l': case 'L': return sizeof(__Pyx_st_long) - sizeof(long);
#ifdef HAVE_LONG_LONG
    case 'q': case 'Q': return sizeof(__Pyx_st_longlong) - sizeof(PY_LONG_LONG);
#endif
    case 'f': return sizeof(__Pyx_st_float) - sizeof(float);
    case 'd': return sizeof(__Pyx_st_double) - sizeof(double);
    case 'g': return sizeof(__Pyx_st_longdouble) - sizeof(long double);
    case 'P': case 'O': return sizeof(__Pyx_st_void_p) - sizeof(void*);
    default:
      __Pyx_BufFmt_RaiseUnexpectedChar(ch);
      return 0;
    }
}
/* These are for computing the padding at the end of the struct to align
   on the first member of the struct. This will probably the same as above,
   but we don't have any guarantees.
 */
typedef struct { short x; char c; } __Pyx_pad_short;
typedef struct { int x; char c; } __Pyx_pad_int;
typedef struct { long x; char c; } __Pyx_pad_long;
typedef struct { float x; char c; } __Pyx_pad_float;
typedef struct { double x; char c; } __Pyx_pad_double;
typedef struct { long double x; char c; } __Pyx_pad_longdouble;
typedef struct { void *x; char c; } __Pyx_pad_void_p;
#ifdef HAVE_LONG_LONG
typedef struct { PY_LONG_LONG x; char c; } __Pyx_pad_longlong;
#endif
static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, CYTHON_UNUSED int is_complex) {
  switch (ch) {
    case '?': case 'c': case 'b': case 'B': case 's': case 'p': return 1;
    case 'h': case 'H': return sizeof(__Pyx_pad_short) - sizeof(short);
    case 'i': case 'I': return sizeof(__Pyx_pad_int) - sizeof(int);
    case 'l': case 'L': return sizeof(__Pyx_pad_long) - sizeof(long);
#ifdef HAVE_LONG_LONG
    case 'q': case 'Q': return sizeof(__Pyx_pad_longlong) - sizeof(PY_LONG_LONG);
#endif
    case 'f': return sizeof(__Pyx_pad_float) - sizeof(float);
    case 'd': return sizeof(__Pyx_pad_double) - sizeof(double);
    case 'g': return sizeof(__Pyx_pad_longdouble) - sizeof(long double);
    case 'P': case 'O': return sizeof(__Pyx_pad_void_p) - sizeof(void*);
    default:
      __Pyx_BufFmt_RaiseUnexpectedChar(ch);
      return 0;
    }
}
static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) {
  switch (ch) {
    case 'c':
        return 'H';
    case 'b': case 'h': case 'i':
    case 'l': case 'q': case 's': case 'p':
        return 'I';
    case '?': case 'B': case 'H': case 'I': case 'L': case 'Q':
        return 'U';
    case 'f': case 'd': case 'g':
        return (is_complex ? 'C' : 'R');
    case 'O':
        return 'O';
    case 'P':
        return 'P';
    default: {
      __Pyx_BufFmt_RaiseUnexpectedChar(ch);
      return 0;
    }
  }
}
static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) {
  if (ctx->head == NULL || ctx->head->field == &ctx->root) {
    const char* expected;
    const char* quote;
    if (ctx->head == NULL) {
      expected = "end";
      quote = "";
    } else {
      expected = ctx->head->field->type->name;
      quote = "'";
    }
    PyErr_Format(PyExc_ValueError,
                 "Buffer dtype mismatch, expected %s%s%s but got %s",
                 quote, expected, quote,
                 __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex));
  } else {
    __Pyx_StructField* field = ctx->head->field;
    __Pyx_StructField* parent = (ctx->head - 1)->field;
    PyErr_Format(PyExc_ValueError,
                 "Buffer dtype mismatch, expected '%s' but got %s in '%s.%s'",
                 field->type->name, __Pyx_BufFmt_DescribeTypeChar(ctx->enc_type, ctx->is_complex),
                 parent->type->name, field->name);
  }
}
static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) {
  char group;
  size_t size, offset, arraysize = 1;
  if (ctx->enc_type == 0) return 0;
  if (ctx->head->field->type->arraysize[0]) {
    int i, ndim = 0;
    if (ctx->enc_type == 's' || ctx->enc_type == 'p') {
        ctx->is_valid_array = ctx->head->field->type->ndim == 1;
        ndim = 1;
        if (ctx->enc_count != ctx->head->field->type->arraysize[0]) {
            PyErr_Format(PyExc_ValueError,
                         "Expected a dimension of size %zu, got %zu",
                         ctx->head->field->type->arraysize[0], ctx->enc_count);
            return -1;
        }
    }
    if (!ctx->is_valid_array) {
      PyErr_Format(PyExc_ValueError, "Expected %d dimensions, got %d",
                   ctx->head->field->type->ndim, ndim);
      return -1;
    }
    for (i = 0; i < ctx->head->field->type->ndim; i++) {
      arraysize *= ctx->head->field->type->arraysize[i];
    }
    ctx->is_valid_array = 0;
    ctx->enc_count = 1;
  }
  group = __Pyx_BufFmt_TypeCharToGroup(ctx->enc_type, ctx->is_complex);
  do {
    __Pyx_StructField* field = ctx->head->field;
    __Pyx_TypeInfo* type = field->type;
    if (ctx->enc_packmode == '@' || ctx->enc_packmode == '^') {
      size = __Pyx_BufFmt_TypeCharToNativeSize(ctx->enc_type, ctx->is_complex);
    } else {
      size = __Pyx_BufFmt_TypeCharToStandardSize(ctx->enc_type, ctx->is_complex);
    }
    if (ctx->enc_packmode == '@') {
      size_t align_at = __Pyx_BufFmt_TypeCharToAlignment(ctx->enc_type, ctx->is_complex);
      size_t align_mod_offset;
      if (align_at == 0) return -1;
      align_mod_offset = ctx->fmt_offset % align_at;
      if (align_mod_offset > 0) ctx->fmt_offset += align_at - align_mod_offset;
      if (ctx->struct_alignment == 0)
          ctx->struct_alignment = __Pyx_BufFmt_TypeCharToPadding(ctx->enc_type,
                                                                 ctx->is_complex);
    }
    if (type->size != size || type->typegroup != group) {
      if (type->typegroup == 'C' && type->fields != NULL) {
        size_t parent_offset = ctx->head->parent_offset + field->offset;
        ++ctx->head;
        ctx->head->field = type->fields;
        ctx->head->parent_offset = parent_offset;
        continue;
      }
      if ((type->typegroup == 'H' || group == 'H') && type->size == size) {
      } else {
          __Pyx_BufFmt_RaiseExpected(ctx);
          return -1;
      }
    }
    offset = ctx->head->parent_offset + field->offset;
    if (ctx->fmt_offset != offset) {
      PyErr_Format(PyExc_ValueError,
                   "Buffer dtype mismatch; next field is at offset %" CYTHON_FORMAT_SSIZE_T "d but %" CYTHON_FORMAT_SSIZE_T "d expected",
                   (Py_ssize_t)ctx->fmt_offset, (Py_ssize_t)offset);
      return -1;
    }
    ctx->fmt_offset += size;
    if (arraysize)
      ctx->fmt_offset += (arraysize - 1) * size;
    --ctx->enc_count;
    while (1) {
      if (field == &ctx->root) {
        ctx->head = NULL;
        if (ctx->enc_count != 0) {
          __Pyx_BufFmt_RaiseExpected(ctx);
          return -1;
        }
        break;
      }
      ctx->head->field = ++field;
      if (field->type == NULL) {
        --ctx->head;
        field = ctx->head->field;
        continue;
      } else if (field->type->typegroup == 'S') {
        size_t parent_offset = ctx->head->parent_offset + field->offset;
        if (field->type->fields->type == NULL) continue;
        field = field->type->fields;
        ++ctx->head;
        ctx->head->field = field;
        ctx->head->parent_offset = parent_offset;
        break;
      } else {
        break;
      }
    }
  } while (ctx->enc_count);
  ctx->enc_type = 0;
  ctx->is_complex = 0;
  return 0;
}
static PyObject *
__pyx_buffmt_parse_array(__Pyx_BufFmt_Context* ctx, const char** tsp)
{
    const char *ts = *tsp;
    int i = 0, number, ndim;
    ++ts;
    if (ctx->new_count != 1) {
        PyErr_SetString(PyExc_ValueError,
                        "Cannot handle repeated arrays in format string");
        return NULL;
    }
    if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
    ndim = ctx->head->field->type->ndim;
    while (*ts && *ts != ')') {
        switch (*ts) {
            case ' ': case '\f': case '\r': case '\n': case '\t': case '\v':  continue;
            default:  break;
        }
        number = __Pyx_BufFmt_ExpectNumber(&ts);
        if (number == -1) return NULL;
        if (i < ndim && (size_t) number != ctx->head->field->type->arraysize[i])
            return PyErr_Format(PyExc_ValueError,
                        "Expected a dimension of size %zu, got %d",
                        ctx->head->field->type->arraysize[i], number);
        if (*ts != ',' && *ts != ')')
            return PyErr_Format(PyExc_ValueError,
                                "Expected a comma in format string, got '%c'", *ts);
        if (*ts == ',') ts++;
        i++;
    }
    if (i != ndim)
        return PyErr_Format(PyExc_ValueError, "Expected %d dimension(s), got %d",
                            ctx->head->field->type->ndim, i);
    if (!*ts) {
        PyErr_SetString(PyExc_ValueError,
                        "Unexpected end of format string, expected ')'");
        return NULL;
    }
    ctx->is_valid_array = 1;
    ctx->new_count = 1;
    *tsp = ++ts;
    return Py_None;
}
static const char* __Pyx_BufFmt_CheckString(__Pyx_BufFmt_Context* ctx, const char* ts) {
  int got_Z = 0;
  while (1) {
    switch(*ts) {
      case 0:
        if (ctx->enc_type != 0 && ctx->head == NULL) {
          __Pyx_BufFmt_RaiseExpected(ctx);
          return NULL;
        }
        if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
        if (ctx->head != NULL) {
          __Pyx_BufFmt_RaiseExpected(ctx);
          return NULL;
        }
        return ts;
      case ' ':
      case '\r':
      case '\n':
        ++ts;
        break;
      case '<':
        if (!__Pyx_Is_Little_Endian()) {
          PyErr_SetString(PyExc_ValueError, "Little-endian buffer not supported on big-endian compiler");
          return NULL;
        }
        ctx->new_packmode = '=';
        ++ts;
        break;
      case '>':
      case '!':
        if (__Pyx_Is_Little_Endian()) {
          PyErr_SetString(PyExc_ValueError, "Big-endian buffer not supported on little-endian compiler");
          return NULL;
        }
        ctx->new_packmode = '=';
        ++ts;
        break;
      case '=':
      case '@':
      case '^':
        ctx->new_packmode = *ts++;
        break;
      case 'T':
        {
          const char* ts_after_sub;
          size_t i, struct_count = ctx->new_count;
          size_t struct_alignment = ctx->struct_alignment;
          ctx->new_count = 1;
          ++ts;
          if (*ts != '{') {
            PyErr_SetString(PyExc_ValueError, "Buffer acquisition: Expected '{' after 'T'");
            return NULL;
          }
          if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
          ctx->enc_type = 0;
          ctx->enc_count = 0;
          ctx->struct_alignment = 0;
          ++ts;
          ts_after_sub = ts;
          for (i = 0; i != struct_count; ++i) {
            ts_after_sub = __Pyx_BufFmt_CheckString(ctx, ts);
            if (!ts_after_sub) return NULL;
          }
          ts = ts_after_sub;
          if (struct_alignment) ctx->struct_alignment = struct_alignment;
        }
        break;
      case '}':
        {
          size_t alignment = ctx->struct_alignment;
          ++ts;
          if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
          ctx->enc_type = 0;
          if (alignment && ctx->fmt_offset % alignment) {
            ctx->fmt_offset += alignment - (ctx->fmt_offset % alignment);
          }
        }
        return ts;
      case 'x':
        if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
        ctx->fmt_offset += ctx->new_count;
        ctx->new_count = 1;
        ctx->enc_count = 0;
        ctx->enc_type = 0;
        ctx->enc_packmode = ctx->new_packmode;
        ++ts;
        break;
      case 'Z':
        got_Z = 1;
        ++ts;
        if (*ts != 'f' && *ts != 'd' && *ts != 'g') {
          __Pyx_BufFmt_RaiseUnexpectedChar('Z');
          return NULL;
        }
        CYTHON_FALLTHROUGH;
      case '?': case 'c': case 'b': case 'B': case 'h': case 'H': case 'i': case 'I':
      case 'l': case 'L': case 'q': case 'Q':
      case 'f': case 'd': case 'g':
      case 'O': case 'p':
        if ((ctx->enc_type == *ts) && (got_Z == ctx->is_complex) &&
            (ctx->enc_packmode == ctx->new_packmode) && (!ctx->is_valid_array)) {
          ctx->enc_count += ctx->new_count;
          ctx->new_count = 1;
          got_Z = 0;
          ++ts;
          break;
        }
        CYTHON_FALLTHROUGH;
      case 's':
        if (__Pyx_BufFmt_ProcessTypeChunk(ctx) == -1) return NULL;
        ctx->enc_count = ctx->new_count;
        ctx->enc_packmode = ctx->new_packmode;
        ctx->enc_type = *ts;
        ctx->is_complex = got_Z;
        ++ts;
        ctx->new_count = 1;
        got_Z = 0;
        break;
      case ':':
        ++ts;
        while(*ts != ':') ++ts;
        ++ts;
        break;
      case '(':
        if (!__pyx_buffmt_parse_array(ctx, &ts)) return NULL;
        break;
      default:
        {
          int number = __Pyx_BufFmt_ExpectNumber(&ts);
          if (number == -1) return NULL;
          ctx->new_count = (size_t)number;
        }
    }
  }
}

/* TypeInfoCompare */
  static int
__pyx_typeinfo_cmp(__Pyx_TypeInfo *a, __Pyx_TypeInfo *b)
{
    int i;
    if (!a || !b)
        return 0;
    if (a == b)
        return 1;
    if (a->size != b->size || a->typegroup != b->typegroup ||
            a->is_unsigned != b->is_unsigned || a->ndim != b->ndim) {
        if (a->typegroup == 'H' || b->typegroup == 'H') {
            return a->size == b->size;
        } else {
            return 0;
        }
    }
    if (a->ndim) {
        for (i = 0; i < a->ndim; i++)
            if (a->arraysize[i] != b->arraysize[i])
                return 0;
    }
    if (a->typegroup == 'S') {
        if (a->flags != b->flags)
            return 0;
        if (a->fields || b->fields) {
            if (!(a->fields && b->fields))
                return 0;
            for (i = 0; a->fields[i].type && b->fields[i].type; i++) {
                __Pyx_StructField *field_a = a->fields + i;
                __Pyx_StructField *field_b = b->fields + i;
                if (field_a->offset != field_b->offset ||
                    !__pyx_typeinfo_cmp(field_a->type, field_b->type))
                    return 0;
            }
            return !a->fields[i].type && !b->fields[i].type;
        }
    }
    return 1;
}

/* MemviewSliceValidateAndInit */
  static int
__pyx_check_strides(Py_buffer *buf, int dim, int ndim, int spec)
{
    if (buf->shape[dim] <= 1)
        return 1;
    if (buf->strides) {
        if (spec & __Pyx_MEMVIEW_CONTIG) {
            if (spec & (__Pyx_MEMVIEW_PTR|__Pyx_MEMVIEW_FULL)) {
                if (unlikely(buf->strides[dim] != sizeof(void *))) {
                    PyErr_Format(PyExc_ValueError,
                                 "Buffer is not indirectly contiguous "
                                 "in dimension %d.", dim);
                    goto fail;
                }
            } else if (unlikely(buf->strides[dim] != buf->itemsize)) {
                PyErr_SetString(PyExc_ValueError,
                                "Buffer and memoryview are not contiguous "
                                "in the same dimension.");
                goto fail;
            }
        }
        if (spec & __Pyx_MEMVIEW_FOLLOW) {
            Py_ssize_t stride = buf->strides[dim];
            if (stride < 0)
                stride = -stride;
            if (unlikely(stride < buf->itemsize)) {
                PyErr_SetString(PyExc_ValueError,
                                "Buffer and memoryview are not contiguous "
                                "in the same dimension.");
                goto fail;
            }
        }
    } else {
        if (unlikely(spec & __Pyx_MEMVIEW_CONTIG && dim != ndim - 1)) {
            PyErr_Format(PyExc_ValueError,
                         "C-contiguous buffer is not contiguous in "
                         "dimension %d", dim);
            goto fail;
        } else if (unlikely(spec & (__Pyx_MEMVIEW_PTR))) {
            PyErr_Format(PyExc_ValueError,
                         "C-contiguous buffer is not indirect in "
                         "dimension %d", dim);
            goto fail;
        } else if (unlikely(buf->suboffsets)) {
            PyErr_SetString(PyExc_ValueError,
                            "Buffer exposes suboffsets but no strides");
            goto fail;
        }
    }
    return 1;
fail:
    return 0;
}
static int
__pyx_check_suboffsets(Py_buffer *buf, int dim, CYTHON_UNUSED int ndim, int spec)
{
    if (spec & __Pyx_MEMVIEW_DIRECT) {
        if (unlikely(buf->suboffsets && buf->suboffsets[dim] >= 0)) {
            PyErr_Format(PyExc_ValueError,
                         "Buffer not compatible with direct access "
                         "in dimension %d.", dim);
            goto fail;
        }
    }
    if (spec & __Pyx_MEMVIEW_PTR) {
        if (unlikely(!buf->suboffsets || (buf->suboffsets[dim] < 0))) {
            PyErr_Format(PyExc_ValueError,
                         "Buffer is not indirectly accessible "
                         "in dimension %d.", dim);
            goto fail;
        }
    }
    return 1;
fail:
    return 0;
}
static int
__pyx_verify_contig(Py_buffer *buf, int ndim, int c_or_f_flag)
{
    int i;
    if (c_or_f_flag & __Pyx_IS_F_CONTIG) {
        Py_ssize_t stride = 1;
        for (i = 0; i < ndim; i++) {
            if (unlikely(stride * buf->itemsize != buf->strides[i]  &&  buf->shape[i] > 1)) {
                PyErr_SetString(PyExc_ValueError,
                    "Buffer not fortran contiguous.");
                goto fail;
            }
            stride = stride * buf->shape[i];
        }
    } else if (c_or_f_flag & __Pyx_IS_C_CONTIG) {
        Py_ssize_t stride = 1;
        for (i = ndim - 1; i >- 1; i--) {
            if (unlikely(stride * buf->itemsize != buf->strides[i]  &&  buf->shape[i] > 1)) {
                PyErr_SetString(PyExc_ValueError,
                    "Buffer not C contiguous.");
                goto fail;
            }
            stride = stride * buf->shape[i];
        }
    }
    return 1;
fail:
    return 0;
}
static int __Pyx_ValidateAndInit_memviewslice(
                int *axes_specs,
                int c_or_f_flag,
                int buf_flags,
                int ndim,
                __Pyx_TypeInfo *dtype,
                __Pyx_BufFmt_StackElem stack[],
                __Pyx_memviewslice *memviewslice,
                PyObject *original_obj)
{
    struct __pyx_memoryview_obj *memview, *new_memview;
    __Pyx_RefNannyDeclarations
    Py_buffer *buf;
    int i, spec = 0, retval = -1;
    __Pyx_BufFmt_Context ctx;
    int from_memoryview = __pyx_memoryview_check(original_obj);
    __Pyx_RefNannySetupContext("ValidateAndInit_memviewslice", 0);
    if (from_memoryview && __pyx_typeinfo_cmp(dtype, ((struct __pyx_memoryview_obj *)
                                                            original_obj)->typeinfo)) {
        memview = (struct __pyx_memoryview_obj *) original_obj;
        new_memview = NULL;
    } else {
        memview = (struct __pyx_memoryview_obj *) __pyx_memoryview_new(
                                            original_obj, buf_flags, 0, dtype);
        new_memview = memview;
        if (unlikely(!memview))
            goto fail;
    }
    buf = &memview->view;
    if (unlikely(buf->ndim != ndim)) {
        PyErr_Format(PyExc_ValueError,
                "Buffer has wrong number of dimensions (expected %d, got %d)",
                ndim, buf->ndim);
        goto fail;
    }
    if (new_memview) {
        __Pyx_BufFmt_Init(&ctx, stack, dtype);
        if (unlikely(!__Pyx_BufFmt_CheckString(&ctx, buf->format))) goto fail;
    }
    if (unlikely((unsigned) buf->itemsize != dtype->size)) {
        PyErr_Format(PyExc_ValueError,
                     "Item size of buffer (%" CYTHON_FORMAT_SSIZE_T "u byte%s) "
                     "does not match size of '%s' (%" CYTHON_FORMAT_SSIZE_T "u byte%s)",
                     buf->itemsize,
                     (buf->itemsize > 1) ? "s" : "",
                     dtype->name,
                     dtype->size,
                     (dtype->size > 1) ? "s" : "");
        goto fail;
    }
    if (buf->len > 0) {
        for (i = 0; i < ndim; i++) {
            spec = axes_specs[i];
            if (unlikely(!__pyx_check_strides(buf, i, ndim, spec)))
                goto fail;
            if (unlikely(!__pyx_check_suboffsets(buf, i, ndim, spec)))
                goto fail;
        }
        if (unlikely(buf->strides && !__pyx_verify_contig(buf, ndim, c_or_f_flag)))
            goto fail;
    }
    if (unlikely(__Pyx_init_memviewslice(memview, ndim, memviewslice,
                                         new_memview != NULL) == -1)) {
        goto fail;
    }
    retval = 0;
    goto no_fail;
fail:
    Py_XDECREF(new_memview);
    retval = -1;
no_fail:
    __Pyx_RefNannyFinishContext();
    return retval;
}

/* ObjectToMemviewSlice */
  static CYTHON_INLINE __Pyx_memviewslice __Pyx_PyObject_to_MemoryviewSlice_d_d_dc_int(PyObject *obj, int writable_flag) {
    __Pyx_memviewslice result = { 0, 0, { 0 }, { 0 }, { 0 } };
    __Pyx_BufFmt_StackElem stack[1];
    int axes_specs[] = { (__Pyx_MEMVIEW_DIRECT | __Pyx_MEMVIEW_FOLLOW), (__Pyx_MEMVIEW_DIRECT | __Pyx_MEMVIEW_FOLLOW), (__Pyx_MEMVIEW_DIRECT | __Pyx_MEMVIEW_CONTIG) };
    int retcode;
    if (obj == Py_None) {
        result.memview = (struct __pyx_memoryview_obj *) Py_None;
        return result;
    }
    retcode = __Pyx_ValidateAndInit_memviewslice(axes_specs, __Pyx_IS_C_CONTIG,
                                                 (PyBUF_C_CONTIGUOUS | PyBUF_FORMAT) | writable_flag, 3,
                                                 &__Pyx_TypeInfo_int, stack,
                                                 &result, obj);
    if (unlikely(retcode == -1))
        goto __pyx_fail;
    return result;
__pyx_fail:
    result.memview = NULL;
    result.data = NULL;
    return result;
}

/* ObjectToMemviewSlice */
  static CYTHON_INLINE __Pyx_memviewslice __Pyx_PyObject_to_MemoryviewSlice_d_d_dc_float(PyObject *obj, int writable_flag) {
    __Pyx_memviewslice result = { 0, 0, { 0 }, { 0 }, { 0 } };
    __Pyx_BufFmt_StackElem stack[1];
    int axes_specs[] = { (__Pyx_MEMVIEW_DIRECT | __Pyx_MEMVIEW_FOLLOW), (__Pyx_MEMVIEW_DIRECT | __Pyx_MEMVIEW_FOLLOW), (__Pyx_MEMVIEW_DIRECT | __Pyx_MEMVIEW_CONTIG) };
    int retcode;
    if (obj == Py_None) {
        result.memview = (struct __pyx_memoryview_obj *) Py_None;
        return result;
    }
    retcode = __Pyx_ValidateAndInit_memviewslice(axes_specs, __Pyx_IS_C_CONTIG,
                                                 (PyBUF_C_CONTIGUOUS | PyBUF_FORMAT) | writable_flag, 3,
                                                 &__Pyx_TypeInfo_float, stack,
                                                 &result, obj);
    if (unlikely(retcode == -1))
        goto __pyx_fail;
    return result;
__pyx_fail:
    result.memview = NULL;
    result.data = NULL;
    return result;
}

/* ObjectToMemviewSlice */
  static CYTHON_INLINE __Pyx_memviewslice __Pyx_PyObject_to_MemoryviewSlice_dc_int(PyObject *obj, int writable_flag) {
    __Pyx_memviewslice result = { 0, 0, { 0 }, { 0 }, { 0 } };
    __Pyx_BufFmt_StackElem stack[1];
    int axes_specs[] = { (__Pyx_MEMVIEW_DIRECT | __Pyx_MEMVIEW_CONTIG) };
    int retcode;
    if (obj == Py_None) {
        result.memview = (struct __pyx_memoryview_obj *) Py_None;
        return result;
    }
    retcode = __Pyx_ValidateAndInit_memviewslice(axes_specs, __Pyx_IS_C_CONTIG,
                                                 (PyBUF_C_CONTIGUOUS | PyBUF_FORMAT) | writable_flag, 1,
                                                 &__Pyx_TypeInfo_int, stack,
                                                 &result, obj);
    if (unlikely(retcode == -1))
        goto __pyx_fail;
    return result;
__pyx_fail:
    result.memview = NULL;
    result.data = NULL;
    return result;
}

/* CIntFromPyVerify */
  #define __PYX_VERIFY_RETURN_INT(target_type, func_type, func_value)\
    __PYX__VERIFY_RETURN_INT(target_type, func_type, func_value, 0)
#define __PYX_VERIFY_RETURN_INT_EXC(target_type, func_type, func_value)\
    __PYX__VERIFY_RETURN_INT(target_type, func_type, func_value, 1)
#define __PYX__VERIFY_RETURN_INT(target_type, func_type, func_value, exc)\
    {\
        func_type value = func_value;\
        if (sizeof(target_type) < sizeof(func_type)) {\
            if (unlikely(value != (func_type) (target_type) value)) {\
                func_type zero = 0;\
                if (exc && unlikely(value == (func_type)-1 && PyErr_Occurred()))\
                    return (target_type) -1;\
                if (is_unsigned && unlikely(value < zero))\
                    goto raise_neg_overflow;\
                else\
                    goto raise_overflow;\
            }\
        }\
        return (target_type) value;\
    }

/* Declarations */
  #if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) {
      return ::std::complex< float >(x, y);
    }
  #else
    static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) {
      return x + y*(__pyx_t_float_complex)_Complex_I;
    }
  #endif
#else
    static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_parts(float x, float y) {
      __pyx_t_float_complex z;
      z.real = x;
      z.imag = y;
      return z;
    }
#endif

/* Arithmetic */
  #if CYTHON_CCOMPLEX
#else
    static CYTHON_INLINE int __Pyx_c_eq_float(__pyx_t_float_complex a, __pyx_t_float_complex b) {
       return (a.real == b.real) && (a.imag == b.imag);
    }
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sum_float(__pyx_t_float_complex a, __pyx_t_float_complex b) {
        __pyx_t_float_complex z;
        z.real = a.real + b.real;
        z.imag = a.imag + b.imag;
        return z;
    }
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_diff_float(__pyx_t_float_complex a, __pyx_t_float_complex b) {
        __pyx_t_float_complex z;
        z.real = a.real - b.real;
        z.imag = a.imag - b.imag;
        return z;
    }
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prod_float(__pyx_t_float_complex a, __pyx_t_float_complex b) {
        __pyx_t_float_complex z;
        z.real = a.real * b.real - a.imag * b.imag;
        z.imag = a.real * b.imag + a.imag * b.real;
        return z;
    }
    #if 1
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quot_float(__pyx_t_float_complex a, __pyx_t_float_complex b) {
        if (b.imag == 0) {
            return __pyx_t_float_complex_from_parts(a.real / b.real, a.imag / b.real);
        } else if (fabsf(b.real) >= fabsf(b.imag)) {
            if (b.real == 0 && b.imag == 0) {
                return __pyx_t_float_complex_from_parts(a.real / b.real, a.imag / b.imag);
            } else {
                float r = b.imag / b.real;
                float s = (float)(1.0) / (b.real + b.imag * r);
                return __pyx_t_float_complex_from_parts(
                    (a.real + a.imag * r) * s, (a.imag - a.real * r) * s);
            }
        } else {
            float r = b.real / b.imag;
            float s = (float)(1.0) / (b.imag + b.real * r);
            return __pyx_t_float_complex_from_parts(
                (a.real * r + a.imag) * s, (a.imag * r - a.real) * s);
        }
    }
    #else
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quot_float(__pyx_t_float_complex a, __pyx_t_float_complex b) {
        if (b.imag == 0) {
            return __pyx_t_float_complex_from_parts(a.real / b.real, a.imag / b.real);
        } else {
            float denom = b.real * b.real + b.imag * b.imag;
            return __pyx_t_float_complex_from_parts(
                (a.real * b.real + a.imag * b.imag) / denom,
                (a.imag * b.real - a.real * b.imag) / denom);
        }
    }
    #endif
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_neg_float(__pyx_t_float_complex a) {
        __pyx_t_float_complex z;
        z.real = -a.real;
        z.imag = -a.imag;
        return z;
    }
    static CYTHON_INLINE int __Pyx_c_is_zero_float(__pyx_t_float_complex a) {
       return (a.real == 0) && (a.imag == 0);
    }
    static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conj_float(__pyx_t_float_complex a) {
        __pyx_t_float_complex z;
        z.real =  a.real;
        z.imag = -a.imag;
        return z;
    }
    #if 1
        static CYTHON_INLINE float __Pyx_c_abs_float(__pyx_t_float_complex z) {
          #if !defined(HAVE_HYPOT) || defined(_MSC_VER)
            return sqrtf(z.real*z.real + z.imag*z.imag);
          #else
            return hypotf(z.real, z.imag);
          #endif
        }
        static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_pow_float(__pyx_t_float_complex a, __pyx_t_float_complex b) {
            __pyx_t_float_complex z;
            float r, lnr, theta, z_r, z_theta;
            if (b.imag == 0 && b.real == (int)b.real) {
                if (b.real < 0) {
                    float denom = a.real * a.real + a.imag * a.imag;
                    a.real = a.real / denom;
                    a.imag = -a.imag / denom;
                    b.real = -b.real;
                }
                switch ((int)b.real) {
                    case 0:
                        z.real = 1;
                        z.imag = 0;
                        return z;
                    case 1:
                        return a;
                    case 2:
                        return __Pyx_c_prod_float(a, a);
                    case 3:
                        z = __Pyx_c_prod_float(a, a);
                        return __Pyx_c_prod_float(z, a);
                    case 4:
                        z = __Pyx_c_prod_float(a, a);
                        return __Pyx_c_prod_float(z, z);
                }
            }
            if (a.imag == 0) {
                if (a.real == 0) {
                    return a;
                } else if (b.imag == 0) {
                    z.real = powf(a.real, b.real);
                    z.imag = 0;
                    return z;
                } else if (a.real > 0) {
                    r = a.real;
                    theta = 0;
                } else {
                    r = -a.real;
                    theta = atan2f(0.0, -1.0);
                }
            } else {
                r = __Pyx_c_abs_float(a);
                theta = atan2f(a.imag, a.real);
            }
            lnr = logf(r);
            z_r = expf(lnr * b.real - theta * b.imag);
            z_theta = theta * b.real + lnr * b.imag;
            z.real = z_r * cosf(z_theta);
            z.imag = z_r * sinf(z_theta);
            return z;
        }
    #endif
#endif

/* Declarations */
  #if CYTHON_CCOMPLEX
  #ifdef __cplusplus
    static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) {
      return ::std::complex< double >(x, y);
    }
  #else
    static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) {
      return x + y*(__pyx_t_double_complex)_Complex_I;
    }
  #endif
#else
    static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_parts(double x, double y) {
      __pyx_t_double_complex z;
      z.real = x;
      z.imag = y;
      return z;
    }
#endif

/* Arithmetic */
  #if CYTHON_CCOMPLEX
#else
    static CYTHON_INLINE int __Pyx_c_eq_double(__pyx_t_double_complex a, __pyx_t_double_complex b) {
       return (a.real == b.real) && (a.imag == b.imag);
    }
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum_double(__pyx_t_double_complex a, __pyx_t_double_complex b) {
        __pyx_t_double_complex z;
        z.real = a.real + b.real;
        z.imag = a.imag + b.imag;
        return z;
    }
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff_double(__pyx_t_double_complex a, __pyx_t_double_complex b) {
        __pyx_t_double_complex z;
        z.real = a.real - b.real;
        z.imag = a.imag - b.imag;
        return z;
    }
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod_double(__pyx_t_double_complex a, __pyx_t_double_complex b) {
        __pyx_t_double_complex z;
        z.real = a.real * b.real - a.imag * b.imag;
        z.imag = a.real * b.imag + a.imag * b.real;
        return z;
    }
    #if 1
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot_double(__pyx_t_double_complex a, __pyx_t_double_complex b) {
        if (b.imag == 0) {
            return __pyx_t_double_complex_from_parts(a.real / b.real, a.imag / b.real);
        } else if (fabs(b.real) >= fabs(b.imag)) {
            if (b.real == 0 && b.imag == 0) {
                return __pyx_t_double_complex_from_parts(a.real / b.real, a.imag / b.imag);
            } else {
                double r = b.imag / b.real;
                double s = (double)(1.0) / (b.real + b.imag * r);
                return __pyx_t_double_complex_from_parts(
                    (a.real + a.imag * r) * s, (a.imag - a.real * r) * s);
            }
        } else {
            double r = b.real / b.imag;
            double s = (double)(1.0) / (b.imag + b.real * r);
            return __pyx_t_double_complex_from_parts(
                (a.real * r + a.imag) * s, (a.imag * r - a.real) * s);
        }
    }
    #else
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot_double(__pyx_t_double_complex a, __pyx_t_double_complex b) {
        if (b.imag == 0) {
            return __pyx_t_double_complex_from_parts(a.real / b.real, a.imag / b.real);
        } else {
            double denom = b.real * b.real + b.imag * b.imag;
            return __pyx_t_double_complex_from_parts(
                (a.real * b.real + a.imag * b.imag) / denom,
                (a.imag * b.real - a.real * b.imag) / denom);
        }
    }
    #endif
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg_double(__pyx_t_double_complex a) {
        __pyx_t_double_complex z;
        z.real = -a.real;
        z.imag = -a.imag;
        return z;
    }
    static CYTHON_INLINE int __Pyx_c_is_zero_double(__pyx_t_double_complex a) {
       return (a.real == 0) && (a.imag == 0);
    }
    static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj_double(__pyx_t_double_complex a) {
        __pyx_t_double_complex z;
        z.real =  a.real;
        z.imag = -a.imag;
        return z;
    }
    #if 1
        static CYTHON_INLINE double __Pyx_c_abs_double(__pyx_t_double_complex z) {
          #if !defined(HAVE_HYPOT) || defined(_MSC_VER)
            return sqrt(z.real*z.real + z.imag*z.imag);
          #else
            return hypot(z.real, z.imag);
          #endif
        }
        static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow_double(__pyx_t_double_complex a, __pyx_t_double_complex b) {
            __pyx_t_double_complex z;
            double r, lnr, theta, z_r, z_theta;
            if (b.imag == 0 && b.real == (int)b.real) {
                if (b.real < 0) {
                    double denom = a.real * a.real + a.imag * a.imag;
                    a.real = a.real / denom;
                    a.imag = -a.imag / denom;
                    b.real = -b.real;
                }
                switch ((int)b.real) {
                    case 0:
                        z.real = 1;
                        z.imag = 0;
                        return z;
                    case 1:
                        return a;
                    case 2:
                        return __Pyx_c_prod_double(a, a);
                    case 3:
                        z = __Pyx_c_prod_double(a, a);
                        return __Pyx_c_prod_double(z, a);
                    case 4:
                        z = __Pyx_c_prod_double(a, a);
                        return __Pyx_c_prod_double(z, z);
                }
            }
            if (a.imag == 0) {
                if (a.real == 0) {
                    return a;
                } else if (b.imag == 0) {
                    z.real = pow(a.real, b.real);
                    z.imag = 0;
                    return z;
                } else if (a.real > 0) {
                    r = a.real;
                    theta = 0;
                } else {
                    r = -a.real;
                    theta = atan2(0.0, -1.0);
                }
            } else {
                r = __Pyx_c_abs_double(a);
                theta = atan2(a.imag, a.real);
            }
            lnr = log(r);
            z_r = exp(lnr * b.real - theta * b.imag);
            z_theta = theta * b.real + lnr * b.imag;
            z.real = z_r * cos(z_theta);
            z.imag = z_r * sin(z_theta);
            return z;
        }
    #endif
#endif

/* MemviewSliceCopyTemplate */
  static __Pyx_memviewslice
__pyx_memoryview_copy_new_contig(const __Pyx_memviewslice *from_mvs,
                                 const char *mode, int ndim,
                                 size_t sizeof_dtype, int contig_flag,
                                 int dtype_is_object)
{
    __Pyx_RefNannyDeclarations
    int i;
    __Pyx_memviewslice new_mvs = { 0, 0, { 0 }, { 0 }, { 0 } };
    struct __pyx_memoryview_obj *from_memview = from_mvs->memview;
    Py_buffer *buf = &from_memview->view;
    PyObject *shape_tuple = NULL;
    PyObject *temp_int = NULL;
    struct __pyx_array_obj *array_obj = NULL;
    struct __pyx_memoryview_obj *memview_obj = NULL;
    __Pyx_RefNannySetupContext("__pyx_memoryview_copy_new_contig", 0);
    for (i = 0; i < ndim; i++) {
        if (unlikely(from_mvs->suboffsets[i] >= 0)) {
            PyErr_Format(PyExc_ValueError, "Cannot copy memoryview slice with "
                                           "indirect dimensions (axis %d)", i);
            goto fail;
        }
    }
    shape_tuple = PyTuple_New(ndim);
    if (unlikely(!shape_tuple)) {
        goto fail;
    }
    __Pyx_GOTREF(shape_tuple);
    for(i = 0; i < ndim; i++) {
        temp_int = PyInt_FromSsize_t(from_mvs->shape[i]);
        if(unlikely(!temp_int)) {
            goto fail;
        } else {
            PyTuple_SET_ITEM(shape_tuple, i, temp_int);
            temp_int = NULL;
        }
    }
    array_obj = __pyx_array_new(shape_tuple, sizeof_dtype, buf->format, (char *) mode, NULL);
    if (unlikely(!array_obj)) {
        goto fail;
    }
    __Pyx_GOTREF(array_obj);
    memview_obj = (struct __pyx_memoryview_obj *) __pyx_memoryview_new(
                                    (PyObject *) array_obj, contig_flag,
                                    dtype_is_object,
                                    from_mvs->memview->typeinfo);
    if (unlikely(!memview_obj))
        goto fail;
    if (unlikely(__Pyx_init_memviewslice(memview_obj, ndim, &new_mvs, 1) < 0))
        goto fail;
    if (unlikely(__pyx_memoryview_copy_contents(*from_mvs, new_mvs, ndim, ndim,
                                                dtype_is_object) < 0))
        goto fail;
    goto no_fail;
fail:
    __Pyx_XDECREF(new_mvs.memview);
    new_mvs.memview = NULL;
    new_mvs.data = NULL;
no_fail:
    __Pyx_XDECREF(shape_tuple);
    __Pyx_XDECREF(temp_int);
    __Pyx_XDECREF(array_obj);
    __Pyx_RefNannyFinishContext();
    return new_mvs;
}

/* CIntToPy */
  static CYTHON_INLINE PyObject* __Pyx_PyInt_From_int(int value) {
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wconversion"
#endif
    const int neg_one = (int) -1, const_zero = (int) 0;
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic pop
#endif
    const int is_unsigned = neg_one > const_zero;
    if (is_unsigned) {
        if (sizeof(int) < sizeof(long)) {
            return PyInt_FromLong((long) value);
        } else if (sizeof(int) <= sizeof(unsigned long)) {
            return PyLong_FromUnsignedLong((unsigned long) value);
#ifdef HAVE_LONG_LONG
        } else if (sizeof(int) <= sizeof(unsigned PY_LONG_LONG)) {
            return PyLong_FromUnsignedLongLong((unsigned PY_LONG_LONG) value);
#endif
        }
    } else {
        if (sizeof(int) <= sizeof(long)) {
            return PyInt_FromLong((long) value);
#ifdef HAVE_LONG_LONG
        } else if (sizeof(int) <= sizeof(PY_LONG_LONG)) {
            return PyLong_FromLongLong((PY_LONG_LONG) value);
#endif
        }
    }
    {
        int one = 1; int little = (int)*(unsigned char *)&one;
        unsigned char *bytes = (unsigned char *)&value;
        return _PyLong_FromByteArray(bytes, sizeof(int),
                                     little, !is_unsigned);
    }
}

/* CIntFromPy */
  static CYTHON_INLINE int __Pyx_PyInt_As_int(PyObject *x) {
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wconversion"
#endif
    const int neg_one = (int) -1, const_zero = (int) 0;
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic pop
#endif
    const int is_unsigned = neg_one > const_zero;
#if PY_MAJOR_VERSION < 3
    if (likely(PyInt_Check(x))) {
        if (sizeof(int) < sizeof(long)) {
            __PYX_VERIFY_RETURN_INT(int, long, PyInt_AS_LONG(x))
        } else {
            long val = PyInt_AS_LONG(x);
            if (is_unsigned && unlikely(val < 0)) {
                goto raise_neg_overflow;
            }
            return (int) val;
        }
    } else
#endif
    if (likely(PyLong_Check(x))) {
        if (is_unsigned) {
#if CYTHON_USE_PYLONG_INTERNALS
            const digit* digits = ((PyLongObject*)x)->ob_digit;
            switch (Py_SIZE(x)) {
                case  0: return (int) 0;
                case  1: __PYX_VERIFY_RETURN_INT(int, digit, digits[0])
                case 2:
                    if (8 * sizeof(int) > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, unsigned long, (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) >= 2 * PyLong_SHIFT) {
                            return (int) (((((int)digits[1]) << PyLong_SHIFT) | (int)digits[0]));
                        }
                    }
                    break;
                case 3:
                    if (8 * sizeof(int) > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, unsigned long, (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) >= 3 * PyLong_SHIFT) {
                            return (int) (((((((int)digits[2]) << PyLong_SHIFT) | (int)digits[1]) << PyLong_SHIFT) | (int)digits[0]));
                        }
                    }
                    break;
                case 4:
                    if (8 * sizeof(int) > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, unsigned long, (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) >= 4 * PyLong_SHIFT) {
                            return (int) (((((((((int)digits[3]) << PyLong_SHIFT) | (int)digits[2]) << PyLong_SHIFT) | (int)digits[1]) << PyLong_SHIFT) | (int)digits[0]));
                        }
                    }
                    break;
            }
#endif
#if CYTHON_COMPILING_IN_CPYTHON
            if (unlikely(Py_SIZE(x) < 0)) {
                goto raise_neg_overflow;
            }
#else
            {
                int result = PyObject_RichCompareBool(x, Py_False, Py_LT);
                if (unlikely(result < 0))
                    return (int) -1;
                if (unlikely(result == 1))
                    goto raise_neg_overflow;
            }
#endif
            if (sizeof(int) <= sizeof(unsigned long)) {
                __PYX_VERIFY_RETURN_INT_EXC(int, unsigned long, PyLong_AsUnsignedLong(x))
#ifdef HAVE_LONG_LONG
            } else if (sizeof(int) <= sizeof(unsigned PY_LONG_LONG)) {
                __PYX_VERIFY_RETURN_INT_EXC(int, unsigned PY_LONG_LONG, PyLong_AsUnsignedLongLong(x))
#endif
            }
        } else {
#if CYTHON_USE_PYLONG_INTERNALS
            const digit* digits = ((PyLongObject*)x)->ob_digit;
            switch (Py_SIZE(x)) {
                case  0: return (int) 0;
                case -1: __PYX_VERIFY_RETURN_INT(int, sdigit, (sdigit) (-(sdigit)digits[0]))
                case  1: __PYX_VERIFY_RETURN_INT(int,  digit, +digits[0])
                case -2:
                    if (8 * sizeof(int) - 1 > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, long, -(long) (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) - 1 > 2 * PyLong_SHIFT) {
                            return (int) (((int)-1)*(((((int)digits[1]) << PyLong_SHIFT) | (int)digits[0])));
                        }
                    }
                    break;
                case 2:
                    if (8 * sizeof(int) > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, unsigned long, (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) - 1 > 2 * PyLong_SHIFT) {
                            return (int) ((((((int)digits[1]) << PyLong_SHIFT) | (int)digits[0])));
                        }
                    }
                    break;
                case -3:
                    if (8 * sizeof(int) - 1 > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, long, -(long) (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) - 1 > 3 * PyLong_SHIFT) {
                            return (int) (((int)-1)*(((((((int)digits[2]) << PyLong_SHIFT) | (int)digits[1]) << PyLong_SHIFT) | (int)digits[0])));
                        }
                    }
                    break;
                case 3:
                    if (8 * sizeof(int) > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, unsigned long, (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) - 1 > 3 * PyLong_SHIFT) {
                            return (int) ((((((((int)digits[2]) << PyLong_SHIFT) | (int)digits[1]) << PyLong_SHIFT) | (int)digits[0])));
                        }
                    }
                    break;
                case -4:
                    if (8 * sizeof(int) - 1 > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, long, -(long) (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) - 1 > 4 * PyLong_SHIFT) {
                            return (int) (((int)-1)*(((((((((int)digits[3]) << PyLong_SHIFT) | (int)digits[2]) << PyLong_SHIFT) | (int)digits[1]) << PyLong_SHIFT) | (int)digits[0])));
                        }
                    }
                    break;
                case 4:
                    if (8 * sizeof(int) > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(int, unsigned long, (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(int) - 1 > 4 * PyLong_SHIFT) {
                            return (int) ((((((((((int)digits[3]) << PyLong_SHIFT) | (int)digits[2]) << PyLong_SHIFT) | (int)digits[1]) << PyLong_SHIFT) | (int)digits[0])));
                        }
                    }
                    break;
            }
#endif
            if (sizeof(int) <= sizeof(long)) {
                __PYX_VERIFY_RETURN_INT_EXC(int, long, PyLong_AsLong(x))
#ifdef HAVE_LONG_LONG
            } else if (sizeof(int) <= sizeof(PY_LONG_LONG)) {
                __PYX_VERIFY_RETURN_INT_EXC(int, PY_LONG_LONG, PyLong_AsLongLong(x))
#endif
            }
        }
        {
#if CYTHON_COMPILING_IN_PYPY && !defined(_PyLong_AsByteArray)
            PyErr_SetString(PyExc_RuntimeError,
                            "_PyLong_AsByteArray() not available in PyPy, cannot convert large numbers");
#else
            int val;
            PyObject *v = __Pyx_PyNumber_IntOrLong(x);
 #if PY_MAJOR_VERSION < 3
            if (likely(v) && !PyLong_Check(v)) {
                PyObject *tmp = v;
                v = PyNumber_Long(tmp);
                Py_DECREF(tmp);
            }
 #endif
            if (likely(v)) {
                int one = 1; int is_little = (int)*(unsigned char *)&one;
                unsigned char *bytes = (unsigned char *)&val;
                int ret = _PyLong_AsByteArray((PyLongObject *)v,
                                              bytes, sizeof(val),
                                              is_little, !is_unsigned);
                Py_DECREF(v);
                if (likely(!ret))
                    return val;
            }
#endif
            return (int) -1;
        }
    } else {
        int val;
        PyObject *tmp = __Pyx_PyNumber_IntOrLong(x);
        if (!tmp) return (int) -1;
        val = __Pyx_PyInt_As_int(tmp);
        Py_DECREF(tmp);
        return val;
    }
raise_overflow:
    PyErr_SetString(PyExc_OverflowError,
        "value too large to convert to int");
    return (int) -1;
raise_neg_overflow:
    PyErr_SetString(PyExc_OverflowError,
        "can't convert negative value to int");
    return (int) -1;
}

/* CIntToPy */
  static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value) {
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wconversion"
#endif
    const long neg_one = (long) -1, const_zero = (long) 0;
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic pop
#endif
    const int is_unsigned = neg_one > const_zero;
    if (is_unsigned) {
        if (sizeof(long) < sizeof(long)) {
            return PyInt_FromLong((long) value);
        } else if (sizeof(long) <= sizeof(unsigned long)) {
            return PyLong_FromUnsignedLong((unsigned long) value);
#ifdef HAVE_LONG_LONG
        } else if (sizeof(long) <= sizeof(unsigned PY_LONG_LONG)) {
            return PyLong_FromUnsignedLongLong((unsigned PY_LONG_LONG) value);
#endif
        }
    } else {
        if (sizeof(long) <= sizeof(long)) {
            return PyInt_FromLong((long) value);
#ifdef HAVE_LONG_LONG
        } else if (sizeof(long) <= sizeof(PY_LONG_LONG)) {
            return PyLong_FromLongLong((PY_LONG_LONG) value);
#endif
        }
    }
    {
        int one = 1; int little = (int)*(unsigned char *)&one;
        unsigned char *bytes = (unsigned char *)&value;
        return _PyLong_FromByteArray(bytes, sizeof(long),
                                     little, !is_unsigned);
    }
}

/* CIntFromPy */
  static CYTHON_INLINE long __Pyx_PyInt_As_long(PyObject *x) {
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wconversion"
#endif
    const long neg_one = (long) -1, const_zero = (long) 0;
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic pop
#endif
    const int is_unsigned = neg_one > const_zero;
#if PY_MAJOR_VERSION < 3
    if (likely(PyInt_Check(x))) {
        if (sizeof(long) < sizeof(long)) {
            __PYX_VERIFY_RETURN_INT(long, long, PyInt_AS_LONG(x))
        } else {
            long val = PyInt_AS_LONG(x);
            if (is_unsigned && unlikely(val < 0)) {
                goto raise_neg_overflow;
            }
            return (long) val;
        }
    } else
#endif
    if (likely(PyLong_Check(x))) {
        if (is_unsigned) {
#if CYTHON_USE_PYLONG_INTERNALS
            const digit* digits = ((PyLongObject*)x)->ob_digit;
            switch (Py_SIZE(x)) {
                case  0: return (long) 0;
                case  1: __PYX_VERIFY_RETURN_INT(long, digit, digits[0])
                case 2:
                    if (8 * sizeof(long) > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, unsigned long, (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) >= 2 * PyLong_SHIFT) {
                            return (long) (((((long)digits[1]) << PyLong_SHIFT) | (long)digits[0]));
                        }
                    }
                    break;
                case 3:
                    if (8 * sizeof(long) > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, unsigned long, (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) >= 3 * PyLong_SHIFT) {
                            return (long) (((((((long)digits[2]) << PyLong_SHIFT) | (long)digits[1]) << PyLong_SHIFT) | (long)digits[0]));
                        }
                    }
                    break;
                case 4:
                    if (8 * sizeof(long) > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, unsigned long, (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) >= 4 * PyLong_SHIFT) {
                            return (long) (((((((((long)digits[3]) << PyLong_SHIFT) | (long)digits[2]) << PyLong_SHIFT) | (long)digits[1]) << PyLong_SHIFT) | (long)digits[0]));
                        }
                    }
                    break;
            }
#endif
#if CYTHON_COMPILING_IN_CPYTHON
            if (unlikely(Py_SIZE(x) < 0)) {
                goto raise_neg_overflow;
            }
#else
            {
                int result = PyObject_RichCompareBool(x, Py_False, Py_LT);
                if (unlikely(result < 0))
                    return (long) -1;
                if (unlikely(result == 1))
                    goto raise_neg_overflow;
            }
#endif
            if (sizeof(long) <= sizeof(unsigned long)) {
                __PYX_VERIFY_RETURN_INT_EXC(long, unsigned long, PyLong_AsUnsignedLong(x))
#ifdef HAVE_LONG_LONG
            } else if (sizeof(long) <= sizeof(unsigned PY_LONG_LONG)) {
                __PYX_VERIFY_RETURN_INT_EXC(long, unsigned PY_LONG_LONG, PyLong_AsUnsignedLongLong(x))
#endif
            }
        } else {
#if CYTHON_USE_PYLONG_INTERNALS
            const digit* digits = ((PyLongObject*)x)->ob_digit;
            switch (Py_SIZE(x)) {
                case  0: return (long) 0;
                case -1: __PYX_VERIFY_RETURN_INT(long, sdigit, (sdigit) (-(sdigit)digits[0]))
                case  1: __PYX_VERIFY_RETURN_INT(long,  digit, +digits[0])
                case -2:
                    if (8 * sizeof(long) - 1 > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, long, -(long) (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) - 1 > 2 * PyLong_SHIFT) {
                            return (long) (((long)-1)*(((((long)digits[1]) << PyLong_SHIFT) | (long)digits[0])));
                        }
                    }
                    break;
                case 2:
                    if (8 * sizeof(long) > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, unsigned long, (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) - 1 > 2 * PyLong_SHIFT) {
                            return (long) ((((((long)digits[1]) << PyLong_SHIFT) | (long)digits[0])));
                        }
                    }
                    break;
                case -3:
                    if (8 * sizeof(long) - 1 > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, long, -(long) (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) - 1 > 3 * PyLong_SHIFT) {
                            return (long) (((long)-1)*(((((((long)digits[2]) << PyLong_SHIFT) | (long)digits[1]) << PyLong_SHIFT) | (long)digits[0])));
                        }
                    }
                    break;
                case 3:
                    if (8 * sizeof(long) > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, unsigned long, (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) - 1 > 3 * PyLong_SHIFT) {
                            return (long) ((((((((long)digits[2]) << PyLong_SHIFT) | (long)digits[1]) << PyLong_SHIFT) | (long)digits[0])));
                        }
                    }
                    break;
                case -4:
                    if (8 * sizeof(long) - 1 > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, long, -(long) (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) - 1 > 4 * PyLong_SHIFT) {
                            return (long) (((long)-1)*(((((((((long)digits[3]) << PyLong_SHIFT) | (long)digits[2]) << PyLong_SHIFT) | (long)digits[1]) << PyLong_SHIFT) | (long)digits[0])));
                        }
                    }
                    break;
                case 4:
                    if (8 * sizeof(long) > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(long, unsigned long, (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(long) - 1 > 4 * PyLong_SHIFT) {
                            return (long) ((((((((((long)digits[3]) << PyLong_SHIFT) | (long)digits[2]) << PyLong_SHIFT) | (long)digits[1]) << PyLong_SHIFT) | (long)digits[0])));
                        }
                    }
                    break;
            }
#endif
            if (sizeof(long) <= sizeof(long)) {
                __PYX_VERIFY_RETURN_INT_EXC(long, long, PyLong_AsLong(x))
#ifdef HAVE_LONG_LONG
            } else if (sizeof(long) <= sizeof(PY_LONG_LONG)) {
                __PYX_VERIFY_RETURN_INT_EXC(long, PY_LONG_LONG, PyLong_AsLongLong(x))
#endif
            }
        }
        {
#if CYTHON_COMPILING_IN_PYPY && !defined(_PyLong_AsByteArray)
            PyErr_SetString(PyExc_RuntimeError,
                            "_PyLong_AsByteArray() not available in PyPy, cannot convert large numbers");
#else
            long val;
            PyObject *v = __Pyx_PyNumber_IntOrLong(x);
 #if PY_MAJOR_VERSION < 3
            if (likely(v) && !PyLong_Check(v)) {
                PyObject *tmp = v;
                v = PyNumber_Long(tmp);
                Py_DECREF(tmp);
            }
 #endif
            if (likely(v)) {
                int one = 1; int is_little = (int)*(unsigned char *)&one;
                unsigned char *bytes = (unsigned char *)&val;
                int ret = _PyLong_AsByteArray((PyLongObject *)v,
                                              bytes, sizeof(val),
                                              is_little, !is_unsigned);
                Py_DECREF(v);
                if (likely(!ret))
                    return val;
            }
#endif
            return (long) -1;
        }
    } else {
        long val;
        PyObject *tmp = __Pyx_PyNumber_IntOrLong(x);
        if (!tmp) return (long) -1;
        val = __Pyx_PyInt_As_long(tmp);
        Py_DECREF(tmp);
        return val;
    }
raise_overflow:
    PyErr_SetString(PyExc_OverflowError,
        "value too large to convert to long");
    return (long) -1;
raise_neg_overflow:
    PyErr_SetString(PyExc_OverflowError,
        "can't convert negative value to long");
    return (long) -1;
}

/* CIntFromPy */
  static CYTHON_INLINE char __Pyx_PyInt_As_char(PyObject *x) {
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wconversion"
#endif
    const char neg_one = (char) -1, const_zero = (char) 0;
#ifdef __Pyx_HAS_GCC_DIAGNOSTIC
#pragma GCC diagnostic pop
#endif
    const int is_unsigned = neg_one > const_zero;
#if PY_MAJOR_VERSION < 3
    if (likely(PyInt_Check(x))) {
        if (sizeof(char) < sizeof(long)) {
            __PYX_VERIFY_RETURN_INT(char, long, PyInt_AS_LONG(x))
        } else {
            long val = PyInt_AS_LONG(x);
            if (is_unsigned && unlikely(val < 0)) {
                goto raise_neg_overflow;
            }
            return (char) val;
        }
    } else
#endif
    if (likely(PyLong_Check(x))) {
        if (is_unsigned) {
#if CYTHON_USE_PYLONG_INTERNALS
            const digit* digits = ((PyLongObject*)x)->ob_digit;
            switch (Py_SIZE(x)) {
                case  0: return (char) 0;
                case  1: __PYX_VERIFY_RETURN_INT(char, digit, digits[0])
                case 2:
                    if (8 * sizeof(char) > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, unsigned long, (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) >= 2 * PyLong_SHIFT) {
                            return (char) (((((char)digits[1]) << PyLong_SHIFT) | (char)digits[0]));
                        }
                    }
                    break;
                case 3:
                    if (8 * sizeof(char) > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, unsigned long, (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) >= 3 * PyLong_SHIFT) {
                            return (char) (((((((char)digits[2]) << PyLong_SHIFT) | (char)digits[1]) << PyLong_SHIFT) | (char)digits[0]));
                        }
                    }
                    break;
                case 4:
                    if (8 * sizeof(char) > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, unsigned long, (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) >= 4 * PyLong_SHIFT) {
                            return (char) (((((((((char)digits[3]) << PyLong_SHIFT) | (char)digits[2]) << PyLong_SHIFT) | (char)digits[1]) << PyLong_SHIFT) | (char)digits[0]));
                        }
                    }
                    break;
            }
#endif
#if CYTHON_COMPILING_IN_CPYTHON
            if (unlikely(Py_SIZE(x) < 0)) {
                goto raise_neg_overflow;
            }
#else
            {
                int result = PyObject_RichCompareBool(x, Py_False, Py_LT);
                if (unlikely(result < 0))
                    return (char) -1;
                if (unlikely(result == 1))
                    goto raise_neg_overflow;
            }
#endif
            if (sizeof(char) <= sizeof(unsigned long)) {
                __PYX_VERIFY_RETURN_INT_EXC(char, unsigned long, PyLong_AsUnsignedLong(x))
#ifdef HAVE_LONG_LONG
            } else if (sizeof(char) <= sizeof(unsigned PY_LONG_LONG)) {
                __PYX_VERIFY_RETURN_INT_EXC(char, unsigned PY_LONG_LONG, PyLong_AsUnsignedLongLong(x))
#endif
            }
        } else {
#if CYTHON_USE_PYLONG_INTERNALS
            const digit* digits = ((PyLongObject*)x)->ob_digit;
            switch (Py_SIZE(x)) {
                case  0: return (char) 0;
                case -1: __PYX_VERIFY_RETURN_INT(char, sdigit, (sdigit) (-(sdigit)digits[0]))
                case  1: __PYX_VERIFY_RETURN_INT(char,  digit, +digits[0])
                case -2:
                    if (8 * sizeof(char) - 1 > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, long, -(long) (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) - 1 > 2 * PyLong_SHIFT) {
                            return (char) (((char)-1)*(((((char)digits[1]) << PyLong_SHIFT) | (char)digits[0])));
                        }
                    }
                    break;
                case 2:
                    if (8 * sizeof(char) > 1 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 2 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, unsigned long, (((((unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) - 1 > 2 * PyLong_SHIFT) {
                            return (char) ((((((char)digits[1]) << PyLong_SHIFT) | (char)digits[0])));
                        }
                    }
                    break;
                case -3:
                    if (8 * sizeof(char) - 1 > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, long, -(long) (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) - 1 > 3 * PyLong_SHIFT) {
                            return (char) (((char)-1)*(((((((char)digits[2]) << PyLong_SHIFT) | (char)digits[1]) << PyLong_SHIFT) | (char)digits[0])));
                        }
                    }
                    break;
                case 3:
                    if (8 * sizeof(char) > 2 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 3 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, unsigned long, (((((((unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) - 1 > 3 * PyLong_SHIFT) {
                            return (char) ((((((((char)digits[2]) << PyLong_SHIFT) | (char)digits[1]) << PyLong_SHIFT) | (char)digits[0])));
                        }
                    }
                    break;
                case -4:
                    if (8 * sizeof(char) - 1 > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, long, -(long) (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) - 1 > 4 * PyLong_SHIFT) {
                            return (char) (((char)-1)*(((((((((char)digits[3]) << PyLong_SHIFT) | (char)digits[2]) << PyLong_SHIFT) | (char)digits[1]) << PyLong_SHIFT) | (char)digits[0])));
                        }
                    }
                    break;
                case 4:
                    if (8 * sizeof(char) > 3 * PyLong_SHIFT) {
                        if (8 * sizeof(unsigned long) > 4 * PyLong_SHIFT) {
                            __PYX_VERIFY_RETURN_INT(char, unsigned long, (((((((((unsigned long)digits[3]) << PyLong_SHIFT) | (unsigned long)digits[2]) << PyLong_SHIFT) | (unsigned long)digits[1]) << PyLong_SHIFT) | (unsigned long)digits[0])))
                        } else if (8 * sizeof(char) - 1 > 4 * PyLong_SHIFT) {
                            return (char) ((((((((((char)digits[3]) << PyLong_SHIFT) | (char)digits[2]) << PyLong_SHIFT) | (char)digits[1]) << PyLong_SHIFT) | (char)digits[0])));
                        }
                    }
                    break;
            }
#endif
            if (sizeof(char) <= sizeof(long)) {
                __PYX_VERIFY_RETURN_INT_EXC(char, long, PyLong_AsLong(x))
#ifdef HAVE_LONG_LONG
            } else if (sizeof(char) <= sizeof(PY_LONG_LONG)) {
                __PYX_VERIFY_RETURN_INT_EXC(char, PY_LONG_LONG, PyLong_AsLongLong(x))
#endif
            }
        }
        {
#if CYTHON_COMPILING_IN_PYPY && !defined(_PyLong_AsByteArray)
            PyErr_SetString(PyExc_RuntimeError,
                            "_PyLong_AsByteArray() not available in PyPy, cannot convert large numbers");
#else
            char val;
            PyObject *v = __Pyx_PyNumber_IntOrLong(x);
 #if PY_MAJOR_VERSION < 3
            if (likely(v) && !PyLong_Check(v)) {
                PyObject *tmp = v;
                v = PyNumber_Long(tmp);
                Py_DECREF(tmp);
            }
 #endif
            if (likely(v)) {
                int one = 1; int is_little = (int)*(unsigned char *)&one;
                unsigned char *bytes = (unsigned char *)&val;
                int ret = _PyLong_AsByteArray((PyLongObject *)v,
                                              bytes, sizeof(val),
                                              is_little, !is_unsigned);
                Py_DECREF(v);
                if (likely(!ret))
                    return val;
            }
#endif
            return (char) -1;
        }
    } else {
        char val;
        PyObject *tmp = __Pyx_PyNumber_IntOrLong(x);
        if (!tmp) return (char) -1;
        val = __Pyx_PyInt_As_char(tmp);
        Py_DECREF(tmp);
        return val;
    }
raise_overflow:
    PyErr_SetString(PyExc_OverflowError,
        "value too large to convert to char");
    return (char) -1;
raise_neg_overflow:
    PyErr_SetString(PyExc_OverflowError,
        "can't convert negative value to char");
    return (char) -1;
}

/* CheckBinaryVersion */
  static int __Pyx_check_binary_version(void) {
    char ctversion[4], rtversion[4];
    PyOS_snprintf(ctversion, 4, "%d.%d", PY_MAJOR_VERSION, PY_MINOR_VERSION);
    PyOS_snprintf(rtversion, 4, "%s", Py_GetVersion());
    if (ctversion[0] != rtversion[0] || ctversion[2] != rtversion[2]) {
        char message[200];
        PyOS_snprintf(message, sizeof(message),
                      "compiletime version %s of module '%.100s' "
                      "does not match runtime version %s",
                      ctversion, __Pyx_MODULE_NAME, rtversion);
        return PyErr_WarnEx(NULL, message, 1);
    }
    return 0;
}

/* InitStrings */
  static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) {
    while (t->p) {
        #if PY_MAJOR_VERSION < 3
        if (t->is_unicode) {
            *t->p = PyUnicode_DecodeUTF8(t->s, t->n - 1, NULL);
        } else if (t->intern) {
            *t->p = PyString_InternFromString(t->s);
        } else {
            *t->p = PyString_FromStringAndSize(t->s, t->n - 1);
        }
        #else
        if (t->is_unicode | t->is_str) {
            if (t->intern) {
                *t->p = PyUnicode_InternFromString(t->s);
            } else if (t->encoding) {
                *t->p = PyUnicode_Decode(t->s, t->n - 1, t->encoding, NULL);
            } else {
                *t->p = PyUnicode_FromStringAndSize(t->s, t->n - 1);
            }
        } else {
            *t->p = PyBytes_FromStringAndSize(t->s, t->n - 1);
        }
        #endif
        if (!*t->p)
            return -1;
        if (PyObject_Hash(*t->p) == -1)
            return -1;
        ++t;
    }
    return 0;
}

static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(const char* c_str) {
    return __Pyx_PyUnicode_FromStringAndSize(c_str, (Py_ssize_t)strlen(c_str));
}
static CYTHON_INLINE const char* __Pyx_PyObject_AsString(PyObject* o) {
    Py_ssize_t ignore;
    return __Pyx_PyObject_AsStringAndSize(o, &ignore);
}
#if __PYX_DEFAULT_STRING_ENCODING_IS_ASCII || __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT
#if !CYTHON_PEP393_ENABLED
static const char* __Pyx_PyUnicode_AsStringAndSize(PyObject* o, Py_ssize_t *length) {
    char* defenc_c;
    PyObject* defenc = _PyUnicode_AsDefaultEncodedString(o, NULL);
    if (!defenc) return NULL;
    defenc_c = PyBytes_AS_STRING(defenc);
#if __PYX_DEFAULT_STRING_ENCODING_IS_ASCII
    {
        char* end = defenc_c + PyBytes_GET_SIZE(defenc);
        char* c;
        for (c = defenc_c; c < end; c++) {
            if ((unsigned char) (*c) >= 128) {
                PyUnicode_AsASCIIString(o);
                return NULL;
            }
        }
    }
#endif
    *length = PyBytes_GET_SIZE(defenc);
    return defenc_c;
}
#else
static CYTHON_INLINE const char* __Pyx_PyUnicode_AsStringAndSize(PyObject* o, Py_ssize_t *length) {
    if (unlikely(__Pyx_PyUnicode_READY(o) == -1)) return NULL;
#if __PYX_DEFAULT_STRING_ENCODING_IS_ASCII
    if (likely(PyUnicode_IS_ASCII(o))) {
        *length = PyUnicode_GET_LENGTH(o);
        return PyUnicode_AsUTF8(o);
    } else {
        PyUnicode_AsASCIIString(o);
        return NULL;
    }
#else
    return PyUnicode_AsUTF8AndSize(o, length);
#endif
}
#endif
#endif
static CYTHON_INLINE const char* __Pyx_PyObject_AsStringAndSize(PyObject* o, Py_ssize_t *length) {
#if __PYX_DEFAULT_STRING_ENCODING_IS_ASCII || __PYX_DEFAULT_STRING_ENCODING_IS_DEFAULT
    if (
#if PY_MAJOR_VERSION < 3 && __PYX_DEFAULT_STRING_ENCODING_IS_ASCII
            __Pyx_sys_getdefaultencoding_not_ascii &&
#endif
            PyUnicode_Check(o)) {
        return __Pyx_PyUnicode_AsStringAndSize(o, length);
    } else
#endif
#if (!CYTHON_COMPILING_IN_PYPY) || (defined(PyByteArray_AS_STRING) && defined(PyByteArray_GET_SIZE))
    if (PyByteArray_Check(o)) {
        *length = PyByteArray_GET_SIZE(o);
        return PyByteArray_AS_STRING(o);
    } else
#endif
    {
        char* result;
        int r = PyBytes_AsStringAndSize(o, &result, length);
        if (unlikely(r < 0)) {
            return NULL;
        } else {
            return result;
        }
    }
}
static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
   int is_true = x == Py_True;
   if (is_true | (x == Py_False) | (x == Py_None)) return is_true;
   else return PyObject_IsTrue(x);
}
static CYTHON_INLINE int __Pyx_PyObject_IsTrueAndDecref(PyObject* x) {
    int retval;
    if (unlikely(!x)) return -1;
    retval = __Pyx_PyObject_IsTrue(x);
    Py_DECREF(x);
    return retval;
}
static PyObject* __Pyx_PyNumber_IntOrLongWrongResultType(PyObject* result, const char* type_name) {
#if PY_MAJOR_VERSION >= 3
    if (PyLong_Check(result)) {
        if (PyErr_WarnFormat(PyExc_DeprecationWarning, 1,
                "__int__ returned non-int (type %.200s).  "
                "The ability to return an instance of a strict subclass of int "
                "is deprecated, and may be removed in a future version of Python.",
                Py_TYPE(result)->tp_name)) {
            Py_DECREF(result);
            return NULL;
        }
        return result;
    }
#endif
    PyErr_Format(PyExc_TypeError,
                 "__%.4s__ returned non-%.4s (type %.200s)",
                 type_name, type_name, Py_TYPE(result)->tp_name);
    Py_DECREF(result);
    return NULL;
}
static CYTHON_INLINE PyObject* __Pyx_PyNumber_IntOrLong(PyObject* x) {
#if CYTHON_USE_TYPE_SLOTS
  PyNumberMethods *m;
#endif
  const char *name = NULL;
  PyObject *res = NULL;
#if PY_MAJOR_VERSION < 3
  if (likely(PyInt_Check(x) || PyLong_Check(x)))
#else
  if (likely(PyLong_Check(x)))
#endif
    return __Pyx_NewRef(x);
#if CYTHON_USE_TYPE_SLOTS
  m = Py_TYPE(x)->tp_as_number;
  #if PY_MAJOR_VERSION < 3
  if (m && m->nb_int) {
    name = "int";
    res = m->nb_int(x);
  }
  else if (m && m->nb_long) {
    name = "long";
    res = m->nb_long(x);
  }
  #else
  if (likely(m && m->nb_int)) {
    name = "int";
    res = m->nb_int(x);
  }
  #endif
#else
  if (!PyBytes_CheckExact(x) && !PyUnicode_CheckExact(x)) {
    res = PyNumber_Int(x);
  }
#endif
  if (likely(res)) {
#if PY_MAJOR_VERSION < 3
    if (unlikely(!PyInt_Check(res) && !PyLong_Check(res))) {
#else
    if (unlikely(!PyLong_CheckExact(res))) {
#endif
        return __Pyx_PyNumber_IntOrLongWrongResultType(res, name);
    }
  }
  else if (!PyErr_Occurred()) {
    PyErr_SetString(PyExc_TypeError,
                    "an integer is required");
  }
  return res;
}
static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) {
  Py_ssize_t ival;
  PyObject *x;
#if PY_MAJOR_VERSION < 3
  if (likely(PyInt_CheckExact(b))) {
    if (sizeof(Py_ssize_t) >= sizeof(long))
        return PyInt_AS_LONG(b);
    else
        return PyInt_AsSsize_t(b);
  }
#endif
  if (likely(PyLong_CheckExact(b))) {
    #if CYTHON_USE_PYLONG_INTERNALS
    const digit* digits = ((PyLongObject*)b)->ob_digit;
    const Py_ssize_t size = Py_SIZE(b);
    if (likely(__Pyx_sst_abs(size) <= 1)) {
        ival = likely(size) ? digits[0] : 0;
        if (size == -1) ival = -ival;
        return ival;
    } else {
      switch (size) {
         case 2:
           if (8 * sizeof(Py_ssize_t) > 2 * PyLong_SHIFT) {
             return (Py_ssize_t) (((((size_t)digits[1]) << PyLong_SHIFT) | (size_t)digits[0]));
           }
           break;
         case -2:
           if (8 * sizeof(Py_ssize_t) > 2 * PyLong_SHIFT) {
             return -(Py_ssize_t) (((((size_t)digits[1]) << PyLong_SHIFT) | (size_t)digits[0]));
           }
           break;
         case 3:
           if (8 * sizeof(Py_ssize_t) > 3 * PyLong_SHIFT) {
             return (Py_ssize_t) (((((((size_t)digits[2]) << PyLong_SHIFT) | (size_t)digits[1]) << PyLong_SHIFT) | (size_t)digits[0]));
           }
           break;
         case -3:
           if (8 * sizeof(Py_ssize_t) > 3 * PyLong_SHIFT) {
             return -(Py_ssize_t) (((((((size_t)digits[2]) << PyLong_SHIFT) | (size_t)digits[1]) << PyLong_SHIFT) | (size_t)digits[0]));
           }
           break;
         case 4:
           if (8 * sizeof(Py_ssize_t) > 4 * PyLong_SHIFT) {
             return (Py_ssize_t) (((((((((size_t)digits[3]) << PyLong_SHIFT) | (size_t)digits[2]) << PyLong_SHIFT) | (size_t)digits[1]) << PyLong_SHIFT) | (size_t)digits[0]));
           }
           break;
         case -4:
           if (8 * sizeof(Py_ssize_t) > 4 * PyLong_SHIFT) {
             return -(Py_ssize_t) (((((((((size_t)digits[3]) << PyLong_SHIFT) | (size_t)digits[2]) << PyLong_SHIFT) | (size_t)digits[1]) << PyLong_SHIFT) | (size_t)digits[0]));
           }
           break;
      }
    }
    #endif
    return PyLong_AsSsize_t(b);
  }
  x = PyNumber_Index(b);
  if (!x) return -1;
  ival = PyInt_AsSsize_t(x);
  Py_DECREF(x);
  return ival;
}
static CYTHON_INLINE Py_hash_t __Pyx_PyIndex_AsHash_t(PyObject* o) {
  if (sizeof(Py_hash_t) == sizeof(Py_ssize_t)) {
    return (Py_hash_t) __Pyx_PyIndex_AsSsize_t(o);
#if PY_MAJOR_VERSION < 3
  } else if (likely(PyInt_CheckExact(o))) {
    return PyInt_AS_LONG(o);
#endif
  } else {
    Py_ssize_t ival;
    PyObject *x;
    x = PyNumber_Index(o);
    if (!x) return -1;
    ival = PyInt_AsLong(x);
    Py_DECREF(x);
    return ival;
  }
}
static CYTHON_INLINE PyObject * __Pyx_PyBool_FromLong(long b) {
  return b ? __Pyx_NewRef(Py_True) : __Pyx_NewRef(Py_False);
}
static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) {
    return PyInt_FromSize_t(ival);
}


#endif /* Py_PYTHON_H */


================================================
FILE: TTS/tts/utils/monotonic_align/core.pyx
================================================
import numpy as np

cimport cython
cimport numpy as np

from cython.parallel import prange


@cython.boundscheck(False)
@cython.wraparound(False)
cdef void maximum_path_each(int[:,::1] path, float[:,::1] value, int t_x, int t_y, float max_neg_val) nogil:
  cdef int x
  cdef int y
  cdef float v_prev
  cdef float v_cur
  cdef float tmp
  cdef int index = t_x - 1

  for y in range(t_y):
    for x in range(max(0, t_x + y - t_y), min(t_x, y + 1)):
      if x == y:
        v_cur = max_neg_val
      else:
        v_cur = value[x, y-1]
      if x == 0:
        if y == 0:
          v_prev = 0.
        else:
          v_prev = max_neg_val
      else:
        v_prev = value[x-1, y-1]
      value[x, y] = max(v_cur, v_prev) + value[x, y]

  for y in range(t_y - 1, -1, -1):
    path[index, y] = 1
    if index != 0 and (index == y or value[index, y-1] < value[index-1, y-1]):
      index = index - 1


@cython.boundscheck(False)
@cython.wraparound(False)
cpdef void maximum_path_c(int[:,:,::1] paths, float[:,:,::1] values, int[::1] t_xs, int[::1] t_ys, float max_neg_val=-1e9) nogil:
  cdef int b = values.shape[0]

  cdef int i
  for i in prange(b, nogil=True):
    maximum_path_each(paths[i], values[i], t_xs[i], t_ys[i], max_neg_val)


================================================
FILE: TTS/tts/utils/monotonic_align/setup.py
================================================
# from distutils.core import setup
# from Cython.Build import cythonize
# import numpy

# setup(name='monotonic_align',
#       ext_modules=cythonize("core.pyx"),
#       include_dirs=[numpy.get_include()])


================================================
FILE: TTS/tts/utils/speakers.py
================================================
import json
import os
from typing import Any, Dict, List, Union

import fsspec
import numpy as np
import torch
from coqpit import Coqpit

from TTS.config import get_from_config_or_model_args_with_default
from TTS.tts.utils.managers import EmbeddingManager


class SpeakerManager(EmbeddingManager):
    """Manage the speakers for multi-speaker 🐸TTS models. Load a datafile and parse the information
    in a way that can be queried by speaker or clip.

    There are 3 different scenarios considered:

    1. Models using speaker embedding layers. The datafile only maps speaker names to ids used by the embedding layer.
    2. Models using d-vectors. The datafile includes a dictionary in the following format.

    ::

        {
            'clip_name.wav':{
                'name': 'speakerA',
                'embedding'[<d_vector_values>]
            },
            ...
        }


    3. Computing the d-vectors by the speaker encoder. It loads the speaker encoder model and
    computes the d-vectors for a given clip or speaker.

    Args:
        d_vectors_file_path (str, optional): Path to the metafile including x vectors. Defaults to "".
        speaker_id_file_path (str, optional): Path to the metafile that maps speaker names to ids used by
        TTS models. Defaults to "".
        encoder_model_path (str, optional): Path to the speaker encoder model file. Defaults to "".
        encoder_config_path (str, optional): Path to the spealer encoder config file. Defaults to "".

    Examples:
        >>> # load audio processor and speaker encoder
        >>> ap = AudioProcessor(**config.audio)
        >>> manager = SpeakerManager(encoder_model_path=encoder_model_path, encoder_config_path=encoder_config_path)
        >>> # load a sample audio and compute embedding
        >>> waveform = ap.load_wav(sample_wav_path)
        >>> mel = ap.melspectrogram(waveform)
        >>> d_vector = manager.compute_embeddings(mel.T)
    """

    def __init__(
        self,
        data_items: List[List[Any]] = None,
        d_vectors_file_path: str = "",
        speaker_id_file_path: str = "",
        encoder_model_path: str = "",
        encoder_config_path: str = "",
        use_cuda: bool = False,
    ):
        super().__init__(
            embedding_file_path=d_vectors_file_path,
            id_file_path=speaker_id_file_path,
            encoder_model_path=encoder_model_path,
            encoder_config_path=encoder_config_path,
            use_cuda=use_cuda,
        )

        if data_items:
            self.set_ids_from_data(data_items, parse_key="speaker_name")

    @property
    def num_speakers(self):
        return len(self.name_to_id)

    @property
    def speaker_names(self):
        return list(self.name_to_id.keys())

    def get_speakers(self) -> List:
        return self.name_to_id

    @staticmethod
    def init_from_config(config: "Coqpit", samples: Union[List[List], List[Dict]] = None) -> "SpeakerManager":
        """Initialize a speaker manager from config

        Args:
            config (Coqpit): Config object.
            samples (Union[List[List], List[Dict]], optional): List of data samples to parse out the speaker names.
                Defaults to None.

        Returns:
            SpeakerEncoder: Speaker encoder object.
        """
        speaker_manager = None
        if get_from_config_or_model_args_with_default(config, "use_speaker_embedding", False):
            if samples:
                speaker_manager = SpeakerManager(data_items=samples)
            if get_from_config_or_model_args_with_default(config, "speaker_file", None):
                speaker_manager = SpeakerManager(
                    speaker_id_file_path=get_from_config_or_model_args_with_default(config, "speaker_file", None)
                )
            if get_from_config_or_model_args_with_default(config, "speakers_file", None):
                speaker_manager = SpeakerManager(
                    speaker_id_file_path=get_from_config_or_model_args_with_default(config, "speakers_file", None)
                )

        if get_from_config_or_model_args_with_default(config, "use_d_vector_file", False):
            speaker_manager = SpeakerManager()
            if get_from_config_or_model_args_with_default(config, "d_vector_file", None):
                speaker_manager = SpeakerManager(
                    d_vectors_file_path=get_from_config_or_model_args_with_default(config, "d_vector_file", None)
                )
        return speaker_manager


def _set_file_path(path):
    """Find the speakers.json under the given path or the above it.
    Intended to band aid the different paths returned in restored and continued training."""
    path_restore = os.path.join(os.path.dirname(path), "speakers.json")
    path_continue = os.path.join(path, "speakers.json")
    fs = fsspec.get_mapper(path).fs
    if fs.exists(path_restore):
        return path_restore
    if fs.exists(path_continue):
        return path_continue
    raise FileNotFoundError(f" [!] `speakers.json` not found in {path}")


def load_speaker_mapping(out_path):
    """Loads speaker mapping if already present."""
    if os.path.splitext(out_path)[1] == ".json":
        json_file = out_path
    else:
        json_file = _set_file_path(out_path)
    with fsspec.open(json_file, "r") as f:
        return json.load(f)


def save_speaker_mapping(out_path, speaker_mapping):
    """Saves speaker mapping if not yet present."""
    if out_path is not None:
        speakers_json_path = _set_file_path(out_path)
        with fsspec.open(speakers_json_path, "w") as f:
            json.dump(speaker_mapping, f, indent=4)


def get_speaker_manager(c: Coqpit, data: List = None, restore_path: str = None, out_path: str = None) -> SpeakerManager:
    """Initiate a `SpeakerManager` instance by the provided config.

    Args:
        c (Coqpit): Model configuration.
        restore_path (str): Path to a previous training folder.
        data (List): Data samples used in training to infer speakers from. It must be provided if speaker embedding
            layers is used. Defaults to None.
        out_path (str, optional): Save the generated speaker IDs to a output path. Defaults to None.

    Returns:
        SpeakerManager: initialized and ready to use instance.
    """
    speaker_manager = SpeakerManager()
    if c.use_speaker_embedding:
        if data is not None:
            speaker_manager.set_ids_from_data(data, parse_key="speaker_name")
        if restore_path:
            speakers_file = _set_file_path(restore_path)
            # restoring speaker manager from a previous run.
            if c.use_d_vector_file:
                # restore speaker manager with the embedding file
                if not os.path.exists(speakers_file):
                    print("WARNING: speakers.json was not found in restore_path, trying to use CONFIG.d_vector_file")
                    if not os.path.exists(c.d_vector_file):
                        raise RuntimeError(
                            "You must copy the file speakers.json to restore_path, or set a valid file in CONFIG.d_vector_file"
                        )
                    speaker_manager.load_embeddings_from_file(c.d_vector_file)
                speaker_manager.load_embeddings_from_file(speakers_file)
            elif not c.use_d_vector_file:  # restor speaker manager with speaker ID file.
                speaker_ids_from_data = speaker_manager.name_to_id
                speaker_manager.load_ids_from_file(speakers_file)
                assert all(
                    speaker in speaker_manager.name_to_id for speaker in speaker_ids_from_data
                ), " [!] You cannot introduce new speakers to a pre-trained model."
        elif c.use_d_vector_file and c.d_vector_file:
            # new speaker manager with external speaker embeddings.
            speaker_manager.load_embeddings_from_file(c.d_vector_file)
        elif c.use_d_vector_file and not c.d_vector_file:
            raise "use_d_vector_file is True, so you need pass a external speaker embedding file."
        elif c.use_speaker_embedding and "speakers_file" in c and c.speakers_file:
            # new speaker manager with speaker IDs file.
            speaker_manager.load_ids_from_file(c.speakers_file)

        if speaker_manager.num_speakers > 0:
            print(
                " > Speaker manager is loaded with {} speakers: {}".format(
                    speaker_manager.num_speakers, ", ".join(speaker_manager.name_to_id)
                )
            )

        # save file if path is defined
        if out_path:
            out_file_path = os.path.join(out_path, "speakers.json")
            print(f" > Saving `speakers.json` to {out_file_path}.")
            if c.use_d_vector_file and c.d_vector_file:
                speaker_manager.save_embeddings_to_file(out_file_path)
            else:
                speaker_manager.save_ids_to_file(out_file_path)
    return speaker_manager


def get_speaker_balancer_weights(items: list):
    speaker_names = np.array([item["speaker_name"] for item in items])
    unique_speaker_names = np.unique(speaker_names).tolist()
    speaker_ids = [unique_speaker_names.index(l) for l in speaker_names]
    speaker_count = np.array([len(np.where(speaker_names == l)[0]) for l in unique_speaker_names])
    weight_speaker = 1.0 / speaker_count
    dataset_samples_weight = np.array([weight_speaker[l] for l in speaker_ids])
    # normalize
    dataset_samples_weight = dataset_samples_weight / np.linalg.norm(dataset_samples_weight)
    return torch.from_numpy(dataset_samples_weight).float()


================================================
FILE: TTS/tts/utils/ssim.py
================================================
# Adopted from https://github.com/photosynthesis-team/piq

from typing import List, Optional, Tuple, Union

import torch
import torch.nn.functional as F
from torch.nn.modules.loss import _Loss


def _reduce(x: torch.Tensor, reduction: str = "mean") -> torch.Tensor:
    r"""Reduce input in batch dimension if needed.
    Args:
        x: Tensor with shape (N, *).
        reduction: Specifies the reduction type:
            ``'none'`` | ``'mean'`` | ``'sum'``. Default: ``'mean'``
    """
    if reduction == "none":
        return x
    if reduction == "mean":
        return x.mean(dim=0)
    if reduction == "sum":
        return x.sum(dim=0)
    raise ValueError("Unknown reduction. Expected one of {'none', 'mean', 'sum'}")


def _validate_input(
    tensors: List[torch.Tensor],
    dim_range: Tuple[int, int] = (0, -1),
    data_range: Tuple[float, float] = (0.0, -1.0),
    # size_dim_range: Tuple[float, float] = (0., -1.),
    size_range: Optional[Tuple[int, int]] = None,
) -> None:
    r"""Check that input(-s)  satisfies the requirements
    Args:
        tensors: Tensors to check
        dim_range: Allowed number of dimensions. (min, max)
        data_range: Allowed range of values in tensors. (min, max)
        size_range: Dimensions to include in size comparison. (start_dim, end_dim + 1)
    """

    if not __debug__:
        return

    x = tensors[0]

    for t in tensors:
        assert torch.is_tensor(t), f"Expected torch.Tensor, got {type(t)}"
        assert t.device == x.device, f"Expected tensors to be on {x.device}, got {t.device}"

        if size_range is None:
            assert t.size() == x.size(), f"Expected tensors with same size, got {t.size()} and {x.size()}"
        else:
            assert (
                t.size()[size_range[0] : size_range[1]] == x.size()[size_range[0] : size_range[1]]
            ), f"Expected tensors with same size at given dimensions, got {t.size()} and {x.size()}"

        if dim_range[0] == dim_range[1]:
            assert t.dim() == dim_range[0], f"Expected number of dimensions to be {dim_range[0]}, got {t.dim()}"
        elif dim_range[0] < dim_range[1]:
            assert (
                dim_range[0] <= t.dim() <= dim_range[1]
            ), f"Expected number of dimensions to be between {dim_range[0]} and {dim_range[1]}, got {t.dim()}"

        if data_range[0] < data_range[1]:
            assert data_range[0] <= t.min(), f"Expected values to be greater or equal to {data_range[0]}, got {t.min()}"
            assert t.max() <= data_range[1], f"Expected values to be lower or equal to {data_range[1]}, got {t.max()}"


def gaussian_filter(kernel_size: int, sigma: float) -> torch.Tensor:
    r"""Returns 2D Gaussian kernel N(0,`sigma`^2)
    Args:
        size: Size of the kernel
        sigma: Std of the distribution
    Returns:
        gaussian_kernel: Tensor with shape (1, kernel_size, kernel_size)
    """
    coords = torch.arange(kernel_size, dtype=torch.float32)
    coords -= (kernel_size - 1) / 2.0

    g = coords**2
    g = (-(g.unsqueeze(0) + g.unsqueeze(1)) / (2 * sigma**2)).exp()

    g /= g.sum()
    return g.unsqueeze(0)


def ssim(
    x: torch.Tensor,
    y: torch.Tensor,
    kernel_size: int = 11,
    kernel_sigma: float = 1.5,
    data_range: Union[int, float] = 1.0,
    reduction: str = "mean",
    full: bool = False,
    downsample: bool = True,
    k1: float = 0.01,
    k2: float = 0.03,
) -> List[torch.Tensor]:
    r"""Interface of Structural Similarity (SSIM) index.
    Inputs supposed to be in range ``[0, data_range]``.
    To match performance with skimage and tensorflow set ``'downsample' = True``.

    Args:
        x: An input tensor. Shape :math:`(N, C, H, W)` or :math:`(N, C, H, W, 2)`.
        y: A target tensor. Shape :math:`(N, C, H, W)` or :math:`(N, C, H, W, 2)`.
        kernel_size: The side-length of the sliding window used in comparison. Must be an odd value.
        kernel_sigma: Sigma of normal distribution.
        data_range: Maximum value range of images (usually 1.0 or 255).
        reduction: Specifies the reduction type:
            ``'none'`` | ``'mean'`` | ``'sum'``. Default:``'mean'``
        full: Return cs map or not.
        downsample: Perform average pool before SSIM computation. Default: True
        k1: Algorithm parameter, K1 (small constant).
        k2: Algorithm parameter, K2 (small constant).
            Try a larger K2 constant (e.g. 0.4) if you get a negative or NaN results.

    Returns:
        Value of Structural Similarity (SSIM) index. In case of 5D input tensors, complex value is returned
        as a tensor of size 2.

    References:
        Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004).
        Image quality assessment: From error visibility to structural similarity.
        IEEE Transactions on Image Processing, 13, 600-612.
        https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf,
        DOI: `10.1109/TIP.2003.819861`
    """
    assert kernel_size % 2 == 1, f"Kernel size must be odd, got [{kernel_size}]"
    _validate_input([x, y], dim_range=(4, 5), data_range=(0, data_range))

    x = x / float(data_range)
    y = y / float(data_range)

    # Averagepool image if the size is large enough
    f = max(1, round(min(x.size()[-2:]) / 256))
    if (f > 1) and downsample:
        x = F.avg_pool2d(x, kernel_size=f)
        y = F.avg_pool2d(y, kernel_size=f)

    kernel = gaussian_filter(kernel_size, kernel_sigma).repeat(x.size(1), 1, 1, 1).to(y)
    _compute_ssim_per_channel = _ssim_per_channel_complex if x.dim() == 5 else _ssim_per_channel
    ssim_map, cs_map = _compute_ssim_per_channel(x=x, y=y, kernel=kernel, k1=k1, k2=k2)
    ssim_val = ssim_map.mean(1)
    cs = cs_map.mean(1)

    ssim_val = _reduce(ssim_val, reduction)
    cs = _reduce(cs, reduction)

    if full:
        return [ssim_val, cs]

    return ssim_val


class SSIMLoss(_Loss):
    r"""Creates a criterion that measures the structural similarity index error between
    each element in the input :math:`x` and target :math:`y`.

    To match performance with skimage and tensorflow set ``'downsample' = True``.

    The unreduced (i.e. with :attr:`reduction` set to ``'none'``) loss can be described as:

    .. math::
        SSIM = \{ssim_1,\dots,ssim_{N \times C}\}\\
        ssim_{l}(x, y) = \frac{(2 \mu_x \mu_y + c_1) (2 \sigma_{xy} + c_2)}
        {(\mu_x^2 +\mu_y^2 + c_1)(\sigma_x^2 +\sigma_y^2 + c_2)},

    where :math:`N` is the batch size, `C` is the channel size. If :attr:`reduction` is not ``'none'``
    (default ``'mean'``), then:

    .. math::
        SSIMLoss(x, y) =
        \begin{cases}
            \operatorname{mean}(1 - SSIM), &  \text{if reduction} = \text{'mean';}\\
            \operatorname{sum}(1 - SSIM),  &  \text{if reduction} = \text{'sum'.}
        \end{cases}

    :math:`x` and :math:`y` are tensors of arbitrary shapes with a total
    of :math:`n` elements each.

    The sum operation still operates over all the elements, and divides by :math:`n`.
    The division by :math:`n` can be avoided if one sets ``reduction = 'sum'``.
    In case of 5D input tensors, complex value is returned as a tensor of size 2.

    Args:
        kernel_size: By default, the mean and covariance of a pixel is obtained
            by convolution with given filter_size.
        kernel_sigma: Standard deviation for Gaussian kernel.
        k1: Coefficient related to c1 in the above equation.
        k2: Coefficient related to c2 in the above equation.
        downsample: Perform average pool before SSIM computation. Default: True
        reduction: Specifies the reduction type:
            ``'none'`` | ``'mean'`` | ``'sum'``. Default:``'mean'``
        data_range: Maximum value range of images (usually 1.0 or 255).

    Examples:
        >>> loss = SSIMLoss()
        >>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
        >>> y = torch.rand(3, 3, 256, 256)
        >>> output = loss(x, y)
        >>> output.backward()

    References:
        Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004).
        Image quality assessment: From error visibility to structural similarity.
        IEEE Transactions on Image Processing, 13, 600-612.
        https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf,
        DOI:`10.1109/TIP.2003.819861`
    """
    __constants__ = ["kernel_size", "k1", "k2", "sigma", "kernel", "reduction"]

    def __init__(
        self,
        kernel_size: int = 11,
        kernel_sigma: float = 1.5,
        k1: float = 0.01,
        k2: float = 0.03,
        downsample: bool = True,
        reduction: str = "mean",
        data_range: Union[int, float] = 1.0,
    ) -> None:
        super().__init__()

        # Generic loss parameters.
        self.reduction = reduction

        # Loss-specific parameters.
        self.kernel_size = kernel_size

        # This check might look redundant because kernel size is checked within the ssim function anyway.
        # However, this check allows to fail fast when the loss is being initialised and training has not been started.
        assert kernel_size % 2 == 1, f"Kernel size must be odd, got [{kernel_size}]"
        self.kernel_sigma = kernel_sigma
        self.k1 = k1
        self.k2 = k2
        self.downsample = downsample
        self.data_range = data_range

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        r"""Computation of Structural Similarity (SSIM) index as a loss function.

        Args:
            x: An input tensor. Shape :math:`(N, C, H, W)` or :math:`(N, C, H, W, 2)`.
            y: A target tensor. Shape :math:`(N, C, H, W)` or :math:`(N, C, H, W, 2)`.

        Returns:
            Value of SSIM loss to be minimized, i.e ``1 - ssim`` in [0, 1] range. In case of 5D input tensors,
            complex value is returned as a tensor of size 2.
        """

        score = ssim(
            x=x,
            y=y,
            kernel_size=self.kernel_size,
            kernel_sigma=self.kernel_sigma,
            downsample=self.downsample,
            data_range=self.data_range,
            reduction=self.reduction,
            full=False,
            k1=self.k1,
            k2=self.k2,
        )
        return torch.ones_like(score) - score


def _ssim_per_channel(
    x: torch.Tensor,
    y: torch.Tensor,
    kernel: torch.Tensor,
    k1: float = 0.01,
    k2: float = 0.03,
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
    r"""Calculate Structural Similarity (SSIM) index for X and Y per channel.

    Args:
        x: An input tensor. Shape :math:`(N, C, H, W)`.
        y: A target tensor. Shape :math:`(N, C, H, W)`.
        kernel: 2D Gaussian kernel.
        k1: Algorithm parameter, K1 (small constant, see [1]).
        k2: Algorithm parameter, K2 (small constant, see [1]).
            Try a larger K2 constant (e.g. 0.4) if you get a negative or NaN results.

    Returns:
        Full Value of Structural Similarity (SSIM) index.
    """
    if x.size(-1) < kernel.size(-1) or x.size(-2) < kernel.size(-2):
        raise ValueError(
            f"Kernel size can't be greater than actual input size. Input size: {x.size()}. "
            f"Kernel size: {kernel.size()}"
        )

    c1 = k1**2
    c2 = k2**2
    n_channels = x.size(1)
    mu_x = F.conv2d(x, weight=kernel, stride=1, padding=0, groups=n_channels)
    mu_y = F.conv2d(y, weight=kernel, stride=1, padding=0, groups=n_channels)

    mu_xx = mu_x**2
    mu_yy = mu_y**2
    mu_xy = mu_x * mu_y

    sigma_xx = F.conv2d(x**2, weight=kernel, stride=1, padding=0, groups=n_channels) - mu_xx
    sigma_yy = F.conv2d(y**2, weight=kernel, stride=1, padding=0, groups=n_channels) - mu_yy
    sigma_xy = F.conv2d(x * y, weight=kernel, stride=1, padding=0, groups=n_channels) - mu_xy

    # Contrast sensitivity (CS) with alpha = beta = gamma = 1.
    cs = (2.0 * sigma_xy + c2) / (sigma_xx + sigma_yy + c2)

    # Structural similarity (SSIM)
    ss = (2.0 * mu_xy + c1) / (mu_xx + mu_yy + c1) * cs

    ssim_val = ss.mean(dim=(-1, -2))
    cs = cs.mean(dim=(-1, -2))
    return ssim_val, cs


def _ssim_per_channel_complex(
    x: torch.Tensor,
    y: torch.Tensor,
    kernel: torch.Tensor,
    k1: float = 0.01,
    k2: float = 0.03,
) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]:
    r"""Calculate Structural Similarity (SSIM) index for Complex X and Y per channel.

    Args:
        x: An input tensor. Shape :math:`(N, C, H, W, 2)`.
        y: A target tensor. Shape :math:`(N, C, H, W, 2)`.
        kernel: 2-D gauss kernel.
        k1: Algorithm parameter, K1 (small constant, see [1]).
        k2: Algorithm parameter, K2 (small constant, see [1]).
            Try a larger K2 constant (e.g. 0.4) if you get a negative or NaN results.

    Returns:
        Full Value of Complex Structural Similarity (SSIM) index.
    """
    n_channels = x.size(1)
    if x.size(-2) < kernel.size(-1) or x.size(-3) < kernel.size(-2):
        raise ValueError(
            f"Kernel size can't be greater than actual input size. Input size: {x.size()}. "
            f"Kernel size: {kernel.size()}"
        )

    c1 = k1**2
    c2 = k2**2

    x_real = x[..., 0]
    x_imag = x[..., 1]
    y_real = y[..., 0]
    y_imag = y[..., 1]

    mu1_real = F.conv2d(x_real, weight=kernel, stride=1, padding=0, groups=n_channels)
    mu1_imag = F.conv2d(x_imag, weight=kernel, stride=1, padding=0, groups=n_channels)
    mu2_real = F.conv2d(y_real, weight=kernel, stride=1, padding=0, groups=n_channels)
    mu2_imag = F.conv2d(y_imag, weight=kernel, stride=1, padding=0, groups=n_channels)

    mu1_sq = mu1_real.pow(2) + mu1_imag.pow(2)
    mu2_sq = mu2_real.pow(2) + mu2_imag.pow(2)
    mu1_mu2_real = mu1_real * mu2_real - mu1_imag * mu2_imag
    mu1_mu2_imag = mu1_real * mu2_imag + mu1_imag * mu2_real

    compensation = 1.0

    x_sq = x_real.pow(2) + x_imag.pow(2)
    y_sq = y_real.pow(2) + y_imag.pow(2)
    x_y_real = x_real * y_real - x_imag * y_imag
    x_y_imag = x_real * y_imag + x_imag * y_real

    sigma1_sq = F.conv2d(x_sq, weight=kernel, stride=1, padding=0, groups=n_channels) - mu1_sq
    sigma2_sq = F.conv2d(y_sq, weight=kernel, stride=1, padding=0, groups=n_channels) - mu2_sq
    sigma12_real = F.conv2d(x_y_real, weight=kernel, stride=1, padding=0, groups=n_channels) - mu1_mu2_real
    sigma12_imag = F.conv2d(x_y_imag, weight=kernel, stride=1, padding=0, groups=n_channels) - mu1_mu2_imag
    sigma12 = torch.stack((sigma12_imag, sigma12_real), dim=-1)
    mu1_mu2 = torch.stack((mu1_mu2_real, mu1_mu2_imag), dim=-1)
    # Set alpha = beta = gamma = 1.
    cs_map = (sigma12 * 2 + c2 * compensation) / (sigma1_sq.unsqueeze(-1) + sigma2_sq.unsqueeze(-1) + c2 * compensation)
    ssim_map = (mu1_mu2 * 2 + c1 * compensation) / (mu1_sq.unsqueeze(-1) + mu2_sq.unsqueeze(-1) + c1 * compensation)
    ssim_map = ssim_map * cs_map

    ssim_val = ssim_map.mean(dim=(-2, -3))
    cs = cs_map.mean(dim=(-2, -3))

    return ssim_val, cs


================================================
FILE: TTS/tts/utils/synthesis.py
================================================
from typing import Dict

import numpy as np
import torch
from torch import nn


def numpy_to_torch(np_array, dtype, cuda=False):
    if np_array is None:
        return None
    tensor = torch.as_tensor(np_array, dtype=dtype)
    if cuda:
        return tensor.cuda()
    return tensor


def compute_style_mel(style_wav, ap, cuda=False):
    style_mel = torch.FloatTensor(ap.melspectrogram(ap.load_wav(style_wav, sr=ap.sample_rate))).unsqueeze(0)
    if cuda:
        return style_mel.cuda()
    return style_mel


def run_model_torch(
    model: nn.Module,
    inputs: torch.Tensor,
    speaker_id: int = None,
    style_mel: torch.Tensor = None,
    style_text: str = None,
    d_vector: torch.Tensor = None,
    language_id: torch.Tensor = None,
) -> Dict:
    """Run a torch model for inference. It does not support batch inference.

    Args:
        model (nn.Module): The model to run inference.
        inputs (torch.Tensor): Input tensor with character ids.
        speaker_id (int, optional): Input speaker ids for multi-speaker models. Defaults to None.
        style_mel (torch.Tensor, optional): Spectrograms used for voice styling . Defaults to None.
        d_vector (torch.Tensor, optional): d-vector for multi-speaker models    . Defaults to None.

    Returns:
        Dict: model outputs.
    """
    input_lengths = torch.tensor(inputs.shape[1:2]).to(inputs.device)
    if hasattr(model, "module"):
        _func = model.module.inference
    else:
        _func = model.inference
    outputs = _func(
        inputs,
        aux_input={
            "x_lengths": input_lengths,
            "speaker_ids": speaker_id,
            "d_vectors": d_vector,
            "style_mel": style_mel,
            "style_text": style_text,
            "language_ids": language_id,
        },
    )
    return outputs


def trim_silence(wav, ap):
    return wav[: ap.find_endpoint(wav)]


def inv_spectrogram(postnet_output, ap, CONFIG):
    if CONFIG.model.lower() in ["tacotron"]:
        wav = ap.inv_spectrogram(postnet_output.T)
    else:
        wav = ap.inv_melspectrogram(postnet_output.T)
    return wav


def id_to_torch(aux_id, cuda=False):
    if aux_id is not None:
        aux_id = np.asarray(aux_id)
        aux_id = torch.from_numpy(aux_id)
    if cuda:
        return aux_id.cuda()
    return aux_id


def embedding_to_torch(d_vector, cuda=False):
    if d_vector is not None:
        d_vector = np.asarray(d_vector)
        d_vector = torch.from_numpy(d_vector).type(torch.FloatTensor)
        d_vector = d_vector.squeeze().unsqueeze(0)
    if cuda:
        return d_vector.cuda()
    return d_vector


# TODO: perform GL with pytorch for batching
def apply_griffin_lim(inputs, input_lens, CONFIG, ap):
    """Apply griffin-lim to each sample iterating throught the first dimension.
    Args:
        inputs (Tensor or np.Array): Features to be converted by GL. First dimension is the batch size.
        input_lens (Tensor or np.Array): 1D array of sample lengths.
        CONFIG (Dict): TTS config.
        ap (AudioProcessor): TTS audio processor.
    """
    wavs = []
    for idx, spec in enumerate(inputs):
        wav_len = (input_lens[idx] * ap.hop_length) - ap.hop_length  # inverse librosa padding
        wav = inv_spectrogram(spec, ap, CONFIG)
        # assert len(wav) == wav_len, f" [!] wav lenght: {len(wav)} vs expected: {wav_len}"
        wavs.append(wav[:wav_len])
    return wavs


def synthesis(
    model,
    text,
    CONFIG,
    use_cuda,
    speaker_id=None,
    style_wav=None,
    style_text=None,
    use_griffin_lim=False,
    do_trim_silence=False,
    d_vector=None,
    language_id=None,
):
    """Synthesize voice for the given text using Griffin-Lim vocoder or just compute output features to be passed to
    the vocoder model.

    Args:
        model (TTS.tts.models):
            The TTS model to synthesize audio with.

        text (str):
            The input text to convert to speech.

        CONFIG (Coqpit):
            Model configuration.

        use_cuda (bool):
            Enable/disable CUDA.

        speaker_id (int):
            Speaker ID passed to the speaker embedding layer in multi-speaker model. Defaults to None.

        style_wav (str | Dict[str, float]):
            Path or tensor to/of a waveform used for computing the style embedding based on GST or Capacitron.
            Defaults to None, meaning that Capacitron models will sample from the prior distribution to
            generate random but realistic prosody.

        style_text (str):
            Transcription of style_wav for Capacitron models. Defaults to None.

        enable_eos_bos_chars (bool):
            enable special chars for end of sentence and start of sentence. Defaults to False.

        do_trim_silence (bool):
            trim silence after synthesis. Defaults to False.

        d_vector (torch.Tensor):
            d-vector for multi-speaker models in share :math:`[1, D]`. Defaults to None.

        language_id (int):
            Language ID passed to the language embedding layer in multi-langual model. Defaults to None.
    """
    # GST or Capacitron processing
    # TODO: need to handle the case of setting both gst and capacitron to true somewhere
    style_mel = None
    if CONFIG.has("gst") and CONFIG.gst and style_wav is not None:
        if isinstance(style_wav, dict):
            style_mel = style_wav
        else:
            style_mel = compute_style_mel(style_wav, model.ap, cuda=use_cuda)

    if CONFIG.has("capacitron_vae") and CONFIG.use_capacitron_vae and style_wav is not None:
        style_mel = compute_style_mel(style_wav, model.ap, cuda=use_cuda)
        style_mel = style_mel.transpose(1, 2)  # [1, time, depth]

    language_name = None
    if language_id is not None:
        language = [k for k, v in model.language_manager.name_to_id.items() if v == language_id]
        assert len(language) == 1, "language_id must be a valid language"
        language_name = language[0]

    # convert text to sequence of token IDs
    text_inputs = np.asarray(
        model.tokenizer.text_to_ids(text, language=language_name),
        dtype=np.int32,
    )
    # pass tensors to backend
    if speaker_id is not None:
        speaker_id = id_to_torch(speaker_id, cuda=use_cuda)

    if d_vector is not None:
        d_vector = embedding_to_torch(d_vector, cuda=use_cuda)

    if language_id is not None:
        language_id = id_to_torch(language_id, cuda=use_cuda)

    if not isinstance(style_mel, dict):
        # GST or Capacitron style mel
        style_mel = numpy_to_torch(style_mel, torch.float, cuda=use_cuda)
        if style_text is not None:
            style_text = np.asarray(
                model.tokenizer.text_to_ids(style_text, language=language_id),
                dtype=np.int32,
            )
            style_text = numpy_to_torch(style_text, torch.long, cuda=use_cuda)
            style_text = style_text.unsqueeze(0)

    text_inputs = numpy_to_torch(text_inputs, torch.long, cuda=use_cuda)
    text_inputs = text_inputs.unsqueeze(0)
    # synthesize voice
    outputs = run_model_torch(
        model,
        text_inputs,
        speaker_id,
        style_mel,
        style_text,
        d_vector=d_vector,
        language_id=language_id,
    )
    model_outputs = outputs["model_outputs"]
    model_outputs = model_outputs[0].data.cpu().numpy()
    alignments = outputs["alignments"]

    # convert outputs to numpy
    # plot results
    wav = None
    model_outputs = model_outputs.squeeze()
    if model_outputs.ndim == 2:  # [T, C_spec]
        if use_griffin_lim:
            wav = inv_spectrogram(model_outputs, model.ap, CONFIG)
            # trim silence
            if do_trim_silence:
                wav = trim_silence(wav, model.ap)
    else:  # [T,]
        wav = model_outputs
    return_dict = {
        "wav": wav,
        "alignments": alignments,
        "text_inputs": text_inputs,
        "outputs": outputs,
    }
    return return_dict


def transfer_voice(
    model,
    CONFIG,
    use_cuda,
    reference_wav,
    speaker_id=None,
    d_vector=None,
    reference_speaker_id=None,
    reference_d_vector=None,
    do_trim_silence=False,
    use_griffin_lim=False,
):
    """Synthesize voice for the given text using Griffin-Lim vocoder or just compute output features to be passed to
    the vocoder model.

    Args:
        model (TTS.tts.models):
            The TTS model to synthesize audio with.

        CONFIG (Coqpit):
            Model configuration.

        use_cuda (bool):
            Enable/disable CUDA.

        reference_wav (str):
            Path of reference_wav to be used to voice conversion.

        speaker_id (int):
            Speaker ID passed to the speaker embedding layer in multi-speaker model. Defaults to None.

        d_vector (torch.Tensor):
            d-vector for multi-speaker models in share :math:`[1, D]`. Defaults to None.

        reference_speaker_id (int):
            Reference Speaker ID passed to the speaker embedding layer in multi-speaker model. Defaults to None.

        reference_d_vector (torch.Tensor):
            Reference d-vector for multi-speaker models in share :math:`[1, D]`. Defaults to None.

        enable_eos_bos_chars (bool):
            enable special chars for end of sentence and start of sentence. Defaults to False.

        do_trim_silence (bool):
            trim silence after synthesis. Defaults to False.
    """
    # pass tensors to backend
    if speaker_id is not None:
        speaker_id = id_to_torch(speaker_id, cuda=use_cuda)

    if d_vector is not None:
        d_vector = embedding_to_torch(d_vector, cuda=use_cuda)

    if reference_d_vector is not None:
        reference_d_vector = embedding_to_torch(reference_d_vector, cuda=use_cuda)

    # load reference_wav audio
    reference_wav = embedding_to_torch(
        model.ap.load_wav(
            reference_wav, sr=model.args.encoder_sample_rate if model.args.encoder_sample_rate else model.ap.sample_rate
        ),
        cuda=use_cuda,
    )

    if hasattr(model, "module"):
        _func = model.module.inference_voice_conversion
    else:
        _func = model.inference_voice_conversion
    model_outputs = _func(reference_wav, speaker_id, d_vector, reference_speaker_id, reference_d_vector)

    # convert outputs to numpy
    # plot results
    wav = None
    model_outputs = model_outputs.squeeze()
    if model_outputs.ndim == 2:  # [T, C_spec]
        if use_griffin_lim:
            wav = inv_spectrogram(model_outputs, model.ap, CONFIG)
            # trim silence
            if do_trim_silence:
                wav = trim_silence(wav, model.ap)
    else:  # [T,]
        wav = model_outputs

    return wav


================================================
FILE: TTS/tts/utils/text/__init__.py
================================================
from TTS.tts.utils.text.tokenizer import TTSTokenizer


================================================
FILE: TTS/tts/utils/text/characters.py
================================================
from dataclasses import replace
from typing import Dict

from TTS.tts.configs.shared_configs import CharactersConfig


def parse_symbols():
    return {
        "pad": _pad,
        "eos": _eos,
        "bos": _bos,
        "characters": _characters,
        "punctuations": _punctuations,
        "phonemes": _phonemes,
    }


# DEFAULT SET OF GRAPHEMES
_pad = "<PAD>"
_eos = "<EOS>"
_bos = "<BOS>"
_blank = "<BLNK>"  # TODO: check if we need this alongside with PAD
_characters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
_punctuations = "!'(),-.:;? "


# DEFAULT SET OF IPA PHONEMES
# Phonemes definition (All IPA characters)
_vowels = "iyɨʉɯuɪʏʊeøɘəɵɤoɛœɜɞʌɔæɐaɶɑɒᵻ"
_non_pulmonic_consonants = "ʘɓǀɗǃʄǂɠǁʛ"
_pulmonic_consonants = "pbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟ"
_suprasegmentals = "ˈˌːˑ"
_other_symbols = "ʍwɥʜʢʡɕʑɺɧʲ"
_diacrilics = "ɚ˞ɫ"
_phonemes = _vowels + _non_pulmonic_consonants + _pulmonic_consonants + _suprasegmentals + _other_symbols + _diacrilics


class BaseVocabulary:
    """Base Vocabulary class.

    This class only needs a vocabulary dictionary without specifying the characters.

    Args:
        vocab (Dict): A dictionary of characters and their corresponding indices.
    """

    def __init__(self, vocab: Dict, pad: str = None, blank: str = None, bos: str = None, eos: str = None):
        self.vocab = vocab
        self.pad = pad
        self.blank = blank
        self.bos = bos
        self.eos = eos

    @property
    def pad_id(self) -> int:
        """Return the index of the padding character. If the padding character is not specified, return the length
        of the vocabulary."""
        return self.char_to_id(self.pad) if self.pad else len(self.vocab)

    @property
    def blank_id(self) -> int:
        """Return the index of the blank character. If the blank character is not specified, return the length of
        the vocabulary."""
        return self.char_to_id(self.blank) if self.blank else len(self.vocab)

    @property
    def vocab(self):
        """Return the vocabulary dictionary."""
        return self._vocab

    @vocab.setter
    def vocab(self, vocab):
        """Set the vocabulary dictionary and character mapping dictionaries."""
        self._vocab = vocab
        self._char_to_id = {char: idx for idx, char in enumerate(self._vocab)}
        self._id_to_char = {
            idx: char for idx, char in enumerate(self._vocab)  # pylint: disable=unnecessary-comprehension
        }

    @staticmethod
    def init_from_config(config, **kwargs):
        """Initialize from the given config."""
        if config.characters is not None and "vocab_dict" in config.characters and config.characters.vocab_dict:
            return (
                BaseVocabulary(
                    config.characters.vocab_dict,
                    config.characters.pad,
                    config.characters.blank,
                    config.characters.bos,
                    config.characters.eos,
                ),
                config,
            )
        return BaseVocabulary(**kwargs), config

    @property
    def num_chars(self):
        """Return number of tokens in the vocabulary."""
        return len(self._vocab)

    def char_to_id(self, char: str) -> int:
        """Map a character to an token ID."""
        try:
            return self._char_to_id[char]
        except KeyError as e:
            raise KeyError(f" [!] {repr(char)} is not in the vocabulary.") from e

    def id_to_char(self, idx: int) -> str:
        """Map an token ID to a character."""
        return self._id_to_char[idx]


class BaseCharacters:
    """🐸BaseCharacters class

        Every new character class should inherit from this.

        Characters are oredered as follows ```[PAD, EOS, BOS, BLANK, CHARACTERS, PUNCTUATIONS]```.

        If you need a custom order, you need to define inherit from this class and override the ```_create_vocab``` method.

        Args:
            characters (str):
                Main set of characters to be used in the vocabulary.

            punctuations (str):
                Characters to be treated as punctuation.

            pad (str):
                Special padding character that would be ignored by the model.

            eos (str):
                End of the sentence character.

            bos (str):
                Beginning of the sentence character.

            blank (str):
                Optional character used between characters by some models for better prosody.

            is_unique (bool):
                Remove duplicates from the provided characters. Defaults to True.
    el
            is_sorted (bool):
                Sort the characters in alphabetical order. Only applies to `self.characters`. Defaults to True.
    """

    def __init__(
        self,
        characters: str = None,
        punctuations: str = None,
        pad: str = None,
        eos: str = None,
        bos: str = None,
        blank: str = None,
        is_unique: bool = False,
        is_sorted: bool = True,
    ) -> None:
        self._characters = characters
        self._punctuations = punctuations
        self._pad = pad
        self._eos = eos
        self._bos = bos
        self._blank = blank
        self.is_unique = is_unique
        self.is_sorted = is_sorted
        self._create_vocab()

    @property
    def pad_id(self) -> int:
        return self.char_to_id(self.pad) if self.pad else len(self.vocab)

    @property
    def blank_id(self) -> int:
        return self.char_to_id(self.blank) if self.blank else len(self.vocab)

    @property
    def characters(self):
        return self._characters

    @characters.setter
    def characters(self, characters):
        self._characters = characters
        self._create_vocab()

    @property
    def punctuations(self):
        return self._punctuations

    @punctuations.setter
    def punctuations(self, punctuations):
        self._punctuations = punctuations
        self._create_vocab()

    @property
    def pad(self):
        return self._pad

    @pad.setter
    def pad(self, pad):
        self._pad = pad
        self._create_vocab()

    @property
    def eos(self):
        return self._eos

    @eos.setter
    def eos(self, eos):
        self._eos = eos
        self._create_vocab()

    @property
    def bos(self):
        return self._bos

    @bos.setter
    def bos(self, bos):
        self._bos = bos
        self._create_vocab()

    @property
    def blank(self):
        return self._blank

    @blank.setter
    def blank(self, blank):
        self._blank = blank
        self._create_vocab()

    @property
    def vocab(self):
        return self._vocab

    @vocab.setter
    def vocab(self, vocab):
        self._vocab = vocab
        self._char_to_id = {char: idx for idx, char in enumerate(self.vocab)}
        self._id_to_char = {
            idx: char for idx, char in enumerate(self.vocab)  # pylint: disable=unnecessary-comprehension
        }

    @property
    def num_chars(self):
        return len(self._vocab)

    def _create_vocab(self):
        _vocab = self._characters
        if self.is_unique:
            _vocab = list(set(_vocab))
        if self.is_sorted:
            _vocab = sorted(_vocab)
        _vocab = list(_vocab)
        _vocab = [self._blank] + _vocab if self._blank is not None and len(self._blank) > 0 else _vocab
        _vocab = [self._bos] + _vocab if self._bos is not None and len(self._bos) > 0 else _vocab
        _vocab = [self._eos] + _vocab if self._eos is not None and len(self._eos) > 0 else _vocab
        _vocab = [self._pad] + _vocab if self._pad is not None and len(self._pad) > 0 else _vocab
        self.vocab = _vocab + list(self._punctuations)
        if self.is_unique:
            duplicates = {x for x in self.vocab if self.vocab.count(x) > 1}
            assert (
                len(self.vocab) == len(self._char_to_id) == len(self._id_to_char)
            ), f" [!] There are duplicate characters in the character set. {duplicates}"

    def char_to_id(self, char: str) -> int:
        try:
            return self._char_to_id[char]
        except KeyError as e:
            raise KeyError(f" [!] {repr(char)} is not in the vocabulary.") from e

    def id_to_char(self, idx: int) -> str:
        return self._id_to_char[idx]

    def print_log(self, level: int = 0):
        """
        Prints the vocabulary in a nice format.
        """
        indent = "\t" * level
        print(f"{indent}| > Characters: {self._characters}")
        print(f"{indent}| > Punctuations: {self._punctuations}")
        print(f"{indent}| > Pad: {self._pad}")
        print(f"{indent}| > EOS: {self._eos}")
        print(f"{indent}| > BOS: {self._bos}")
        print(f"{indent}| > Blank: {self._blank}")
        print(f"{indent}| > Vocab: {self.vocab}")
        print(f"{indent}| > Num chars: {self.num_chars}")

    @staticmethod
    def init_from_config(config: "Coqpit"):  # pylint: disable=unused-argument
        """Init your character class from a config.

        Implement this method for your subclass.
        """
        # use character set from config
        if config.characters is not None:
            return BaseCharacters(**config.characters), config
        # return default character set
        characters = BaseCharacters()
        new_config = replace(config, characters=characters.to_config())
        return characters, new_config

    def to_config(self) -> "CharactersConfig":
        return CharactersConfig(
            characters=self._characters,
            punctuations=self._punctuations,
            pad=self._pad,
            eos=self._eos,
            bos=self._bos,
            blank=self._blank,
            is_unique=self.is_unique,
            is_sorted=self.is_sorted,
        )


class IPAPhonemes(BaseCharacters):
    """🐸IPAPhonemes class to manage `TTS.tts` model vocabulary

    Intended to be used with models using IPAPhonemes as input.
    It uses system defaults for the undefined class arguments.

    Args:
        characters (str):
            Main set of case-sensitive characters to be used in the vocabulary. Defaults to `_phonemes`.

        punctuations (str):
            Characters to be treated as punctuation. Defaults to `_punctuations`.

        pad (str):
            Special padding character that would be ignored by the model. Defaults to `_pad`.

        eos (str):
            End of the sentence character. Defaults to `_eos`.

        bos (str):
            Beginning of the sentence character. Defaults to `_bos`.

        blank (str):
            Optional character used between characters by some models for better prosody. Defaults to `_blank`.

        is_unique (bool):
            Remove duplicates from the provided characters. Defaults to True.

        is_sorted (bool):
            Sort the characters in alphabetical order. Defaults to True.
    """

    def __init__(
        self,
        characters: str = _phonemes,
        punctuations: str = _punctuations,
        pad: str = _pad,
        eos: str = _eos,
        bos: str = _bos,
        blank: str = _blank,
        is_unique: bool = False,
        is_sorted: bool = True,
    ) -> None:
        super().__init__(characters, punctuations, pad, eos, bos, blank, is_unique, is_sorted)

    @staticmethod
    def init_from_config(config: "Coqpit"):
        """Init a IPAPhonemes object from a model config

        If characters are not defined in the config, it will be set to the default characters and the config
        will be updated.
        """
        # band-aid for compatibility with old models
        if "characters" in config and config.characters is not None:
            if "phonemes" in config.characters and config.characters.phonemes is not None:
                config.characters["characters"] = config.characters["phonemes"]
            return (
                IPAPhonemes(
                    characters=config.characters["characters"],
                    punctuations=config.characters["punctuations"],
                    pad=config.characters["pad"],
                    eos=config.characters["eos"],
                    bos=config.characters["bos"],
                    blank=config.characters["blank"],
                    is_unique=config.characters["is_unique"],
                    is_sorted=config.characters["is_sorted"],
                ),
                config,
            )
        # use character set from config
        if config.characters is not None:
            return IPAPhonemes(**config.characters), config
        # return default character set
        characters = IPAPhonemes()
        new_config = replace(config, characters=characters.to_config())
        return characters, new_config


class Graphemes(BaseCharacters):
    """🐸Graphemes class to manage `TTS.tts` model vocabulary

    Intended to be used with models using graphemes as input.
    It uses system defaults for the undefined class arguments.

    Args:
        characters (str):
            Main set of case-sensitive characters to be used in the vocabulary. Defaults to `_characters`.

        punctuations (str):
            Characters to be treated as punctuation. Defaults to `_punctuations`.

        pad (str):
            Special padding character that would be ignored by the model. Defaults to `_pad`.

        eos (str):
            End of the sentence character. Defaults to `_eos`.

        bos (str):
            Beginning of the sentence character. Defaults to `_bos`.

        is_unique (bool):
            Remove duplicates from the provided characters. Defaults to True.

        is_sorted (bool):
            Sort the characters in alphabetical order. Defaults to True.
    """

    def __init__(
        self,
        characters: str = _characters,
        punctuations: str = _punctuations,
        pad: str = _pad,
        eos: str = _eos,
        bos: str = _bos,
        blank: str = _blank,
        is_unique: bool = False,
        is_sorted: bool = True,
    ) -> None:
        super().__init__(characters, punctuations, pad, eos, bos, blank, is_unique, is_sorted)

    @staticmethod
    def init_from_config(config: "Coqpit"):
        """Init a Graphemes object from a model config

        If characters are not defined in the config, it will be set to the default characters and the config
        will be updated.
        """
        if config.characters is not None:
            # band-aid for compatibility with old models
            if "phonemes" in config.characters:
                return (
                    Graphemes(
                        characters=config.characters["characters"],
                        punctuations=config.characters["punctuations"],
                        pad=config.characters["pad"],
                        eos=config.characters["eos"],
                        bos=config.characters["bos"],
                        blank=config.characters["blank"],
                        is_unique=config.characters["is_unique"],
                        is_sorted=config.characters["is_sorted"],
                    ),
                    config,
                )
            return Graphemes(**config.characters), config
        characters = Graphemes()
        new_config = replace(config, characters=characters.to_config())
        return characters, new_config


if __name__ == "__main__":
    gr = Graphemes()
    ph = IPAPhonemes()
    gr.print_log()
    ph.print_log()


================================================
FILE: TTS/tts/utils/text/chinese_mandarin/__init__.py
================================================


================================================
FILE: TTS/tts/utils/text/chinese_mandarin/numbers.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Licensed under WTFPL or the Unlicense or CC0.
# This uses Python 3, but it's easy to port to Python 2 by changing
# strings to u'xx'.

import itertools
import re


def _num2chinese(num: str, big=False, simp=True, o=False, twoalt=False) -> str:
    """Convert numerical arabic numbers (0->9) to chinese hanzi numbers (〇 -> 九)

    Args:
        num (str): arabic number to convert
        big (bool, optional): use financial characters. Defaults to False.
        simp (bool, optional): use simplified characters instead of tradictional characters. Defaults to True.
        o (bool, optional): use 〇 for 'zero'. Defaults to False.
        twoalt (bool, optional): use 两/兩 for 'two' when appropriate. Defaults to False.

    Raises:
        ValueError: if number is more than 1e48
        ValueError: if 'e' exposent in number

    Returns:
        str: converted number as hanzi characters
    """

    # check num first
    nd = str(num)
    if abs(float(nd)) >= 1e48:
        raise ValueError("number out of range")
    if "e" in nd:
        raise ValueError("scientific notation is not supported")
    c_symbol = "正负点" if simp else "正負點"
    if o:  # formal
        twoalt = False
    if big:
        c_basic = "零壹贰叁肆伍陆柒捌玖" if simp else "零壹貳參肆伍陸柒捌玖"
        c_unit1 = "拾佰仟"
        c_twoalt = "贰" if simp else "貳"
    else:
        c_basic = "〇一二三四五六七八九" if o else "零一二三四五六七八九"
        c_unit1 = "十百千"
        if twoalt:
            c_twoalt = "两" if simp else "兩"
        else:
            c_twoalt = "二"
    c_unit2 = "万亿兆京垓秭穰沟涧正载" if simp else "萬億兆京垓秭穰溝澗正載"
    revuniq = lambda l: "".join(k for k, g in itertools.groupby(reversed(l)))
    nd = str(num)
    result = []
    if nd[0] == "+":
        result.append(c_symbol[0])
    elif nd[0] == "-":
        result.append(c_symbol[1])
    if "." in nd:
        integer, remainder = nd.lstrip("+-").split(".")
    else:
        integer, remainder = nd.lstrip("+-"), None
    if int(integer):
        splitted = [integer[max(i - 4, 0) : i] for i in range(len(integer), 0, -4)]
        intresult = []
        for nu, unit in enumerate(splitted):
            # special cases
            if int(unit) == 0:  # 0000
                intresult.append(c_basic[0])
                continue
            if nu > 0 and int(unit) == 2:  # 0002
                intresult.append(c_twoalt + c_unit2[nu - 1])
                continue
            ulist = []
            unit = unit.zfill(4)
            for nc, ch in enumerate(reversed(unit)):
                if ch == "0":
                    if ulist:  # ???0
                        ulist.append(c_basic[0])
                elif nc == 0:
                    ulist.append(c_basic[int(ch)])
                elif nc == 1 and ch == "1" and unit[1] == "0":
                    # special case for tens
                    # edit the 'elif' if you don't like
                    # 十四, 三千零十四, 三千三百一十四
                    ulist.append(c_unit1[0])
                elif nc > 1 and ch == "2":
                    ulist.append(c_twoalt + c_unit1[nc - 1])
                else:
                    ulist.append(c_basic[int(ch)] + c_unit1[nc - 1])
            ustr = revuniq(ulist)
            if nu == 0:
                intresult.append(ustr)
            else:
                intresult.append(ustr + c_unit2[nu - 1])
        result.append(revuniq(intresult).strip(c_basic[0]))
    else:
        result.append(c_basic[0])
    if remainder:
        result.append(c_symbol[2])
        result.append("".join(c_basic[int(ch)] for ch in remainder))
    return "".join(result)


def _number_replace(match) -> str:
    """function to apply in a match, transform all numbers in a match by chinese characters

    Args:
        match (re.Match): numbers regex matches

    Returns:
        str: replaced characters for the numbers
    """
    match_str: str = match.group()
    return _num2chinese(match_str)


def replace_numbers_to_characters_in_text(text: str) -> str:
    """Replace all arabic numbers in a text by their equivalent in chinese characters (simplified)

    Args:
        text (str): input text to transform

    Returns:
        str: output text
    """
    text = re.sub(r"[0-9]+", _number_replace, text)
    return text


================================================
FILE: TTS/tts/utils/text/chinese_mandarin/phonemizer.py
================================================
from typing import List

import jieba
import pypinyin

from .pinyinToPhonemes import PINYIN_DICT


def _chinese_character_to_pinyin(text: str) -> List[str]:
    pinyins = pypinyin.pinyin(text, style=pypinyin.Style.TONE3, heteronym=False, neutral_tone_with_five=True)
    pinyins_flat_list = [item for sublist in pinyins for item in sublist]
    return pinyins_flat_list


def _chinese_pinyin_to_phoneme(pinyin: str) -> str:
    segment = pinyin[:-1]
    tone = pinyin[-1]
    phoneme = PINYIN_DICT.get(segment, [""])[0]
    return phoneme + tone


def chinese_text_to_phonemes(text: str, seperator: str = "|") -> str:
    tokenized_text = jieba.cut(text, HMM=False)
    tokenized_text = " ".join(tokenized_text)
    pinyined_text: List[str] = _chinese_character_to_pinyin(tokenized_text)

    results: List[str] = []

    for token in pinyined_text:
        if token[-1] in "12345":  # TODO transform to is_pinyin()
            pinyin_phonemes = _chinese_pinyin_to_phoneme(token)

            results += list(pinyin_phonemes)
        else:  # is ponctuation or other
            results += list(token)

    return seperator.join(results)


================================================
FILE: TTS/tts/utils/text/chinese_mandarin/pinyinToPhonemes.py
================================================
PINYIN_DICT = {
    "a": ["a"],
    "ai": ["ai"],
    "an": ["an"],
    "ang": ["ɑŋ"],
    "ao": ["aʌ"],
    "ba": ["ba"],
    "bai": ["bai"],
    "ban": ["ban"],
    "bang": ["bɑŋ"],
    "bao": ["baʌ"],
    # "be": ["be"], doesnt exist
    "bei": ["bɛi"],
    "ben": ["bœn"],
    "beng": ["bɵŋ"],
    "bi": ["bi"],
    "bian": ["biɛn"],
    "biao": ["biaʌ"],
    "bie": ["bie"],
    "bin": ["bin"],
    "bing": ["bɨŋ"],
    "bo": ["bo"],
    "bu": ["bu"],
    "ca": ["tsa"],
    "cai": ["tsai"],
    "can": ["tsan"],
    "cang": ["tsɑŋ"],
    "cao": ["tsaʌ"],
    "ce": ["tsø"],
    "cen": ["tsœn"],
    "ceng": ["tsɵŋ"],
    "cha": ["ʈʂa"],
    "chai": ["ʈʂai"],
    "chan": ["ʈʂan"],
    "chang": ["ʈʂɑŋ"],
    "chao": ["ʈʂaʌ"],
    "che": ["ʈʂø"],
    "chen": ["ʈʂœn"],
    "cheng": ["ʈʂɵŋ"],
    "chi": ["ʈʂʏ"],
    "chong": ["ʈʂoŋ"],
    "chou": ["ʈʂou"],
    "chu": ["ʈʂu"],
    "chua": ["ʈʂua"],
    "chuai": ["ʈʂuai"],
    "chuan": ["ʈʂuan"],
    "chuang": ["ʈʂuɑŋ"],
    "chui": ["ʈʂuei"],
    "chun": ["ʈʂun"],
    "chuo": ["ʈʂuo"],
    "ci": ["tsɪ"],
    "cong": ["tsoŋ"],
    "cou": ["tsou"],
    "cu": ["tsu"],
    "cuan": ["tsuan"],
    "cui": ["tsuei"],
    "cun": ["tsun"],
    "cuo": ["tsuo"],
    "da": ["da"],
    "dai": ["dai"],
    "dan": ["dan"],
    "dang": ["dɑŋ"],
    "dao": ["daʌ"],
    "de": ["dø"],
    "dei": ["dei"],
    # "den": ["dœn"],
    "deng": ["dɵŋ"],
    "di": ["di"],
    "dia": ["dia"],
    "dian": ["diɛn"],
    "diao": ["diaʌ"],
    "die": ["die"],
    "ding": ["dɨŋ"],
    "diu": ["dio"],
    "dong": ["doŋ"],
    "dou": ["dou"],
    "du": ["du"],
    "duan": ["duan"],
    "dui": ["duei"],
    "dun": ["dun"],
    "duo": ["duo"],
    "e": ["ø"],
    "ei": ["ei"],
    "en": ["œn"],
    # "ng": ["œn"],
    # "eng": ["ɵŋ"],
    "er": ["er"],
    "fa": ["fa"],
    "fan": ["fan"],
    "fang": ["fɑŋ"],
    "fei": ["fei"],
    "fen": ["fœn"],
    "feng": ["fɵŋ"],
    "fo": ["fo"],
    "fou": ["fou"],
    "fu": ["fu"],
    "ga": ["ga"],
    "gai": ["gai"],
    "gan": ["gan"],
    "gang": ["gɑŋ"],
    "gao": ["gaʌ"],
    "ge": ["gø"],
    "gei": ["gei"],
    "gen": ["gœn"],
    "geng": ["gɵŋ"],
    "gong": ["goŋ"],
    "gou": ["gou"],
    "gu": ["gu"],
    "gua": ["gua"],
    "guai": ["guai"],
    "guan": ["guan"],
    "guang": ["guɑŋ"],
    "gui": ["guei"],
    "gun": ["gun"],
    "guo": ["guo"],
    "ha": ["xa"],
    "hai": ["xai"],
    "han": ["xan"],
    "hang": ["xɑŋ"],
    "hao": ["xaʌ"],
    "he": ["xø"],
    "hei": ["xei"],
    "hen": ["xœn"],
    "heng": ["xɵŋ"],
    "hong": ["xoŋ"],
    "hou": ["xou"],
    "hu": ["xu"],
    "hua": ["xua"],
    "huai": ["xuai"],
    "huan": ["xuan"],
    "huang": ["xuɑŋ"],
    "hui": ["xuei"],
    "hun": ["xun"],
    "huo": ["xuo"],
    "ji": ["dʑi"],
    "jia": ["dʑia"],
    "jian": ["dʑiɛn"],
    "jiang": ["dʑiɑŋ"],
    "jiao": ["dʑiaʌ"],
    "jie": ["dʑie"],
    "jin": ["dʑin"],
    "jing": ["dʑɨŋ"],
    "jiong": ["dʑioŋ"],
    "jiu": ["dʑio"],
    "ju": ["dʑy"],
    "juan": ["dʑyɛn"],
    "jue": ["dʑye"],
    "jun": ["dʑyn"],
    "ka": ["ka"],
    "kai": ["kai"],
    "kan": ["kan"],
    "kang": ["kɑŋ"],
    "kao": ["kaʌ"],
    "ke": ["kø"],
    "kei": ["kei"],
    "ken": ["kœn"],
    "keng": ["kɵŋ"],
    "kong": ["koŋ"],
    "kou": ["kou"],
    "ku": ["ku"],
    "kua": ["kua"],
    "kuai": ["kuai"],
    "kuan": ["kuan"],
    "kuang": ["kuɑŋ"],
    "kui": ["kuei"],
    "kun": ["kun"],
    "kuo": ["kuo"],
    "la": ["la"],
    "lai": ["lai"],
    "lan": ["lan"],
    "lang": ["lɑŋ"],
    "lao": ["laʌ"],
    "le": ["lø"],
    "lei": ["lei"],
    "leng": ["lɵŋ"],
    "li": ["li"],
    "lia": ["lia"],
    "lian": ["liɛn"],
    "liang": ["liɑŋ"],
    "liao": ["liaʌ"],
    "lie": ["lie"],
    "lin": ["lin"],
    "ling": ["lɨŋ"],
    "liu": ["lio"],
    "lo": ["lo"],
    "long": ["loŋ"],
    "lou": ["lou"],
    "lu": ["lu"],
    "lv": ["ly"],
    "luan": ["luan"],
    "lve": ["lye"],
    "lue": ["lue"],
    "lun": ["lun"],
    "luo": ["luo"],
    "ma": ["ma"],
    "mai": ["mai"],
    "man": ["man"],
    "mang": ["mɑŋ"],
    "mao": ["maʌ"],
    "me": ["mø"],
    "mei": ["mei"],
    "men": ["mœn"],
    "meng": ["mɵŋ"],
    "mi": ["mi"],
    "mian": ["miɛn"],
    "miao": ["miaʌ"],
    "mie": ["mie"],
    "min": ["min"],
    "ming": ["mɨŋ"],
    "miu": ["mio"],
    "mo": ["mo"],
    "mou": ["mou"],
    "mu": ["mu"],
    "na": ["na"],
    "nai": ["nai"],
    "nan": ["nan"],
    "nang": ["nɑŋ"],
    "nao": ["naʌ"],
    "ne": ["nø"],
    "nei": ["nei"],
    "nen": ["nœn"],
    "neng": ["nɵŋ"],
    "ni": ["ni"],
    "nia": ["nia"],
    "nian": ["niɛn"],
    "niang": ["niɑŋ"],
    "niao": ["niaʌ"],
    "nie": ["nie"],
    "nin": ["nin"],
    "ning": ["nɨŋ"],
    "niu": ["nio"],
    "nong": ["noŋ"],
    "nou": ["nou"],
    "nu": ["nu"],
    "nv": ["ny"],
    "nuan": ["nuan"],
    "nve": ["nye"],
    "nue": ["nye"],
    "nuo": ["nuo"],
    "o": ["o"],
    "ou": ["ou"],
    "pa": ["pa"],
    "pai": ["pai"],
    "pan": ["pan"],
    "pang": ["pɑŋ"],
    "pao": ["paʌ"],
    "pe": ["pø"],
    "pei": ["pei"],
    "pen": ["pœn"],
    "peng": ["pɵŋ"],
    "pi": ["pi"],
    "pian": ["piɛn"],
    "piao": ["piaʌ"],
    "pie": ["pie"],
    "pin": ["pin"],
    "ping": ["pɨŋ"],
    "po": ["po"],
    "pou": ["pou"],
    "pu": ["pu"],
    "qi": ["tɕi"],
    "qia": ["tɕia"],
    "qian": ["tɕiɛn"],
    "qiang": ["tɕiɑŋ"],
    "qiao": ["tɕiaʌ"],
    "qie": ["tɕie"],
    "qin": ["tɕin"],
    "qing": ["tɕɨŋ"],
    "qiong": ["tɕioŋ"],
    "qiu": ["tɕio"],
    "qu": ["tɕy"],
    "quan": ["tɕyɛn"],
    "que": ["tɕye"],
    "qun": ["tɕyn"],
    "ran": ["ʐan"],
    "rang": ["ʐɑŋ"],
    "rao": ["ʐaʌ"],
    "re": ["ʐø"],
    "ren": ["ʐœn"],
    "reng": ["ʐɵŋ"],
    "ri": ["ʐʏ"],
    "rong": ["ʐoŋ"],
    "rou": ["ʐou"],
    "ru": ["ʐu"],
    "rua": ["ʐua"],
    "ruan": ["ʐuan"],
    "rui": ["ʐuei"],
    "run": ["ʐun"],
    "ruo": ["ʐuo"],
    "sa": ["sa"],
    "sai": ["sai"],
    "san": ["san"],
    "sang": ["sɑŋ"],
    "sao": ["saʌ"],
    "se": ["sø"],
    "sen": ["sœn"],
    "seng": ["sɵŋ"],
    "sha": ["ʂa"],
    "shai": ["ʂai"],
    "shan": ["ʂan"],
    "shang": ["ʂɑŋ"],
    "shao": ["ʂaʌ"],
    "she": ["ʂø"],
    "shei": ["ʂei"],
    "shen": ["ʂœn"],
    "sheng": ["ʂɵŋ"],
    "shi": ["ʂʏ"],
    "shou": ["ʂou"],
    "shu": ["ʂu"],
    "shua": ["ʂua"],
    "shuai": ["ʂuai"],
    "shuan": ["ʂuan"],
    "shuang": ["ʂuɑŋ"],
    "shui": ["ʂuei"],
    "shun": ["ʂun"],
    "shuo": ["ʂuo"],
    "si": ["sɪ"],
    "song": ["soŋ"],
    "sou": ["sou"],
    "su": ["su"],
    "suan": ["suan"],
    "sui": ["suei"],
    "sun": ["sun"],
    "suo": ["suo"],
    "ta": ["ta"],
    "tai": ["tai"],
    "tan": ["tan"],
    "tang": ["tɑŋ"],
    "tao": ["taʌ"],
    "te": ["tø"],
    "tei": ["tei"],
    "teng": ["tɵŋ"],
    "ti": ["ti"],
    "tian": ["tiɛn"],
    "tiao": ["tiaʌ"],
    "tie": ["tie"],
    "ting": ["tɨŋ"],
    "tong": ["toŋ"],
    "tou": ["tou"],
    "tu": ["tu"],
    "tuan": ["tuan"],
    "tui": ["tuei"],
    "tun": ["tun"],
    "tuo": ["tuo"],
    "wa": ["wa"],
    "wai": ["wai"],
    "wan": ["wan"],
    "wang": ["wɑŋ"],
    "wei": ["wei"],
    "wen": ["wœn"],
    "weng": ["wɵŋ"],
    "wo": ["wo"],
    "wu": ["wu"],
    "xi": ["ɕi"],
    "xia": ["ɕia"],
    "xian": ["ɕiɛn"],
    "xiang": ["ɕiɑŋ"],
    "xiao": ["ɕiaʌ"],
    "xie": ["ɕie"],
    "xin": ["ɕin"],
    "xing": ["ɕɨŋ"],
    "xiong": ["ɕioŋ"],
    "xiu": ["ɕio"],
    "xu": ["ɕy"],
    "xuan": ["ɕyɛn"],
    "xue": ["ɕye"],
    "xun": ["ɕyn"],
    "ya": ["ia"],
    "yan": ["iɛn"],
    "yang": ["iɑŋ"],
    "yao": ["iaʌ"],
    "ye": ["ie"],
    "yi": ["i"],
    "yin": ["in"],
    "ying": ["ɨŋ"],
    "yo": ["io"],
    "yong": ["ioŋ"],
    "you": ["io"],
    "yu": ["y"],
    "yuan": ["yɛn"],
    "yue": ["ye"],
    "yun": ["yn"],
    "za": ["dza"],
    "zai": ["dzai"],
    "zan": ["dzan"],
    "zang": ["dzɑŋ"],
    "zao": ["dzaʌ"],
    "ze": ["dzø"],
    "zei": ["dzei"],
    "zen": ["dzœn"],
    "zeng": ["dzɵŋ"],
    "zha": ["dʒa"],
    "zhai": ["dʒai"],
    "zhan": ["dʒan"],
    "zhang": ["dʒɑŋ"],
    "zhao": ["dʒaʌ"],
    "zhe": ["dʒø"],
    # "zhei": ["dʒei"], it doesn't exist
    "zhen": ["dʒœn"],
    "zheng": ["dʒɵŋ"],
    "zhi": ["dʒʏ"],
    "zhong": ["dʒoŋ"],
    "zhou": ["dʒou"],
    "zhu": ["dʒu"],
    "zhua": ["dʒua"],
    "zhuai": ["dʒuai"],
    "zhuan": ["dʒuan"],
    "zhuang": ["dʒuɑŋ"],
    "zhui": ["dʒuei"],
    "zhun": ["dʒun"],
    "zhuo": ["dʒuo"],
    "zi": ["dzɪ"],
    "zong": ["dzoŋ"],
    "zou": ["dzou"],
    "zu": ["dzu"],
    "zuan": ["dzuan"],
    "zui": ["dzuei"],
    "zun": ["dzun"],
    "zuo": ["dzuo"],
}


================================================
FILE: TTS/tts/utils/text/cleaners.py
================================================
"""Set of default text cleaners"""
# TODO: pick the cleaner for languages dynamically

import re

from anyascii import anyascii

from TTS.tts.utils.text.chinese_mandarin.numbers import replace_numbers_to_characters_in_text

from .english.abbreviations import abbreviations_en
from .english.number_norm import normalize_numbers as en_normalize_numbers
from .english.time_norm import expand_time_english
from .french.abbreviations import abbreviations_fr

# Regular expression matching whitespace:
_whitespace_re = re.compile(r"\s+")


def expand_abbreviations(text, lang="en"):
    if lang == "en":
        _abbreviations = abbreviations_en
    elif lang == "fr":
        _abbreviations = abbreviations_fr
    for regex, replacement in _abbreviations:
        text = re.sub(regex, replacement, text)
    return text


def lowercase(text):
    return text.lower()


def collapse_whitespace(text):
    return re.sub(_whitespace_re, " ", text).strip()


def convert_to_ascii(text):
    return anyascii(text)


def remove_aux_symbols(text):
    text = re.sub(r"[\<\>\(\)\[\]\"]+", "", text)
    return text


def replace_symbols(text, lang="en"):
    """Replace symbols based on the lenguage tag.

    Args:
      text:
       Input text.
      lang:
        Lenguage identifier. ex: "en", "fr", "pt", "ca".

    Returns:
      The modified text
      example:
        input args:
            text: "si l'avi cau, diguem-ho"
            lang: "ca"
        Output:
            text: "si lavi cau, diguemho"
    """
    text = text.replace(";", ",")
    text = text.replace("-", " ") if lang != "ca" else text.replace("-", "")
    text = text.replace(":", ",")
    if lang == "en":
        text = text.replace("&", " and ")
    elif lang == "fr":
        text = text.replace("&", " et ")
    elif lang == "pt":
        text = text.replace("&", " e ")
    elif lang == "ca":
        text = text.replace("&", " i ")
        text = text.replace("'", "")
    return text


def basic_cleaners(text):
    """Basic pipeline that lowercases and collapses whitespace without transliteration."""
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def transliteration_cleaners(text):
    """Pipeline for non-English text that transliterates to ASCII."""
    # text = convert_to_ascii(text)
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def basic_german_cleaners(text):
    """Pipeline for German text"""
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


# TODO: elaborate it
def basic_turkish_cleaners(text):
    """Pipeline for Turkish text"""
    text = text.replace("I", "ı")
    text = lowercase(text)
    text = collapse_whitespace(text)
    return text


def english_cleaners(text):
    """Pipeline for English text, including number and abbreviation expansion."""
    # text = convert_to_ascii(text)
    text = lowercase(text)
    text = expand_time_english(text)
    text = en_normalize_numbers(text)
    text = expand_abbreviations(text)
    text = replace_symbols(text)
    text = remove_aux_symbols(text)
    text = collapse_whitespace(text)
    return text


def phoneme_cleaners(text):
    """Pipeline for phonemes mode, including number and abbreviation expansion."""
    text = en_normalize_numbers(text)
    text = expand_abbreviations(text)
    text = replace_symbols(text)
    text = remove_aux_symbols(text)
    text = collapse_whitespace(text)
    return text


def french_cleaners(text):
    """Pipeline for French text. There is no need to expand numbers, phonemizer already does that"""
    text = expand_abbreviations(text, lang="fr")
    text = lowercase(text)
    text = replace_symbols(text, lang="fr")
    text = remove_aux_symbols(text)
    text = collapse_whitespace(text)
    return text


def portuguese_cleaners(text):
    """Basic pipeline for Portuguese text. There is no need to expand abbreviation and
    numbers, phonemizer already does that"""
    text = lowercase(text)
    text = replace_symbols(text, lang="pt")
    text = remove_aux_symbols(text)
    text = collapse_whitespace(text)
    return text


def chinese_mandarin_cleaners(text: str) -> str:
    """Basic pipeline for chinese"""
    text = replace_numbers_to_characters_in_text(text)
    return text


def multilingual_cleaners(text):
    """Pipeline for multilingual text"""
    text = lowercase(text)
    text = replace_symbols(text, lang=None)
    text = remove_aux_symbols(text)
    text = collapse_whitespace(text)
    return text


================================================
FILE: TTS/tts/utils/text/cmudict.py
================================================
# -*- coding: utf-8 -*-

import re

VALID_SYMBOLS = [
    "AA",
    "AA0",
    "AA1",
    "AA2",
    "AE",
    "AE0",
    "AE1",
    "AE2",
    "AH",
    "AH0",
    "AH1",
    "AH2",
    "AO",
    "AO0",
    "AO1",
    "AO2",
    "AW",
    "AW0",
    "AW1",
    "AW2",
    "AY",
    "AY0",
    "AY1",
    "AY2",
    "B",
    "CH",
    "D",
    "DH",
    "EH",
    "EH0",
    "EH1",
    "EH2",
    "ER",
    "ER0",
    "ER1",
    "ER2",
    "EY",
    "EY0",
    "EY1",
    "EY2",
    "F",
    "G",
    "HH",
    "IH",
    "IH0",
    "IH1",
    "IH2",
    "IY",
    "IY0",
    "IY1",
    "IY2",
    "JH",
    "K",
    "L",
    "M",
    "N",
    "NG",
    "OW",
    "OW0",
    "OW1",
    "OW2",
    "OY",
    "OY0",
    "OY1",
    "OY2",
    "P",
    "R",
    "S",
    "SH",
    "T",
    "TH",
    "UH",
    "UH0",
    "UH1",
    "UH2",
    "UW",
    "UW0",
    "UW1",
    "UW2",
    "V",
    "W",
    "Y",
    "Z",
    "ZH",
]


class CMUDict:
    """Thin wrapper around CMUDict data. http://www.speech.cs.cmu.edu/cgi-bin/cmudict"""

    def __init__(self, file_or_path, keep_ambiguous=True):
        if isinstance(file_or_path, str):
            with open(file_or_path, encoding="latin-1") as f:
                entries = _parse_cmudict(f)
        else:
            entries = _parse_cmudict(file_or_path)
        if not keep_ambiguous:
            entries = {word: pron for word, pron in entries.items() if len(pron) == 1}
        self._entries = entries

    def __len__(self):
        return len(self._entries)

    def lookup(self, word):
        """Returns list of ARPAbet pronunciations of the given word."""
        return self._entries.get(word.upper())

    @staticmethod
    def get_arpabet(word, cmudict, punctuation_symbols):
        first_symbol, last_symbol = "", ""
        if word and word[0] in punctuation_symbols:
            first_symbol = word[0]
            word = word[1:]
        if word and word[-1] in punctuation_symbols:
            last_symbol = word[-1]
            word = word[:-1]
        arpabet = cmudict.lookup(word)
        if arpabet is not None:
            return first_symbol + "{%s}" % arpabet[0] + last_symbol
        return first_symbol + word + last_symbol


_alt_re = re.compile(r"\([0-9]+\)")


def _parse_cmudict(file):
    cmudict = {}
    for line in file:
        if line and (line[0] >= "A" and line[0] <= "Z" or line[0] == "'"):
            parts = line.split("  ")
            word = re.sub(_alt_re, "", parts[0])
            pronunciation = _get_pronunciation(parts[1])
            if pronunciation:
                if word in cmudict:
                    cmudict[word].append(pronunciation)
                else:
                    cmudict[word] = [pronunciation]
    return cmudict


def _get_pronunciation(s):
    parts = s.strip().split(" ")
    for part in parts:
        if part not in VALID_SYMBOLS:
            return None
    return " ".join(parts)


================================================
FILE: TTS/tts/utils/text/english/__init__.py
================================================


================================================
FILE: TTS/tts/utils/text/english/abbreviations.py
================================================
import re

# List of (regular expression, replacement) pairs for abbreviations in english:
abbreviations_en = [
    (re.compile("\\b%s\\." % x[0], re.IGNORECASE), x[1])
    for x in [
        ("mrs", "misess"),
        ("mr", "mister"),
        ("dr", "doctor"),
        ("st", "saint"),
        ("co", "company"),
        ("jr", "junior"),
        ("maj", "major"),
        ("gen", "general"),
        ("drs", "doctors"),
        ("rev", "reverend"),
        ("lt", "lieutenant"),
        ("hon", "honorable"),
        ("sgt", "sergeant"),
        ("capt", "captain"),
        ("esq", "esquire"),
        ("ltd", "limited"),
        ("col", "colonel"),
        ("ft", "fort"),
    ]
]


================================================
FILE: TTS/tts/utils/text/english/number_norm.py
================================================
""" from https://github.com/keithito/tacotron """

import re
from typing import Dict

import inflect

_inflect = inflect.engine()
_comma_number_re = re.compile(r"([0-9][0-9\,]+[0-9])")
_decimal_number_re = re.compile(r"([0-9]+\.[0-9]+)")
_currency_re = re.compile(r"(£|\$|¥)([0-9\,\.]*[0-9]+)")
_ordinal_re = re.compile(r"[0-9]+(st|nd|rd|th)")
_number_re = re.compile(r"-?[0-9]+")


def _remove_commas(m):
    return m.group(1).replace(",", "")


def _expand_decimal_point(m):
    return m.group(1).replace(".", " point ")


def __expand_currency(value: str, inflection: Dict[float, str]) -> str:
    parts = value.replace(",", "").split(".")
    if len(parts) > 2:
        return f"{value} {inflection[2]}"  # Unexpected format
    text = []
    integer = int(parts[0]) if parts[0] else 0
    if integer > 0:
        integer_unit = inflection.get(integer, inflection[2])
        text.append(f"{integer} {integer_unit}")
    fraction = int(parts[1]) if len(parts) > 1 and parts[1] else 0
    if fraction > 0:
        fraction_unit = inflection.get(fraction / 100, inflection[0.02])
        text.append(f"{fraction} {fraction_unit}")
    if len(text) == 0:
        return f"zero {inflection[2]}"
    return " ".join(text)


def _expand_currency(m: "re.Match") -> str:
    currencies = {
        "$": {
            0.01: "cent",
            0.02: "cents",
            1: "dollar",
            2: "dollars",
        },
        "€": {
            0.01: "cent",
            0.02: "cents",
            1: "euro",
            2: "euros",
        },
        "£": {
            0.01: "penny",
            0.02: "pence",
            1: "pound sterling",
            2: "pounds sterling",
        },
        "¥": {
            # TODO rin
            0.02: "sen",
            2: "yen",
        },
    }
    unit = m.group(1)
    currency = currencies[unit]
    value = m.group(2)
    return __expand_currency(value, currency)


def _expand_ordinal(m):
    return _inflect.number_to_words(m.group(0))


def _expand_number(m):
    num = int(m.group(0))
    if 1000 < num < 3000:
        if num == 2000:
            return "two thousand"
        if 2000 < num < 2010:
            return "two thousand " + _inflect.number_to_words(num % 100)
        if num % 100 == 0:
            return _inflect.number_to_words(num // 100) + " hundred"
        return _inflect.number_to_words(num, andword="", zero="oh", group=2).replace(", ", " ")
    return _inflect.number_to_words(num, andword="")


def normalize_numbers(text):
    text = re.sub(_comma_number_re, _remove_commas, text)
    text = re.sub(_currency_re, _expand_currency, text)
    text = re.sub(_decimal_number_re, _expand_decimal_point, text)
    text = re.sub(_ordinal_re, _expand_ordinal, text)
    text = re.sub(_number_re, _expand_number, text)
    return text


================================================
FILE: TTS/tts/utils/text/english/time_norm.py
================================================
import re

import inflect

_inflect = inflect.engine()

_time_re = re.compile(
    r"""\b
                          ((0?[0-9])|(1[0-1])|(1[2-9])|(2[0-3]))  # hours
                          :
                          ([0-5][0-9])                            # minutes
                          \s*(a\\.m\\.|am|pm|p\\.m\\.|a\\.m|p\\.m)? # am/pm
                          \b""",
    re.IGNORECASE | re.X,
)


def _expand_num(n: int) -> str:
    return _inflect.number_to_words(n)


def _expand_time_english(match: "re.Match") -> str:
    hour = int(match.group(1))
    past_noon = hour >= 12
    time = []
    if hour > 12:
        hour -= 12
    elif hour == 0:
        hour = 12
        past_noon = True
    time.append(_expand_num(hour))

    minute = int(match.group(6))
    if minute > 0:
        if minute < 10:
            time.append("oh")
        time.append(_expand_num(minute))
    am_pm = match.group(7)
    if am_pm is None:
        time.append("p m" if past_noon else "a m")
    else:
        time.extend(list(am_pm.replace(".", "")))
    return " ".join(time)


def expand_time_english(text: str) -> str:
    return re.sub(_time_re, _expand_time_english, text)


================================================
FILE: TTS/tts/utils/text/french/__init__.py
================================================


================================================
FILE: TTS/tts/utils/text/french/abbreviations.py
================================================
import re

# List of (regular expression, replacement) pairs for abbreviations in french:
abbreviations_fr = [
    (re.compile("\\b%s\\." % x[0], re.IGNORECASE), x[1])
    for x in [
        ("M", "monsieur"),
        ("Mlle", "mademoiselle"),
        ("Mlles", "mesdemoiselles"),
        ("Mme", "Madame"),
        ("Mmes", "Mesdames"),
        ("N.B", "nota bene"),
        ("M", "monsieur"),
        ("p.c.q", "parce que"),
        ("Pr", "professeur"),
        ("qqch", "quelque chose"),
        ("rdv", "rendez-vous"),
        ("max", "maximum"),
        ("min", "minimum"),
        ("no", "numéro"),
        ("adr", "adresse"),
        ("dr", "docteur"),
        ("st", "saint"),
        ("co", "companie"),
        ("jr", "junior"),
        ("sgt", "sergent"),
        ("capt", "capitain"),
        ("col", "colonel"),
        ("av", "avenue"),
        ("av. J.-C", "avant Jésus-Christ"),
        ("apr. J.-C", "après Jésus-Christ"),
        ("art", "article"),
        ("boul", "boulevard"),
        ("c.-à-d", "c’est-à-dire"),
        ("etc", "et cetera"),
        ("ex", "exemple"),
        ("excl", "exclusivement"),
        ("boul", "boulevard"),
    ]
] + [
    (re.compile("\\b%s" % x[0]), x[1])
    for x in [
        ("Mlle", "mademoiselle"),
        ("Mlles", "mesdemoiselles"),
        ("Mme", "Madame"),
        ("Mmes", "Mesdames"),
    ]
]


================================================
FILE: TTS/tts/utils/text/japanese/__init__.py
================================================


================================================
FILE: TTS/tts/utils/text/japanese/phonemizer.py
================================================
# Convert Japanese text to phonemes which is
# compatible with Julius https://github.com/julius-speech/segmentation-kit

import re
import unicodedata

import MeCab
from num2words import num2words

_CONVRULES = [
    # Conversion of 2 letters
    "アァ/ a a",
    "イィ/ i i",
    "イェ/ i e",
    "イャ/ y a",
    "ウゥ/ u:",
    "エェ/ e e",
    "オォ/ o:",
    "カァ/ k a:",
    "キィ/ k i:",
    "クゥ/ k u:",
    "クャ/ ky a",
    "クュ/ ky u",
    "クョ/ ky o",
    "ケェ/ k e:",
    "コォ/ k o:",
    "ガァ/ g a:",
    "ギィ/ g i:",
    "グゥ/ g u:",
    "グャ/ gy a",
    "グュ/ gy u",
    "グョ/ gy o",
    "ゲェ/ g e:",
    "ゴォ/ g o:",
    "サァ/ s a:",
    "シィ/ sh i:",
    "スゥ/ s u:",
    "スャ/ sh a",
    "スュ/ sh u",
    "スョ/ sh o",
    "セェ/ s e:",
    "ソォ/ s o:",
    "ザァ/ z a:",
    "ジィ/ j i:",
    "ズゥ/ z u:",
    "ズャ/ zy a",
    "ズュ/ zy u",
    "ズョ/ zy o",
    "ゼェ/ z e:",
    "ゾォ/ z o:",
    "タァ/ t a:",
    "チィ/ ch i:",
    "ツァ/ ts a",
    "ツィ/ ts i",
    "ツゥ/ ts u:",
    "ツャ/ ch a",
    "ツュ/ ch u",
    "ツョ/ ch o",
    "ツェ/ ts e",
    "ツォ/ ts o",
    "テェ/ t e:",
    "トォ/ t o:",
    "ダァ/ d a:",
    "ヂィ/ j i:",
    "ヅゥ/ d u:",
    "ヅャ/ zy a",
    "ヅュ/ zy u",
    "ヅョ/ zy o",
    "デェ/ d e:",
    "ドォ/ d o:",
    "ナァ/ n a:",
    "ニィ/ n i:",
    "ヌゥ/ n u:",
    "ヌャ/ ny a",
    "ヌュ/ ny u",
    "ヌョ/ ny o",
    "ネェ/ n e:",
    "ノォ/ n o:",
    "ハァ/ h a:",
    "ヒィ/ h i:",
    "フゥ/ f u:",
    "フャ/ hy a",
    "フュ/ hy u",
    "フョ/ hy o",
    "ヘェ/ h e:",
    "ホォ/ h o:",
    "バァ/ b a:",
    "ビィ/ b i:",
    "ブゥ/ b u:",
    "フャ/ hy a",
    "ブュ/ by u",
    "フョ/ hy o",
    "ベェ/ b e:",
    "ボォ/ b o:",
    "パァ/ p a:",
    "ピィ/ p i:",
    "プゥ/ p u:",
    "プャ/ py a",
    "プュ/ py u",
    "プョ/ py o",
    "ペェ/ p e:",
    "ポォ/ p o:",
    "マァ/ m a:",
    "ミィ/ m i:",
    "ムゥ/ m u:",
    "ムャ/ my a",
    "ムュ/ my u",
    "ムョ/ my o",
    "メェ/ m e:",
    "モォ/ m o:",
    "ヤァ/ y a:",
    "ユゥ/ y u:",
    "ユャ/ y a:",
    "ユュ/ y u:",
    "ユョ/ y o:",
    "ヨォ/ y o:",
    "ラァ/ r a:",
    "リィ/ r i:",
    "ルゥ/ r u:",
    "ルャ/ ry a",
    "ルュ/ ry u",
    "ルョ/ ry o",
    "レェ/ r e:",
    "ロォ/ r o:",
    "ワァ/ w a:",
    "ヲォ/ o:",
    "ディ/ d i",
    "デェ/ d e:",
    "デャ/ dy a",
    "デュ/ dy u",
    "デョ/ dy o",
    "ティ/ t i",
    "テェ/ t e:",
    "テャ/ ty a",
    "テュ/ ty u",
    "テョ/ ty o",
    "スィ/ s i",
    "ズァ/ z u a",
    "ズィ/ z i",
    "ズゥ/ z u",
    "ズャ/ zy a",
    "ズュ/ zy u",
    "ズョ/ zy o",
    "ズェ/ z e",
    "ズォ/ z o",
    "キャ/ ky a",
    "キュ/ ky u",
    "キョ/ ky o",
    "シャ/ sh a",
    "シュ/ sh u",
    "シェ/ sh e",
    "ショ/ sh o",
    "チャ/ ch a",
    "チュ/ ch u",
    "チェ/ ch e",
    "チョ/ ch o",
    "トゥ/ t u",
    "トャ/ ty a",
    "トュ/ ty u",
    "トョ/ ty o",
    "ドァ/ d o a",
    "ドゥ/ d u",
    "ドャ/ dy a",
    "ドュ/ dy u",
    "ドョ/ dy o",
    "ドォ/ d o:",
    "ニャ/ ny a",
    "ニュ/ ny u",
    "ニョ/ ny o",
    "ヒャ/ hy a",
    "ヒュ/ hy u",
    "ヒョ/ hy o",
    "ミャ/ my a",
    "ミュ/ my u",
    "ミョ/ my o",
    "リャ/ ry a",
    "リュ/ ry u",
    "リョ/ ry o",
    "ギャ/ gy a",
    "ギュ/ gy u",
    "ギョ/ gy o",
    "ヂェ/ j e",
    "ヂャ/ j a",
    "ヂュ/ j u",
    "ヂョ/ j o",
    "ジェ/ j e",
    "ジャ/ j a",
    "ジュ/ j u",
    "ジョ/ j o",
    "ビャ/ by a",
    "ビュ/ by u",
    "ビョ/ by o",
    "ピャ/ py a",
    "ピュ/ py u",
    "ピョ/ py o",
    "ウァ/ u a",
    "ウィ/ w i",
    "ウェ/ w e",
    "ウォ/ w o",
    "ファ/ f a",
    "フィ/ f i",
    "フゥ/ f u",
    "フャ/ hy a",
    "フュ/ hy u",
    "フョ/ hy o",
    "フェ/ f e",
    "フォ/ f o",
    "ヴァ/ b a",
    "ヴィ/ b i",
    "ヴェ/ b e",
    "ヴォ/ b o",
    "ヴュ/ by u",
    # Conversion of 1 letter
    "ア/ a",
    "イ/ i",
    "ウ/ u",
    "エ/ e",
    "オ/ o",
    "カ/ k a",
    "キ/ k i",
    "ク/ k u",
    "ケ/ k e",
    "コ/ k o",
    "サ/ s a",
    "シ/ sh i",
    "ス/ s u",
    "セ/ s e",
    "ソ/ s o",
    "タ/ t a",
    "チ/ ch i",
    "ツ/ ts u",
    "テ/ t e",
    "ト/ t o",
    "ナ/ n a",
    "ニ/ n i",
    "ヌ/ n u",
    "ネ/ n e",
    "ノ/ n o",
    "ハ/ h a",
    "ヒ/ h i",
    "フ/ f u",
    "ヘ/ h e",
    "ホ/ h o",
    "マ/ m a",
    "ミ/ m i",
    "ム/ m u",
    "メ/ m e",
    "モ/ m o",
    "ラ/ r a",
    "リ/ r i",
    "ル/ r u",
    "レ/ r e",
    "ロ/ r o",
    "ガ/ g a",
    "ギ/ g i",
    "グ/ g u",
    "ゲ/ g e",
    "ゴ/ g o",
    "ザ/ z a",
    "ジ/ j i",
    "ズ/ z u",
    "ゼ/ z e",
    "ゾ/ z o",
    "ダ/ d a",
    "ヂ/ j i",
    "ヅ/ z u",
    "デ/ d e",
    "ド/ d o",
    "バ/ b a",
    "ビ/ b i",
    "ブ/ b u",
    "ベ/ b e",
    "ボ/ b o",
    "パ/ p a",
    "ピ/ p i",
    "プ/ p u",
    "ペ/ p e",
    "ポ/ p o",
    "ヤ/ y a",
    "ユ/ y u",
    "ヨ/ y o",
    "ワ/ w a",
    "ヰ/ i",
    "ヱ/ e",
    "ヲ/ o",
    "ン/ N",
    "ッ/ q",
    "ヴ/ b u",
    "ー/:",
    # Try converting broken text
    "ァ/ a",
    "ィ/ i",
    "ゥ/ u",
    "ェ/ e",
    "ォ/ o",
    "ヮ/ w a",
    "ォ/ o",
    # Symbols
    "、/ ,",
    "。/ .",
    "！/ !",
    "？/ ?",
    "・/ ,",
]

_COLON_RX = re.compile(":+")
_REJECT_RX = re.compile("[^ a-zA-Z:,.?]")


def _makerulemap():
    l = [tuple(x.split("/")) for x in _CONVRULES]
    return tuple({k: v for k, v in l if len(k) == i} for i in (1, 2))


_RULEMAP1, _RULEMAP2 = _makerulemap()


def kata2phoneme(text: str) -> str:
    """Convert katakana text to phonemes."""
    text = text.strip()
    res = ""
    while text:
        if len(text) >= 2:
            x = _RULEMAP2.get(text[:2])
            if x is not None:
                text = text[2:]
                res += x
                continue
        x = _RULEMAP1.get(text[0])
        if x is not None:
            text = text[1:]
            res += x
            continue
        res += " " + text[0]
        text = text[1:]
    res = _COLON_RX.sub(":", res)
    return res[1:]


_KATAKANA = "".join(chr(ch) for ch in range(ord("ァ"), ord("ン") + 1))
_HIRAGANA = "".join(chr(ch) for ch in range(ord("ぁ"), ord("ん") + 1))
_HIRA2KATATRANS = str.maketrans(_HIRAGANA, _KATAKANA)


def hira2kata(text: str) -> str:
    text = text.translate(_HIRA2KATATRANS)
    return text.replace("う゛", "ヴ")


_SYMBOL_TOKENS = set(list("・、。？！"))
_NO_YOMI_TOKENS = set(list("「」『』―（）［］[]　…"))
_TAGGER = MeCab.Tagger()


def text2kata(text: str) -> str:
    parsed = _TAGGER.parse(text)
    res = []
    for line in parsed.split("\n"):
        if line == "EOS":
            break
        parts = line.split("\t")

        word, yomi = parts[0], parts[1]
        if yomi:
            res.append(yomi)
        else:
            if word in _SYMBOL_TOKENS:
                res.append(word)
            elif word in ("っ", "ッ"):
                res.append("ッ")
            elif word in _NO_YOMI_TOKENS:
                pass
            else:
                res.append(word)
    return hira2kata("".join(res))


_ALPHASYMBOL_YOMI = {
    "#": "シャープ",
    "%": "パーセント",
    "&": "アンド",
    "+": "プラス",
    "-": "マイナス",
    ":": "コロン",
    ";": "セミコロン",
    "<": "小なり",
    "=": "イコール",
    ">": "大なり",
    "@": "アット",
    "a": "エー",
    "b": "ビー",
    "c": "シー",
    "d": "ディー",
    "e": "イー",
    "f": "エフ",
    "g": "ジー",
    "h": "エイチ",
    "i": "アイ",
    "j": "ジェー",
    "k": "ケー",
    "l": "エル",
    "m": "エム",
    "n": "エヌ",
    "o": "オー",
    "p": "ピー",
    "q": "キュー",
    "r": "アール",
    "s": "エス",
    "t": "ティー",
    "u": "ユー",
    "v": "ブイ",
    "w": "ダブリュー",
    "x": "エックス",
    "y": "ワイ",
    "z": "ゼット",
    "α": "アルファ",
    "β": "ベータ",
    "γ": "ガンマ",
    "δ": "デルタ",
    "ε": "イプシロン",
    "ζ": "ゼータ",
    "η": "イータ",
    "θ": "シータ",
    "ι": "イオタ",
    "κ": "カッパ",
    "λ": "ラムダ",
    "μ": "ミュー",
    "ν": "ニュー",
    "ξ": "クサイ",
    "ο": "オミクロン",
    "π": "パイ",
    "ρ": "ロー",
    "σ": "シグマ",
    "τ": "タウ",
    "υ": "ウプシロン",
    "φ": "ファイ",
    "χ": "カイ",
    "ψ": "プサイ",
    "ω": "オメガ",
}


_NUMBER_WITH_SEPARATOR_RX = re.compile("[0-9]{1,3}(,[0-9]{3})+")
_CURRENCY_MAP = {"$": "ドル", "¥": "円", "£": "ポンド", "€": "ユーロ"}
_CURRENCY_RX = re.compile(r"([$¥£€])([0-9.]*[0-9])")
_NUMBER_RX = re.compile(r"[0-9]+(\.[0-9]+)?")


def japanese_convert_numbers_to_words(text: str) -> str:
    res = _NUMBER_WITH_SEPARATOR_RX.sub(lambda m: m[0].replace(",", ""), text)
    res = _CURRENCY_RX.sub(lambda m: m[2] + _CURRENCY_MAP.get(m[1], m[1]), res)
    res = _NUMBER_RX.sub(lambda m: num2words(m[0], lang="ja"), res)
    return res


def japanese_convert_alpha_symbols_to_words(text: str) -> str:
    return "".join([_ALPHASYMBOL_YOMI.get(ch, ch) for ch in text.lower()])


def japanese_text_to_phonemes(text: str) -> str:
    """Convert Japanese text to phonemes."""
    res = unicodedata.normalize("NFKC", text)
    res = japanese_convert_numbers_to_words(res)
    res = japanese_convert_alpha_symbols_to_words(res)
    res = text2kata(res)
    res = kata2phoneme(res)
    return res.replace(" ", "")


================================================
FILE: TTS/tts/utils/text/korean/__init__.py
================================================


================================================
FILE: TTS/tts/utils/text/korean/ko_dictionary.py
================================================
# coding: utf-8
# Add the word you want to the dictionary.
etc_dictionary = {"1+1": "원플러스원", "2+1": "투플러스원"}


english_dictionary = {
    "KOREA": "코리아",
    "IDOL": "아이돌",
    "IT": "아이티",
    "IQ": "아이큐",
    "UP": "업",
    "DOWN": "다운",
    "PC": "피씨",
    "CCTV": "씨씨티비",
    "SNS": "에스엔에스",
    "AI": "에이아이",
    "CEO": "씨이오",
    "A": "에이",
    "B": "비",
    "C": "씨",
    "D": "디",
    "E": "이",
    "F": "에프",
    "G": "지",
    "H": "에이치",
    "I": "아이",
    "J": "제이",
    "K": "케이",
    "L": "엘",
    "M": "엠",
    "N": "엔",
    "O": "오",
    "P": "피",
    "Q": "큐",
    "R": "알",
    "S": "에스",
    "T": "티",
    "U": "유",
    "V": "브이",
    "W": "더블유",
    "X": "엑스",
    "Y": "와이",
    "Z": "제트",
}


================================================
FILE: TTS/tts/utils/text/korean/korean.py
================================================
﻿# coding: utf-8
# Code based on https://github.com/carpedm20/multi-speaker-tacotron-tensorflow/blob/master/text/korean.py
import re

from TTS.tts.utils.text.korean.ko_dictionary import english_dictionary, etc_dictionary


def normalize(text):
    text = text.strip()
    text = re.sub("[⺀-⺙⺛-⻳⼀-⿕々〇〡-〩〸-〺〻㐀-䶵一-鿃豈-鶴侮-頻並-龎]", "", text)
    text = normalize_with_dictionary(text, etc_dictionary)
    text = normalize_english(text)
    text = text.lower()
    return text


def normalize_with_dictionary(text, dic):
    if any(key in text for key in dic.keys()):
        pattern = re.compile("|".join(re.escape(key) for key in dic.keys()))
        return pattern.sub(lambda x: dic[x.group()], text)
    return text


def normalize_english(text):
    def fn(m):
        word = m.group()
        if word in english_dictionary:
            return english_dictionary.get(word)
        return word

    text = re.sub("([A-Za-z]+)", fn, text)
    return text


================================================
FILE: TTS/tts/utils/text/korean/phonemizer.py
================================================
from jamo import hangul_to_jamo

from TTS.tts.utils.text.korean.korean import normalize

g2p = None


def korean_text_to_phonemes(text, character: str = "hangeul") -> str:
    """

    The input and output values look the same, but they are different in Unicode.

    example :

        input = '하늘' (Unicode : \ud558\ub298), (하 + 늘)
        output = '하늘' (Unicode :\u1112\u1161\u1102\u1173\u11af), (ᄒ + ᅡ + ᄂ + ᅳ + ᆯ)

    """
    global g2p  # pylint: disable=global-statement
    if g2p is None:
        from g2pkk import G2p

        g2p = G2p()

    if character == "english":
        from anyascii import anyascii

        text = normalize(text)
        text = g2p(text)
        text = anyascii(text)
        return text

    text = normalize(text)
    text = g2p(text)
    text = list(hangul_to_jamo(text))  # '하늘' --> ['ᄒ', 'ᅡ', 'ᄂ', 'ᅳ', 'ᆯ']
    return "".join(text)


================================================
FILE: TTS/tts/utils/text/phonemizers/__init__.py
================================================
from TTS.tts.utils.text.phonemizers.base import BasePhonemizer
from TTS.tts.utils.text.phonemizers.espeak_wrapper import ESpeak
from TTS.tts.utils.text.phonemizers.gruut_wrapper import Gruut
from TTS.tts.utils.text.phonemizers.ja_jp_phonemizer import JA_JP_Phonemizer
from TTS.tts.utils.text.phonemizers.ko_kr_phonemizer import KO_KR_Phonemizer
from TTS.tts.utils.text.phonemizers.zh_cn_phonemizer import ZH_CN_Phonemizer

PHONEMIZERS = {b.name(): b for b in (ESpeak, Gruut, JA_JP_Phonemizer)}


ESPEAK_LANGS = list(ESpeak.supported_languages().keys())
GRUUT_LANGS = list(Gruut.supported_languages())


# Dict setting default phonemizers for each language
# Add Gruut languages
_ = [Gruut.name()] * len(GRUUT_LANGS)
DEF_LANG_TO_PHONEMIZER = dict(list(zip(GRUUT_LANGS, _)))


# Add ESpeak languages and override any existing ones
_ = [ESpeak.name()] * len(ESPEAK_LANGS)
_new_dict = dict(list(zip(list(ESPEAK_LANGS), _)))
DEF_LANG_TO_PHONEMIZER.update(_new_dict)

# Force default for some languages
DEF_LANG_TO_PHONEMIZER["en"] = DEF_LANG_TO_PHONEMIZER["en-us"]
DEF_LANG_TO_PHONEMIZER["ja-jp"] = JA_JP_Phonemizer.name()
DEF_LANG_TO_PHONEMIZER["zh-cn"] = ZH_CN_Phonemizer.name()
DEF_LANG_TO_PHONEMIZER["ko-kr"] = KO_KR_Phonemizer.name()


def get_phonemizer_by_name(name: str, **kwargs) -> BasePhonemizer:
    """Initiate a phonemizer by name

    Args:
        name (str):
            Name of the phonemizer that should match `phonemizer.name()`.

        kwargs (dict):
            Extra keyword arguments that should be passed to the phonemizer.
    """
    if name == "espeak":
        return ESpeak(**kwargs)
    if name == "gruut":
        return Gruut(**kwargs)
    if name == "zh_cn_phonemizer":
        return ZH_CN_Phonemizer(**kwargs)
    if name == "ja_jp_phonemizer":
        return JA_JP_Phonemizer(**kwargs)
    if name == "ko_kr_phonemizer":
        return KO_KR_Phonemizer(**kwargs)
    raise ValueError(f"Phonemizer {name} not found")


if __name__ == "__main__":
    print(DEF_LANG_TO_PHONEMIZER)


================================================
FILE: TTS/tts/utils/text/phonemizers/base.py
================================================
import abc
from typing import List, Tuple

from TTS.tts.utils.text.punctuation import Punctuation


class BasePhonemizer(abc.ABC):
    """Base phonemizer class

    Phonemization follows the following steps:
        1. Preprocessing:
            - remove empty lines
            - remove punctuation
            - keep track of punctuation marks

        2. Phonemization:
            - convert text to phonemes

        3. Postprocessing:
            - join phonemes
            - restore punctuation marks

    Args:
        language (str):
            Language used by the phonemizer.

        punctuations (List[str]):
            List of punctuation marks to be preserved.

        keep_puncs (bool):
            Whether to preserve punctuation marks or not.
    """

    def __init__(self, language, punctuations=Punctuation.default_puncs(), keep_puncs=False):
        # ensure the backend is installed on the system
        if not self.is_available():
            raise RuntimeError("{} not installed on your system".format(self.name()))  # pragma: nocover

        # ensure the backend support the requested language
        self._language = self._init_language(language)

        # setup punctuation processing
        self._keep_puncs = keep_puncs
        self._punctuator = Punctuation(punctuations)

    def _init_language(self, language):
        """Language initialization

        This method may be overloaded in child classes (see Segments backend)

        """
        if not self.is_supported_language(language):
            raise RuntimeError(f'language "{language}" is not supported by the ' f"{self.name()} backend")
        return language

    @property
    def language(self):
        """The language code configured to be used for phonemization"""
        return self._language

    @staticmethod
    @abc.abstractmethod
    def name():
        """The name of the backend"""
        ...

    @classmethod
    @abc.abstractmethod
    def is_available(cls):
        """Returns True if the backend is installed, False otherwise"""
        ...

    @classmethod
    @abc.abstractmethod
    def version(cls):
        """Return the backend version as a tuple (major, minor, patch)"""
        ...

    @staticmethod
    @abc.abstractmethod
    def supported_languages():
        """Return a dict of language codes -> name supported by the backend"""
        ...

    def is_supported_language(self, language):
        """Returns True if `language` is supported by the backend"""
        return language in self.supported_languages()

    @abc.abstractmethod
    def _phonemize(self, text, separator):
        """The main phonemization method"""

    def _phonemize_preprocess(self, text) -> Tuple[List[str], List]:
        """Preprocess the text before phonemization

        1. remove spaces
        2. remove punctuation

        Override this if you need a different behaviour
        """
        text = text.strip()
        if self._keep_puncs:
            # a tuple (text, punctuation marks)
            return self._punctuator.strip_to_restore(text)
        return [self._punctuator.strip(text)], []

    def _phonemize_postprocess(self, phonemized, punctuations) -> str:
        """Postprocess the raw phonemized output

        Override this if you need a different behaviour
        """
        if self._keep_puncs:
            return self._punctuator.restore(phonemized, punctuations)[0]
        return phonemized[0]

    def phonemize(self, text: str, separator="|", language: str = None) -> str:  # pylint: disable=unused-argument
        """Returns the `text` phonemized for the given language

        Args:
            text (str):
                Text to be phonemized.

            separator (str):
                string separator used between phonemes. Default to '_'.

        Returns:
            (str): Phonemized text
        """
        text, punctuations = self._phonemize_preprocess(text)
        phonemized = []
        for t in text:
            p = self._phonemize(t, separator)
            phonemized.append(p)
        phonemized = self._phonemize_postprocess(phonemized, punctuations)
        return phonemized

    def print_logs(self, level: int = 0):
        indent = "\t" * level
        print(f"{indent}| > phoneme language: {self.language}")
        print(f"{indent}| > phoneme backend: {self.name()}")


================================================
FILE: TTS/tts/utils/text/phonemizers/espeak_wrapper.py
================================================
import logging
import re
import subprocess
from typing import Dict, List

from packaging.version import Version

from TTS.tts.utils.text.phonemizers.base import BasePhonemizer
from TTS.tts.utils.text.punctuation import Punctuation


def is_tool(name):
    from shutil import which

    return which(name) is not None


# Use a regex pattern to match the espeak version, because it may be
# symlinked to espeak-ng, which moves the version bits to another spot.
espeak_version_pattern = re.compile(r"text-to-speech:\s(?P<version>\d+\.\d+(\.\d+)?)")


def get_espeak_version():
    output = subprocess.getoutput("espeak --version")
    match = espeak_version_pattern.search(output)

    return match.group("version")


def get_espeakng_version():
    output = subprocess.getoutput("espeak-ng --version")
    return output.split()[3]


# priority: espeakng > espeak
if is_tool("espeak-ng"):
    _DEF_ESPEAK_LIB = "espeak-ng"
    _DEF_ESPEAK_VER = get_espeakng_version()
elif is_tool("espeak"):
    _DEF_ESPEAK_LIB = "espeak"
    _DEF_ESPEAK_VER = get_espeak_version()
else:
    _DEF_ESPEAK_LIB = None
    _DEF_ESPEAK_VER = None


def _espeak_exe(espeak_lib: str, args: List, sync=False) -> List[str]:
    """Run espeak with the given arguments."""
    cmd = [
        espeak_lib,
        "-q",
        "-b",
        "1",  # UTF8 text encoding
    ]
    cmd.extend(args)
    logging.debug("espeakng: executing %s", repr(cmd))

    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    ) as p:
        res = iter(p.stdout.readline, b"")
        if not sync:
            p.stdout.close()
            if p.stderr:
                p.stderr.close()
            if p.stdin:
                p.stdin.close()
            return res
        res2 = []
        for line in res:
            res2.append(line)
        p.stdout.close()
        if p.stderr:
            p.stderr.close()
        if p.stdin:
            p.stdin.close()
        p.wait()
    return res2


class ESpeak(BasePhonemizer):
    """ESpeak wrapper calling `espeak` or `espeak-ng` from the command-line the perform G2P

    Args:
        language (str):
            Valid language code for the used backend.

        backend (str):
            Name of the backend library to use. `espeak` or `espeak-ng`. If None, set automatically
            prefering `espeak-ng` over `espeak`. Defaults to None.

        punctuations (str):
            Characters to be treated as punctuation. Defaults to Punctuation.default_puncs().

        keep_puncs (bool):
            If True, keep the punctuations after phonemization. Defaults to True.

    Example:

        >>> from TTS.tts.utils.text.phonemizers import ESpeak
        >>> phonemizer = ESpeak("tr")
        >>> phonemizer.phonemize("Bu Türkçe, bir örnektir.", separator="|")
        'b|ʊ t|ˈø|r|k|tʃ|ɛ, b|ɪ|r œ|r|n|ˈɛ|c|t|ɪ|r.'

    """

    _ESPEAK_LIB = _DEF_ESPEAK_LIB
    _ESPEAK_VER = _DEF_ESPEAK_VER

    def __init__(self, language: str, backend=None, punctuations=Punctuation.default_puncs(), keep_puncs=True):
        if self._ESPEAK_LIB is None:
            raise Exception(" [!] No espeak backend found. Install espeak-ng or espeak to your system.")
        self.backend = self._ESPEAK_LIB

        # band-aid for backwards compatibility
        if language == "en":
            language = "en-us"
        if language == "zh-cn":
            language = "cmn"

        super().__init__(language, punctuations=punctuations, keep_puncs=keep_puncs)
        if backend is not None:
            self.backend = backend

    @property
    def backend(self):
        return self._ESPEAK_LIB

    @property
    def backend_version(self):
        return self._ESPEAK_VER

    @backend.setter
    def backend(self, backend):
        if backend not in ["espeak", "espeak-ng"]:
            raise Exception("Unknown backend: %s" % backend)
        self._ESPEAK_LIB = backend
        self._ESPEAK_VER = get_espeakng_version() if backend == "espeak-ng" else get_espeak_version()

    def auto_set_espeak_lib(self) -> None:
        if is_tool("espeak-ng"):
            self._ESPEAK_LIB = "espeak-ng"
            self._ESPEAK_VER = get_espeakng_version()
        elif is_tool("espeak"):
            self._ESPEAK_LIB = "espeak"
            self._ESPEAK_VER = get_espeak_version()
        else:
            raise Exception("Cannot set backend automatically. espeak-ng or espeak not found")

    @staticmethod
    def name():
        return "espeak"

    def phonemize_espeak(self, text: str, separator: str = "|", tie=False) -> str:
        """Convert input text to phonemes.

        Args:
            text (str):
                Text to be converted to phonemes.

            tie (bool, optional) : When True use a '͡' character between
                consecutive characters of a single phoneme. Else separate phoneme
                with '_'. This option requires espeak>=1.49. Default to False.
        """
        # set arguments
        args = ["-v", f"{self._language}"]
        # espeak and espeak-ng parses `ipa` differently
        if tie:
            # use '͡' between phonemes
            if self.backend == "espeak":
                args.append("--ipa=1")
            else:
                args.append("--ipa=3")
        else:
            # split with '_'
            if self.backend == "espeak":
                if Version(self.backend_version) >= Version("1.48.15"):
                    args.append("--ipa=1")
                else:
                    args.append("--ipa=3")
            else:
                args.append("--ipa=1")
        if tie:
            args.append("--tie=%s" % tie)

        args.append('"' + text + '"')
        # compute phonemes
        phonemes = ""
        for line in _espeak_exe(self._ESPEAK_LIB, args, sync=True):
            logging.debug("line: %s", repr(line))
            ph_decoded = line.decode("utf8").strip()
            # espeak need to skip first two characters of the retuned text:
            #   version 1.48.03: "_ p_ɹ_ˈaɪ_ɚ t_ə n_oʊ_v_ˈɛ_m_b_ɚ t_w_ˈɛ_n_t_i t_ˈuː\n"
            #   version 1.48.15: " p_ɹ_ˈaɪ_ɚ t_ə n_oʊ_v_ˈɛ_m_b_ɚ t_w_ˈɛ_n_t_i t_ˈuː\n"
            # espeak-ng need to skip the first character of the retuned text:
            #   "_p_ɹ_ˈaɪ_ɚ t_ə n_oʊ_v_ˈɛ_m_b_ɚ t_w_ˈɛ_n_t_i t_ˈuː\n"

            # dealing with the conditions descrived above
            ph_decoded = ph_decoded[:1].replace("_", "") + ph_decoded[1:]

            # espeak-ng backend can add language flags that need to be removed:
            #   "sɛʁtˈɛ̃ mˈo kɔm (en)fˈʊtbɔːl(fr) ʒenˈɛʁ de- flˈaɡ də- lˈɑ̃ɡ."
            # phonemize needs to remove the language flags of the returned text:
            #   "sɛʁtˈɛ̃ mˈo kɔm fˈʊtbɔːl ʒenˈɛʁ de- flˈaɡ də- lˈɑ̃ɡ."
            ph_decoded = re.sub(r"\(.+?\)", "", ph_decoded)

            phonemes += ph_decoded.strip()
        return phonemes.replace("_", separator)

    def _phonemize(self, text, separator=None):
        return self.phonemize_espeak(text, separator, tie=False)

    @staticmethod
    def supported_languages() -> Dict:
        """Get a dictionary of supported languages.

        Returns:
            Dict: Dictionary of language codes.
        """
        if _DEF_ESPEAK_LIB is None:
            return {}
        args = ["--voices"]
        langs = {}
        count = 0
        for line in _espeak_exe(_DEF_ESPEAK_LIB, args, sync=True):
            line = line.decode("utf8").strip()
            if count > 0:
                cols = line.split()
                lang_code = cols[1]
                lang_name = cols[3]
                langs[lang_code] = lang_name
            logging.debug("line: %s", repr(line))
            count += 1
        return langs

    def version(self) -> str:
        """Get the version of the used backend.

        Returns:
            str: Version of the used backend.
        """
        args = ["--version"]
        for line in _espeak_exe(self.backend, args, sync=True):
            version = line.decode("utf8").strip().split()[2]
            logging.debug("line: %s", repr(line))
            return version

    @classmethod
    def is_available(cls):
        """Return true if ESpeak is available else false"""
        return is_tool("espeak") or is_tool("espeak-ng")


if __name__ == "__main__":
    e = ESpeak(language="en-us")
    print(e.supported_languages())
    print(e.version())
    print(e.language)
    print(e.name())
    print(e.is_available())

    e = ESpeak(language="en-us", keep_puncs=False)
    print("`" + e.phonemize("hello how are you today?") + "`")

    e = ESpeak(language="en-us", keep_puncs=True)
    print("`" + e.phonemize("hello how are you today?") + "`")


================================================
FILE: TTS/tts/utils/text/phonemizers/gruut_wrapper.py
================================================
import importlib
from typing import List

import gruut
from gruut_ipa import IPA

from TTS.tts.utils.text.phonemizers.base import BasePhonemizer
from TTS.tts.utils.text.punctuation import Punctuation

# Table for str.translate to fix gruut/TTS phoneme mismatch
GRUUT_TRANS_TABLE = str.maketrans("g", "ɡ")


class Gruut(BasePhonemizer):
    """Gruut wrapper for G2P

    Args:
        language (str):
            Valid language code for the used backend.

        punctuations (str):
            Characters to be treated as punctuation. Defaults to `Punctuation.default_puncs()`.

        keep_puncs (bool):
            If true, keep the punctuations after phonemization. Defaults to True.

        use_espeak_phonemes (bool):
            If true, use espeak lexicons instead of default Gruut lexicons. Defaults to False.

        keep_stress (bool):
            If true, keep the stress characters after phonemization. Defaults to False.

    Example:

        >>> from TTS.tts.utils.text.phonemizers.gruut_wrapper import Gruut
        >>> phonemizer = Gruut('en-us')
        >>> phonemizer.phonemize("Be a voice, not an! echo?", separator="|")
        'b|i| ə| v|ɔ|ɪ|s, n|ɑ|t| ə|n! ɛ|k|o|ʊ?'
    """

    def __init__(
        self,
        language: str,
        punctuations=Punctuation.default_puncs(),
        keep_puncs=True,
        use_espeak_phonemes=False,
        keep_stress=False,
    ):
        super().__init__(language, punctuations=punctuations, keep_puncs=keep_puncs)
        self.use_espeak_phonemes = use_espeak_phonemes
        self.keep_stress = keep_stress

    @staticmethod
    def name():
        return "gruut"

    def phonemize_gruut(self, text: str, separator: str = "|", tie=False) -> str:  # pylint: disable=unused-argument
        """Convert input text to phonemes.

        Gruut phonemizes the given `str` by seperating each phoneme character with `separator`, even for characters
        that constitude a single sound.

        It doesn't affect 🐸TTS since it individually converts each character to token IDs.

        Examples::
            "hello how are you today?" -> `h|ɛ|l|o|ʊ| h|a|ʊ| ɑ|ɹ| j|u| t|ə|d|e|ɪ`

        Args:
            text (str):
                Text to be converted to phonemes.

            tie (bool, optional) : When True use a '͡' character between
                consecutive characters of a single phoneme. Else separate phoneme
                with '_'. This option requires espeak>=1.49. Default to False.
        """
        ph_list = []
        for sentence in gruut.sentences(text, lang=self.language, espeak=self.use_espeak_phonemes):
            for word in sentence:
                if word.is_break:
                    # Use actual character for break phoneme (e.g., comma)
                    if ph_list:
                        # Join with previous word
                        ph_list[-1].append(word.text)
                    else:
                        # First word is punctuation
                        ph_list.append([word.text])
                elif word.phonemes:
                    # Add phonemes for word
                    word_phonemes = []

                    for word_phoneme in word.phonemes:
                        if not self.keep_stress:
                            # Remove primary/secondary stress
                            word_phoneme = IPA.without_stress(word_phoneme)

                        word_phoneme = word_phoneme.translate(GRUUT_TRANS_TABLE)

                        if word_phoneme:
                            # Flatten phonemes
                            word_phonemes.extend(word_phoneme)

                    if word_phonemes:
                        ph_list.append(word_phonemes)

        ph_words = [separator.join(word_phonemes) for word_phonemes in ph_list]
        ph = f"{separator} ".join(ph_words)
        return ph

    def _phonemize(self, text, separator):
        return self.phonemize_gruut(text, separator, tie=False)

    def is_supported_language(self, language):
        """Returns True if `language` is supported by the backend"""
        return gruut.is_language_supported(language)

    @staticmethod
    def supported_languages() -> List:
        """Get a dictionary of supported languages.

        Returns:
            List: List of language codes.
        """
        return list(gruut.get_supported_languages())

    def version(self):
        """Get the version of the used backend.

        Returns:
            str: Version of the used backend.
        """
        return gruut.__version__

    @classmethod
    def is_available(cls):
        """Return true if ESpeak is available else false"""
        return importlib.util.find_spec("gruut") is not None


if __name__ == "__main__":
    e = Gruut(language="en-us")
    print(e.supported_languages())
    print(e.version())
    print(e.language)
    print(e.name())
    print(e.is_available())

    e = Gruut(language="en-us", keep_puncs=False)
    print("`" + e.phonemize("hello how are you today?") + "`")

    e = Gruut(language="en-us", keep_puncs=True)
    print("`" + e.phonemize("hello how, are you today?") + "`")


================================================
FILE: TTS/tts/utils/text/phonemizers/ja_jp_phonemizer.py
================================================
from typing import Dict

from TTS.tts.utils.text.japanese.phonemizer import japanese_text_to_phonemes
from TTS.tts.utils.text.phonemizers.base import BasePhonemizer

_DEF_JA_PUNCS = "、.,[]()?!〽~『』「」【】"

_TRANS_TABLE = {"、": ","}


def trans(text):
    for i, j in _TRANS_TABLE.items():
        text = text.replace(i, j)
    return text


class JA_JP_Phonemizer(BasePhonemizer):
    """🐸TTS Ja-Jp phonemizer using functions in `TTS.tts.utils.text.japanese.phonemizer`

    TODO: someone with JA knowledge should check this implementation

    Example:

        >>> from TTS.tts.utils.text.phonemizers import JA_JP_Phonemizer
        >>> phonemizer = JA_JP_Phonemizer()
        >>> phonemizer.phonemize("どちらに行きますか？", separator="|")
        'd|o|c|h|i|r|a|n|i|i|k|i|m|a|s|u|k|a|?'

    """

    language = "ja-jp"

    def __init__(self, punctuations=_DEF_JA_PUNCS, keep_puncs=True, **kwargs):  # pylint: disable=unused-argument
        super().__init__(self.language, punctuations=punctuations, keep_puncs=keep_puncs)

    @staticmethod
    def name():
        return "ja_jp_phonemizer"

    def _phonemize(self, text: str, separator: str = "|") -> str:
        ph = japanese_text_to_phonemes(text)
        if separator is not None or separator != "":
            return separator.join(ph)
        return ph

    def phonemize(self, text: str, separator="|", language=None) -> str:
        """Custom phonemize for JP_JA

        Skip pre-post processing steps used by the other phonemizers.
        """
        return self._phonemize(text, separator)

    @staticmethod
    def supported_languages() -> Dict:
        return {"ja-jp": "Japanese (Japan)"}

    def version(self) -> str:
        return "0.0.1"

    def is_available(self) -> bool:
        return True


# if __name__ == "__main__":
#     text = "これは、電話をかけるための私の日本語の例のテキストです。"
#     e = JA_JP_Phonemizer()
#     print(e.supported_languages())
#     print(e.version())
#     print(e.language)
#     print(e.name())
#     print(e.is_available())
#     print("`" + e.phonemize(text) + "`")


================================================
FILE: TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py
================================================
from typing import Dict

from TTS.tts.utils.text.korean.phonemizer import korean_text_to_phonemes
from TTS.tts.utils.text.phonemizers.base import BasePhonemizer

_DEF_KO_PUNCS = "、.,[]()?!〽~『』「」【】"


class KO_KR_Phonemizer(BasePhonemizer):
    """🐸TTS ko_kr_phonemizer using functions in `TTS.tts.utils.text.korean.phonemizer`

    TODO: Add Korean to character (ᄀᄁᄂᄃᄄᄅᄆᄇᄈᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵᆨᆩᆪᆫᆬᆭᆮᆯᆰᆱᆲᆳᆴᆵᆶᆷᆸᆹᆺᆻᆼᆽᆾᆿᇀᇁᇂ)

    Example:

        >>> from TTS.tts.utils.text.phonemizers import KO_KR_Phonemizer
        >>> phonemizer = KO_KR_Phonemizer()
        >>> phonemizer.phonemize("이 문장은 음성합성 테스트를 위한 문장입니다.", separator="|")
        'ᄋ|ᅵ| |ᄆ|ᅮ|ᆫ|ᄌ|ᅡ|ᆼ|ᄋ|ᅳ| |ᄂ|ᅳ|ᆷ|ᄉ|ᅥ|ᆼ|ᄒ|ᅡ|ᆸ|ᄊ|ᅥ|ᆼ| |ᄐ|ᅦ|ᄉ|ᅳ|ᄐ|ᅳ|ᄅ|ᅳ| |ᄅ|ᅱ|ᄒ|ᅡ|ᆫ| |ᄆ|ᅮ|ᆫ|ᄌ|ᅡ|ᆼ|ᄋ|ᅵ|ᆷ|ᄂ|ᅵ|ᄃ|ᅡ|.'

        >>> from TTS.tts.utils.text.phonemizers import KO_KR_Phonemizer
        >>> phonemizer = KO_KR_Phonemizer()
        >>> phonemizer.phonemize("이 문장은 음성합성 테스트를 위한 문장입니다.", separator="|", character='english')
        'I| |M|u|n|J|a|n|g|E|u| |N|e|u|m|S|e|o|n|g|H|a|b|S|s|e|o|n|g| |T|e|S|e|u|T|e|u|L|e|u| |L|w|i|H|a|n| |M|u|n|J|a|n|g|I|m|N|i|D|a|.'

    """

    language = "ko-kr"

    def __init__(self, punctuations=_DEF_KO_PUNCS, keep_puncs=True, **kwargs):  # pylint: disable=unused-argument
        super().__init__(self.language, punctuations=punctuations, keep_puncs=keep_puncs)

    @staticmethod
    def name():
        return "ko_kr_phonemizer"

    def _phonemize(self, text: str, separator: str = "", character: str = "hangeul") -> str:
        ph = korean_text_to_phonemes(text, character=character)
        if separator is not None or separator != "":
            return separator.join(ph)
        return ph

    def phonemize(self, text: str, separator: str = "", character: str = "hangeul", language=None) -> str:
        return self._phonemize(text, separator, character)

    @staticmethod
    def supported_languages() -> Dict:
        return {"ko-kr": "hangeul(korean)"}

    def version(self) -> str:
        return "0.0.2"

    def is_available(self) -> bool:
        return True


if __name__ == "__main__":
    texts = "이 문장은 음성합성 테스트를 위한 문장입니다."
    e = KO_KR_Phonemizer()
    print(e.supported_languages())
    print(e.version())
    print(e.language)
    print(e.name())
    print(e.is_available())
    print(e.phonemize(texts))


================================================
FILE: TTS/tts/utils/text/phonemizers/multi_phonemizer.py
================================================
from typing import Dict, List

from TTS.tts.utils.text.phonemizers import DEF_LANG_TO_PHONEMIZER, get_phonemizer_by_name


class MultiPhonemizer:
    """🐸TTS multi-phonemizer that operates phonemizers for multiple langugages

    Args:
        custom_lang_to_phonemizer (Dict):
            Custom phonemizer mapping if you want to change the defaults. In the format of
            `{"lang_code", "phonemizer_name"}`. When it is None, `DEF_LANG_TO_PHONEMIZER` is used. Defaults to `{}`.

    TODO: find a way to pass custom kwargs to the phonemizers
    """

    lang_to_phonemizer = {}

    def __init__(self, lang_to_phonemizer_name: Dict = {}) -> None:  # pylint: disable=dangerous-default-value
        for k, v in lang_to_phonemizer_name.items():
            if v == "" and k in DEF_LANG_TO_PHONEMIZER.keys():
                lang_to_phonemizer_name[k] = DEF_LANG_TO_PHONEMIZER[k]
            elif v == "":
                raise ValueError(f"Phonemizer wasn't set for language {k} and doesn't have a default.")
        self.lang_to_phonemizer_name = lang_to_phonemizer_name
        self.lang_to_phonemizer = self.init_phonemizers(self.lang_to_phonemizer_name)

    @staticmethod
    def init_phonemizers(lang_to_phonemizer_name: Dict) -> Dict:
        lang_to_phonemizer = {}
        for k, v in lang_to_phonemizer_name.items():
            lang_to_phonemizer[k] = get_phonemizer_by_name(v, language=k)
        return lang_to_phonemizer

    @staticmethod
    def name():
        return "multi-phonemizer"

    def phonemize(self, text, separator="|", language=""):
        if language == "":
            raise ValueError("Language must be set for multi-phonemizer to phonemize.")
        return self.lang_to_phonemizer[language].phonemize(text, separator)

    def supported_languages(self) -> List:
        return list(self.lang_to_phonemizer.keys())

    def print_logs(self, level: int = 0):
        indent = "\t" * level
        print(f"{indent}| > phoneme language: {self.supported_languages()}")
        print(f"{indent}| > phoneme backend: {self.name()}")


# if __name__ == "__main__":
#     texts = {
#         "tr": "Merhaba, bu Türkçe bit örnek!",
#         "en-us": "Hello, this is English example!",
#         "de": "Hallo, das ist ein Deutches Beipiel!",
#         "zh-cn": "这是中国的例子",
#     }
#     phonemes = {}
#     ph = MultiPhonemizer({"tr": "espeak", "en-us": "", "de": "gruut", "zh-cn": ""})
#     for lang, text in texts.items():
#         phoneme = ph.phonemize(text, lang)
#         phonemes[lang] = phoneme
#     print(phonemes)


================================================
FILE: TTS/tts/utils/text/phonemizers/zh_cn_phonemizer.py
================================================
from typing import Dict

from TTS.tts.utils.text.chinese_mandarin.phonemizer import chinese_text_to_phonemes
from TTS.tts.utils.text.phonemizers.base import BasePhonemizer

_DEF_ZH_PUNCS = "、.,[]()?!〽~『』「」【】"


class ZH_CN_Phonemizer(BasePhonemizer):
    """🐸TTS Zh-Cn phonemizer using functions in `TTS.tts.utils.text.chinese_mandarin.phonemizer`

    Args:
        punctuations (str):
            Set of characters to be treated as punctuation. Defaults to `_DEF_ZH_PUNCS`.

        keep_puncs (bool):
            If True, keep the punctuations after phonemization. Defaults to False.

    Example ::

        "这是，样本中文。" -> `d|ʒ|ø|4| |ʂ|ʏ|4| |，| |i|ɑ|ŋ|4|b|œ|n|3| |d|ʒ|o|ŋ|1|w|œ|n|2| |。`

    TODO: someone with Mandarin knowledge should check this implementation
    """

    language = "zh-cn"

    def __init__(self, punctuations=_DEF_ZH_PUNCS, keep_puncs=False, **kwargs):  # pylint: disable=unused-argument
        super().__init__(self.language, punctuations=punctuations, keep_puncs=keep_puncs)

    @staticmethod
    def name():
        return "zh_cn_phonemizer"

    @staticmethod
    def phonemize_zh_cn(text: str, separator: str = "|") -> str:
        ph = chinese_text_to_phonemes(text, separator)
        return ph

    def _phonemize(self, text, separator):
        return self.phonemize_zh_cn(text, separator)

    @staticmethod
    def supported_languages() -> Dict:
        return {"zh-cn": "Chinese (China)"}

    def version(self) -> str:
        return "0.0.1"

    def is_available(self) -> bool:
        return True


# if __name__ == "__main__":
#     text = "这是，样本中文。"
#     e = ZH_CN_Phonemizer()
#     print(e.supported_languages())
#     print(e.version())
#     print(e.language)
#     print(e.name())
#     print(e.is_available())
#     print("`" + e.phonemize(text) + "`")


================================================
FILE: TTS/tts/utils/text/punctuation.py
================================================
import collections
import re
from enum import Enum

import six

_DEF_PUNCS = ';:,.!?¡¿—…"«»“”'

_PUNC_IDX = collections.namedtuple("_punc_index", ["punc", "position"])


class PuncPosition(Enum):
    """Enum for the punctuations positions"""

    BEGIN = 0
    END = 1
    MIDDLE = 2
    ALONE = 3


class Punctuation:
    """Handle punctuations in text.

    Just strip punctuations from text or strip and restore them later.

    Args:
        puncs (str): The punctuations to be processed. Defaults to `_DEF_PUNCS`.

    Example:
        >>> punc = Punctuation()
        >>> punc.strip("This is. example !")
        'This is example'

        >>> text_striped, punc_map = punc.strip_to_restore("This is. example !")
        >>> ' '.join(text_striped)
        'This is example'

        >>> text_restored = punc.restore(text_striped, punc_map)
        >>> text_restored[0]
        'This is. example !'
    """

    def __init__(self, puncs: str = _DEF_PUNCS):
        self.puncs = puncs

    @staticmethod
    def default_puncs():
        """Return default set of punctuations."""
        return _DEF_PUNCS

    @property
    def puncs(self):
        return self._puncs

    @puncs.setter
    def puncs(self, value):
        if not isinstance(value, six.string_types):
            raise ValueError("[!] Punctuations must be of type str.")
        self._puncs = "".join(list(dict.fromkeys(list(value))))  # remove duplicates without changing the oreder
        self.puncs_regular_exp = re.compile(rf"(\s*[{re.escape(self._puncs)}]+\s*)+")

    def strip(self, text):
        """Remove all the punctuations by replacing with `space`.

        Args:
            text (str): The text to be processed.

        Example::

            "This is. example !" -> "This is example "
        """
        return re.sub(self.puncs_regular_exp, " ", text).rstrip().lstrip()

    def strip_to_restore(self, text):
        """Remove punctuations from text to restore them later.

        Args:
            text (str): The text to be processed.

        Examples ::

            "This is. example !" -> [["This is", "example"], [".", "!"]]

        """
        text, puncs = self._strip_to_restore(text)
        return text, puncs

    def _strip_to_restore(self, text):
        """Auxiliary method for Punctuation.preserve()"""
        matches = list(re.finditer(self.puncs_regular_exp, text))
        if not matches:
            return [text], []
        # the text is only punctuations
        if len(matches) == 1 and matches[0].group() == text:
            return [], [_PUNC_IDX(text, PuncPosition.ALONE)]
        # build a punctuation map to be used later to restore punctuations
        puncs = []
        for match in matches:
            position = PuncPosition.MIDDLE
            if match == matches[0] and text.startswith(match.group()):
                position = PuncPosition.BEGIN
            elif match == matches[-1] and text.endswith(match.group()):
                position = PuncPosition.END
            puncs.append(_PUNC_IDX(match.group(), position))
        # convert str text to a List[str], each item is separated by a punctuation
        splitted_text = []
        for idx, punc in enumerate(puncs):
            split = text.split(punc.punc)
            prefix, suffix = split[0], punc.punc.join(split[1:])
            splitted_text.append(prefix)
            # if the text does not end with a punctuation, add it to the last item
            if idx == len(puncs) - 1 and len(suffix) > 0:
                splitted_text.append(suffix)
            text = suffix
        return splitted_text, puncs

    @classmethod
    def restore(cls, text, puncs):
        """Restore punctuation in a text.

        Args:
            text (str): The text to be processed.
            puncs (List[str]): The list of punctuations map to be used for restoring.

        Examples ::

            ['This is', 'example'], ['.', '!'] -> "This is. example!"

        """
        return cls._restore(text, puncs, 0)

    @classmethod
    def _restore(cls, text, puncs, num):  # pylint: disable=too-many-return-statements
        """Auxiliary method for Punctuation.restore()"""
        if not puncs:
            return text

        # nothing have been phonemized, returns the puncs alone
        if not text:
            return ["".join(m.punc for m in puncs)]

        current = puncs[0]

        if current.position == PuncPosition.BEGIN:
            return cls._restore([current.punc + text[0]] + text[1:], puncs[1:], num)

        if current.position == PuncPosition.END:
            return [text[0] + current.punc] + cls._restore(text[1:], puncs[1:], num + 1)

        if current.position == PuncPosition.ALONE:
            return [current.mark] + cls._restore(text, puncs[1:], num + 1)

        # POSITION == MIDDLE
        if len(text) == 1:  # pragma: nocover
            # a corner case where the final part of an intermediate
            # mark (I) has not been phonemized
            return cls._restore([text[0] + current.punc], puncs[1:], num)

        return cls._restore([text[0] + current.punc + text[1]] + text[2:], puncs[1:], num)


# if __name__ == "__main__":
#     punc = Punctuation()
#     text = "This is. This is, example!"

#     print(punc.strip(text))

#     split_text, puncs = punc.strip_to_restore(text)
#     print(split_text, " ---- ", puncs)

#     restored_text = punc.restore(split_text, puncs)
#     print(restored_text)


================================================
FILE: TTS/tts/utils/text/tokenizer.py
================================================
from typing import Callable, Dict, List, Union

from TTS.tts.utils.text import cleaners
from TTS.tts.utils.text.characters import Graphemes, IPAPhonemes
from TTS.tts.utils.text.phonemizers import DEF_LANG_TO_PHONEMIZER, get_phonemizer_by_name
from TTS.tts.utils.text.phonemizers.multi_phonemizer import MultiPhonemizer
from TTS.utils.generic_utils import get_import_path, import_class


class TTSTokenizer:
    """🐸TTS tokenizer to convert input characters to token IDs and back.

    Token IDs for OOV chars are discarded but those are stored in `self.not_found_characters` for later.

    Args:
        use_phonemes (bool):
            Whether to use phonemes instead of characters. Defaults to False.

        characters (Characters):
            A Characters object to use for character-to-ID and ID-to-character mappings.

        text_cleaner (callable):
            A function to pre-process the text before tokenization and phonemization. Defaults to None.

        phonemizer (Phonemizer):
            A phonemizer object or a dict that maps language codes to phonemizer objects. Defaults to None.

    Example:

        >>> from TTS.tts.utils.text.tokenizer import TTSTokenizer
        >>> tokenizer = TTSTokenizer(use_phonemes=False, characters=Graphemes())
        >>> text = "Hello world!"
        >>> ids = tokenizer.text_to_ids(text)
        >>> text_hat = tokenizer.ids_to_text(ids)
        >>> assert text == text_hat
    """

    def __init__(
        self,
        use_phonemes=False,
        text_cleaner: Callable = None,
        characters: "BaseCharacters" = None,
        phonemizer: Union["Phonemizer", Dict] = None,
        add_blank: bool = False,
        use_eos_bos=False,
    ):
        self.text_cleaner = text_cleaner
        self.use_phonemes = use_phonemes
        self.add_blank = add_blank
        self.use_eos_bos = use_eos_bos
        self.characters = characters
        self.not_found_characters = []
        self.phonemizer = phonemizer

    @property
    def characters(self):
        return self._characters

    @characters.setter
    def characters(self, new_characters):
        self._characters = new_characters
        self.pad_id = self.characters.char_to_id(self.characters.pad) if self.characters.pad else None
        self.blank_id = self.characters.char_to_id(self.characters.blank) if self.characters.blank else None

    def encode(self, text: str) -> List[int]:
        """Encodes a string of text as a sequence of IDs."""
        token_ids = []
        for char in text:
            try:
                idx = self.characters.char_to_id(char)
                token_ids.append(idx)
            except KeyError:
                # discard but store not found characters
                if char not in self.not_found_characters:
                    self.not_found_characters.append(char)
                    print(text)
                    print(f" [!] Character {repr(char)} not found in the vocabulary. Discarding it.")
        return token_ids

    def decode(self, token_ids: List[int]) -> str:
        """Decodes a sequence of IDs to a string of text."""
        text = ""
        for token_id in token_ids:
            text += self.characters.id_to_char(token_id)
        return text

    def text_to_ids(self, text: str, language: str = None) -> List[int]:  # pylint: disable=unused-argument
        """Converts a string of text to a sequence of token IDs.

        Args:
            text(str):
                The text to convert to token IDs.

            language(str):
                The language code of the text. Defaults to None.

        TODO:
            - Add support for language-specific processing.

        1. Text normalizatin
        2. Phonemization (if use_phonemes is True)
        3. Add blank char between characters
        4. Add BOS and EOS characters
        5. Text to token IDs
        """
        # TODO: text cleaner should pick the right routine based on the language
        if self.text_cleaner is not None:
            text = self.text_cleaner(text)
        if self.use_phonemes:
            text = self.phonemizer.phonemize(text, separator="", language=language)
        if self.add_blank:
            text = self.intersperse_blank_char(text, True)
        if self.use_eos_bos:
            text = self.pad_with_bos_eos(text)
        return self.encode(text)

    def ids_to_text(self, id_sequence: List[int]) -> str:
        """Converts a sequence of token IDs to a string of text."""
        return self.decode(id_sequence)

    def pad_with_bos_eos(self, char_sequence: List[str]):
        """Pads a sequence with the special BOS and EOS characters."""
        return [self.characters.bos] + list(char_sequence) + [self.characters.eos]

    def intersperse_blank_char(self, char_sequence: List[str], use_blank_char: bool = False):
        """Intersperses the blank character between characters in a sequence.

        Use the ```blank``` character if defined else use the ```pad``` character.
        """
        char_to_use = self.characters.blank if use_blank_char else self.characters.pad
        result = [char_to_use] * (len(char_sequence) * 2 + 1)
        result[1::2] = char_sequence
        return result

    def print_logs(self, level: int = 0):
        indent = "\t" * level
        print(f"{indent}| > add_blank: {self.add_blank}")
        print(f"{indent}| > use_eos_bos: {self.use_eos_bos}")
        print(f"{indent}| > use_phonemes: {self.use_phonemes}")
        if self.use_phonemes:
            print(f"{indent}| > phonemizer:")
            self.phonemizer.print_logs(level + 1)
        if len(self.not_found_characters) > 0:
            print(f"{indent}| > {len(self.not_found_characters)} not found characters:")
            for char in self.not_found_characters:
                print(f"{indent}| > {char}")

    @staticmethod
    def init_from_config(config: "Coqpit", characters: "BaseCharacters" = None):
        """Init Tokenizer object from config

        Args:
            config (Coqpit): Coqpit model config.
            characters (BaseCharacters): Defines the model character set. If not set, use the default options based on
                the config values. Defaults to None.
        """
        # init cleaners
        text_cleaner = None
        if isinstance(config.text_cleaner, (str, list)):
            text_cleaner = getattr(cleaners, config.text_cleaner)

        # init characters
        if characters is None:
            # set characters based on defined characters class
            if config.characters and config.characters.characters_class:
                CharactersClass = import_class(config.characters.characters_class)
                characters, new_config = CharactersClass.init_from_config(config)
            # set characters based on config
            else:
                if config.use_phonemes:
                    # init phoneme set
                    characters, new_config = IPAPhonemes().init_from_config(config)
                else:
                    # init character set
                    characters, new_config = Graphemes().init_from_config(config)

        else:
            characters, new_config = characters.init_from_config(config)

        # set characters class
        new_config.characters.characters_class = get_import_path(characters)

        # init phonemizer
        phonemizer = None
        if config.use_phonemes:
            if "phonemizer" in config and config.phonemizer == "multi_phonemizer":
                lang_to_phonemizer_name = {}
                for dataset in config.datasets:
                    if dataset.language != "":
                        lang_to_phonemizer_name[dataset.language] = dataset.phonemizer
                    else:
                        raise ValueError("Multi phonemizer requires language to be set for each dataset.")
                phonemizer = MultiPhonemizer(lang_to_phonemizer_name)
            else:
                phonemizer_kwargs = {"language": config.phoneme_language}
                if "phonemizer" in config and config.phonemizer:
                    phonemizer = get_phonemizer_by_name(config.phonemizer, **phonemizer_kwargs)
                else:
                    try:
                        phonemizer = get_phonemizer_by_name(
                            DEF_LANG_TO_PHONEMIZER[config.phoneme_language], **phonemizer_kwargs
                        )
                        new_config.phonemizer = phonemizer.name()
                    except KeyError as e:
                        raise ValueError(
                            f"""No phonemizer found for language {config.phoneme_language}.
                            You may need to install a third party library for this language."""
                        ) from e

        return (
            TTSTokenizer(
                config.use_phonemes, text_cleaner, characters, phonemizer, config.add_blank, config.enable_eos_bos_chars
            ),
            new_config,
        )


================================================
FILE: TTS/tts/utils/visual.py
================================================
import librosa
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import torch
from matplotlib.colors import LogNorm

matplotlib.use("Agg")


def plot_alignment(alignment, info=None, fig_size=(16, 10), title=None, output_fig=False, plot_log=False):
    if isinstance(alignment, torch.Tensor):
        alignment_ = alignment.detach().cpu().numpy().squeeze()
    else:
        alignment_ = alignment
    alignment_ = alignment_.astype(np.float32) if alignment_.dtype == np.float16 else alignment_
    fig, ax = plt.subplots(figsize=fig_size)
    im = ax.imshow(
        alignment_.T, aspect="auto", origin="lower", interpolation="none", norm=LogNorm() if plot_log else None
    )
    fig.colorbar(im, ax=ax)
    xlabel = "Decoder timestep"
    if info is not None:
        xlabel += "\n\n" + info
    plt.xlabel(xlabel)
    plt.ylabel("Encoder timestep")
    # plt.yticks(range(len(text)), list(text))
    plt.tight_layout()
    if title is not None:
        plt.title(title)
    if not output_fig:
        plt.close()
    return fig


def plot_spectrogram(spectrogram, ap=None, fig_size=(16, 10), output_fig=False):
    if isinstance(spectrogram, torch.Tensor):
        spectrogram_ = spectrogram.detach().cpu().numpy().squeeze().T
    else:
        spectrogram_ = spectrogram.T
    spectrogram_ = spectrogram_.astype(np.float32) if spectrogram_.dtype == np.float16 else spectrogram_
    if ap is not None:
        spectrogram_ = ap.denormalize(spectrogram_)  # pylint: disable=protected-access
    fig = plt.figure(figsize=fig_size)
    plt.imshow(spectrogram_, aspect="auto", origin="lower")
    plt.colorbar()
    plt.tight_layout()
    if not output_fig:
        plt.close()
    return fig


def plot_pitch(pitch, spectrogram, ap=None, fig_size=(30, 10), output_fig=False):
    """Plot pitch curves on top of the spectrogram.

    Args:
        pitch (np.array): Pitch values.
        spectrogram (np.array): Spectrogram values.

    Shapes:
        pitch: :math:`(T,)`
        spec: :math:`(C, T)`
    """

    if isinstance(spectrogram, torch.Tensor):
        spectrogram_ = spectrogram.detach().cpu().numpy().squeeze().T
    else:
        spectrogram_ = spectrogram.T
    spectrogram_ = spectrogram_.astype(np.float32) if spectrogram_.dtype == np.float16 else spectrogram_
    if ap is not None:
        spectrogram_ = ap.denormalize(spectrogram_)  # pylint: disable=protected-access

    old_fig_size = plt.rcParams["figure.figsize"]
    if fig_size is not None:
        plt.rcParams["figure.figsize"] = fig_size

    fig, ax = plt.subplots()

    ax.imshow(spectrogram_, aspect="auto", origin="lower")
    ax.set_xlabel("time")
    ax.set_ylabel("spec_freq")

    ax2 = ax.twinx()
    ax2.plot(pitch, linewidth=5.0, color="red")
    ax2.set_ylabel("F0")

    plt.rcParams["figure.figsize"] = old_fig_size
    if not output_fig:
        plt.close()
    return fig


def plot_avg_pitch(pitch, chars, fig_size=(30, 10), output_fig=False):
    """Plot pitch curves on top of the input characters.

    Args:
        pitch (np.array): Pitch values.
        chars (str): Characters to place to the x-axis.

    Shapes:
        pitch: :math:`(T,)`
    """
    old_fig_size = plt.rcParams["figure.figsize"]
    if fig_size is not None:
        plt.rcParams["figure.figsize"] = fig_size

    fig, ax = plt.subplots()

    x = np.array(range(len(chars)))
    my_xticks = chars
    plt.xticks(x, my_xticks)

    ax.set_xlabel("characters")
    ax.set_ylabel("freq")

    ax2 = ax.twinx()
    ax2.plot(pitch, linewidth=5.0, color="red")
    ax2.set_ylabel("F0")

    plt.rcParams["figure.figsize"] = old_fig_size
    if not output_fig:
        plt.close()
    return fig


def plot_avg_energy(energy, chars, fig_size=(30, 10), output_fig=False):
    """Plot energy curves on top of the input characters.

    Args:
        energy (np.array): energy values.
        chars (str): Characters to place to the x-axis.

    Shapes:
        energy: :math:`(T,)`
    """
    old_fig_size = plt.rcParams["figure.figsize"]
    if fig_size is not None:
        plt.rcParams["figure.figsize"] = fig_size

    fig, ax = plt.subplots()

    x = np.array(range(len(chars)))
    my_xticks = chars
    plt.xticks(x, my_xticks)

    ax.set_xlabel("characters")
    ax.set_ylabel("freq")

    ax2 = ax.twinx()
    ax2.plot(energy, linewidth=5.0, color="red")
    ax2.set_ylabel("energy")

    plt.rcParams["figure.figsize"] = old_fig_size
    if not output_fig:
        plt.close()
    return fig


def visualize(
    alignment,
    postnet_output,
    text,
    hop_length,
    CONFIG,
    tokenizer,
    stop_tokens=None,
    decoder_output=None,
    output_path=None,
    figsize=(8, 24),
    output_fig=False,
):
    """Intended to be used in Notebooks."""

    if decoder_output is not None:
        num_plot = 4
    else:
        num_plot = 3

    label_fontsize = 16
    fig = plt.figure(figsize=figsize)

    plt.subplot(num_plot, 1, 1)
    plt.imshow(alignment.T, aspect="auto", origin="lower", interpolation=None)
    plt.xlabel("Decoder timestamp", fontsize=label_fontsize)
    plt.ylabel("Encoder timestamp", fontsize=label_fontsize)
    # compute phoneme representation and back
    if CONFIG.use_phonemes:
        seq = tokenizer.text_to_ids(text)
        text = tokenizer.ids_to_text(seq)
        print(text)
    plt.yticks(range(len(text)), list(text))
    plt.colorbar()

    if stop_tokens is not None:
        # plot stopnet predictions
        plt.subplot(num_plot, 1, 2)
        plt.plot(range(len(stop_tokens)), list(stop_tokens))

    # plot postnet spectrogram
    plt.subplot(num_plot, 1, 3)
    librosa.display.specshow(
        postnet_output.T,
        sr=CONFIG.audio["sample_rate"],
        hop_length=hop_length,
        x_axis="time",
        y_axis="linear",
        fmin=CONFIG.audio["mel_fmin"],
        fmax=CONFIG.audio["mel_fmax"],
    )

    plt.xlabel("Time", fontsize=label_fontsize)
    plt.ylabel("Hz", fontsize=label_fontsize)
    plt.tight_layout()
    plt.colorbar()

    if decoder_output is not None:
        plt.subplot(num_plot, 1, 4)
        librosa.display.specshow(
            decoder_output.T,
            sr=CONFIG.audio["sample_rate"],
            hop_length=hop_length,
            x_axis="time",
            y_axis="linear",
            fmin=CONFIG.audio["mel_fmin"],
            fmax=CONFIG.audio["mel_fmax"],
        )
        plt.xlabel("Time", fontsize=label_fontsize)
        plt.ylabel("Hz", fontsize=label_fontsize)
        plt.tight_layout()
        plt.colorbar()

    if output_path:
        print(output_path)
        fig.savefig(output_path)
        plt.close()

    if not output_fig:
        plt.close()


================================================
FILE: TTS/utils/__init__.py
================================================


================================================
FILE: TTS/utils/audio/__init__.py
================================================
from TTS.utils.audio.processor import AudioProcessor


================================================
FILE: TTS/utils/audio/numpy_transforms.py
================================================
from typing import Tuple

import librosa
import numpy as np
import scipy
import soundfile as sf
from librosa import magphase, pyin

# For using kwargs
# pylint: disable=unused-argument


def build_mel_basis(
    *,
    sample_rate: int = None,
    fft_size: int = None,
    num_mels: int = None,
    mel_fmax: int = None,
    mel_fmin: int = None,
    **kwargs,
) -> np.ndarray:
    """Build melspectrogram basis.

    Returns:
        np.ndarray: melspectrogram basis.
    """
    if mel_fmax is not None:
        assert mel_fmax <= sample_rate // 2
        assert mel_fmax - mel_fmin > 0
    return librosa.filters.mel(sr=sample_rate, n_fft=fft_size, n_mels=num_mels, fmin=mel_fmin, fmax=mel_fmax)


def millisec_to_length(
    *, frame_length_ms: int = None, frame_shift_ms: int = None, sample_rate: int = None, **kwargs
) -> Tuple[int, int]:
    """Compute hop and window length from milliseconds.

    Returns:
        Tuple[int, int]: hop length and window length for STFT.
    """
    factor = frame_length_ms / frame_shift_ms
    assert (factor).is_integer(), " [!] frame_shift_ms should divide frame_length_ms"
    win_length = int(frame_length_ms / 1000.0 * sample_rate)
    hop_length = int(win_length / float(factor))
    return win_length, hop_length


def _log(x, base):
    if base == 10:
        return np.log10(x)
    return np.log(x)


def _exp(x, base):
    if base == 10:
        return np.power(10, x)
    return np.exp(x)


def amp_to_db(*, x: np.ndarray = None, gain: float = 1, base: int = 10, **kwargs) -> np.ndarray:
    """Convert amplitude values to decibels.

    Args:
        x (np.ndarray): Amplitude spectrogram.
        gain (float): Gain factor. Defaults to 1.
        base (int): Logarithm base. Defaults to 10.

    Returns:
        np.ndarray: Decibels spectrogram.
    """
    assert (x < 0).sum() == 0, " [!] Input values must be non-negative."
    return gain * _log(np.maximum(1e-8, x), base)


# pylint: disable=no-self-use
def db_to_amp(*, x: np.ndarray = None, gain: float = 1, base: int = 10, **kwargs) -> np.ndarray:
    """Convert decibels spectrogram to amplitude spectrogram.

    Args:
        x (np.ndarray): Decibels spectrogram.
        gain (float): Gain factor. Defaults to 1.
        base (int): Logarithm base. Defaults to 10.

    Returns:
        np.ndarray: Amplitude spectrogram.
    """
    return _exp(x / gain, base)


def preemphasis(*, x: np.ndarray, coef: float = 0.97, **kwargs) -> np.ndarray:
    """Apply pre-emphasis to the audio signal. Useful to reduce the correlation between neighbouring signal values.

    Args:
        x (np.ndarray): Audio signal.

    Raises:
        RuntimeError: Preemphasis coeff is set to 0.

    Returns:
        np.ndarray: Decorrelated audio signal.
    """
    if coef == 0:
        raise RuntimeError(" [!] Preemphasis is set 0.0.")
    return scipy.signal.lfilter([1, -coef], [1], x)


def deemphasis(*, x: np.ndarray = None, coef: float = 0.97, **kwargs) -> np.ndarray:
    """Reverse pre-emphasis."""
    if coef == 0:
        raise RuntimeError(" [!] Preemphasis is set 0.0.")
    return scipy.signal.lfilter([1], [1, -coef], x)


def spec_to_mel(*, spec: np.ndarray, mel_basis: np.ndarray = None, **kwargs) -> np.ndarray:
    """Convert a full scale linear spectrogram output of a network to a melspectrogram.

    Args:
        spec (np.ndarray): Normalized full scale linear spectrogram.

    Shapes:
        - spec: :math:`[C, T]`

    Returns:
        np.ndarray: Normalized melspectrogram.
    """
    return np.dot(mel_basis, spec)


def mel_to_spec(*, mel: np.ndarray = None, mel_basis: np.ndarray = None, **kwargs) -> np.ndarray:
    """Convert a melspectrogram to full scale spectrogram."""
    assert (mel < 0).sum() == 0, " [!] Input values must be non-negative."
    inv_mel_basis = np.linalg.pinv(mel_basis)
    return np.maximum(1e-10, np.dot(inv_mel_basis, mel))


def wav_to_spec(*, wav: np.ndarray = None, **kwargs) -> np.ndarray:
    """Compute a spectrogram from a waveform.

    Args:
        wav (np.ndarray): Waveform. Shape :math:`[T_wav,]`

    Returns:
        np.ndarray: Spectrogram. Shape :math:`[C, T_spec]`. :math:`T_spec == T_wav / hop_length`
    """
    D = stft(y=wav, **kwargs)
    S = np.abs(D)
    return S.astype(np.float32)


def wav_to_mel(*, wav: np.ndarray = None, mel_basis=None, **kwargs) -> np.ndarray:
    """Compute a melspectrogram from a waveform."""
    D = stft(y=wav, **kwargs)
    S = spec_to_mel(spec=np.abs(D), mel_basis=mel_basis, **kwargs)
    return S.astype(np.float32)


def spec_to_wav(*, spec: np.ndarray, power: float = 1.5, **kwargs) -> np.ndarray:
    """Convert a spectrogram to a waveform using Griffi-Lim vocoder."""
    S = spec.copy()
    return griffin_lim(spec=S**power, **kwargs)


def mel_to_wav(*, mel: np.ndarray = None, power: float = 1.5, **kwargs) -> np.ndarray:
    """Convert a melspectrogram to a waveform using Griffi-Lim vocoder."""
    S = mel.copy()
    S = mel_to_spec(mel=S, mel_basis=kwargs["mel_basis"])  # Convert back to linear
    return griffin_lim(spec=S**power, **kwargs)


### STFT and ISTFT ###
def stft(
    *,
    y: np.ndarray = None,
    fft_size: int = None,
    hop_length: int = None,
    win_length: int = None,
    pad_mode: str = "reflect",
    window: str = "hann",
    center: bool = True,
    **kwargs,
) -> np.ndarray:
    """Librosa STFT wrapper.

    Check http://librosa.org/doc/main/generated/librosa.stft.html argument details.

    Returns:
        np.ndarray: Complex number array.
    """
    return librosa.stft(
        y=y,
        n_fft=fft_size,
        hop_length=hop_length,
        win_length=win_length,
        pad_mode=pad_mode,
        window=window,
        center=center,
    )


def istft(
    *,
    y: np.ndarray = None,
    fft_size: int = None,
    hop_length: int = None,
    win_length: int = None,
    window: str = "hann",
    center: bool = True,
    **kwargs,
) -> np.ndarray:
    """Librosa iSTFT wrapper.

    Check http://librosa.org/doc/main/generated/librosa.istft.html argument details.

    Returns:
        np.ndarray: Complex number array.
    """
    return librosa.istft(y, hop_length=hop_length, win_length=win_length, center=center, window=window)


def griffin_lim(*, spec: np.ndarray = None, num_iter=60, **kwargs) -> np.ndarray:
    angles = np.exp(2j * np.pi * np.random.rand(*spec.shape))
    S_complex = np.abs(spec).astype(np.complex)
    y = istft(y=S_complex * angles, **kwargs)
    if not np.isfinite(y).all():
        print(" [!] Waveform is not finite everywhere. Skipping the GL.")
        return np.array([0.0])
    for _ in range(num_iter):
        angles = np.exp(1j * np.angle(stft(y=y, **kwargs)))
        y = istft(y=S_complex * angles, **kwargs)
    return y


def compute_stft_paddings(
    *, x: np.ndarray = None, hop_length: int = None, pad_two_sides: bool = False, **kwargs
) -> Tuple[int, int]:
    """Compute paddings used by Librosa's STFT. Compute right padding (final frame) or both sides padding
    (first and final frames)"""
    pad = (x.shape[0] // hop_length + 1) * hop_length - x.shape[0]
    if not pad_two_sides:
        return 0, pad
    return pad // 2, pad // 2 + pad % 2


def compute_f0(
    *,
    x: np.ndarray = None,
    pitch_fmax: float = None,
    pitch_fmin: float = None,
    hop_length: int = None,
    win_length: int = None,
    sample_rate: int = None,
    stft_pad_mode: str = "reflect",
    center: bool = True,
    **kwargs,
) -> np.ndarray:
    """Compute pitch (f0) of a waveform using the same parameters used for computing melspectrogram.

    Args:
        x (np.ndarray): Waveform. Shape :math:`[T_wav,]`
        pitch_fmax (float): Pitch max value.
        pitch_fmin (float): Pitch min value.
        hop_length (int): Number of frames between STFT columns.
        win_length (int): STFT window length.
        sample_rate (int): Audio sampling rate.
        stft_pad_mode (str): Padding mode for STFT.
        center (bool): Centered padding.

    Returns:
        np.ndarray: Pitch. Shape :math:`[T_pitch,]`. :math:`T_pitch == T_wav / hop_length`

    Examples:
        >>> WAV_FILE = filename = librosa.util.example_audio_file()
        >>> from TTS.config import BaseAudioConfig
        >>> from TTS.utils.audio import AudioProcessor
        >>> conf = BaseAudioConfig(pitch_fmax=640, pitch_fmin=1)
        >>> ap = AudioProcessor(**conf)
        >>> wav = ap.load_wav(WAV_FILE, sr=ap.sample_rate)[:5 * ap.sample_rate]
        >>> pitch = ap.compute_f0(wav)
    """
    assert pitch_fmax is not None, " [!] Set `pitch_fmax` before caling `compute_f0`."
    assert pitch_fmin is not None, " [!] Set `pitch_fmin` before caling `compute_f0`."

    f0, voiced_mask, _ = pyin(
        y=x.astype(np.double),
        fmin=pitch_fmin,
        fmax=pitch_fmax,
        sr=sample_rate,
        frame_length=win_length,
        win_length=win_length // 2,
        hop_length=hop_length,
        pad_mode=stft_pad_mode,
        center=center,
        n_thresholds=100,
        beta_parameters=(2, 18),
        boltzmann_parameter=2,
        resolution=0.1,
        max_transition_rate=35.92,
        switch_prob=0.01,
        no_trough_prob=0.01,
    )
    f0[~voiced_mask] = 0.0

    return f0


def compute_energy(y: np.ndarray, **kwargs) -> np.ndarray:
    """Compute energy of a waveform using the same parameters used for computing melspectrogram.
    Args:
      x (np.ndarray): Waveform. Shape :math:`[T_wav,]`
    Returns:
      np.ndarray: energy. Shape :math:`[T_energy,]`. :math:`T_energy == T_wav / hop_length`
    Examples:
      >>> WAV_FILE = filename = librosa.util.example_audio_file()
      >>> from TTS.config import BaseAudioConfig
      >>> from TTS.utils.audio import AudioProcessor
      >>> conf = BaseAudioConfig()
      >>> ap = AudioProcessor(**conf)
      >>> wav = ap.load_wav(WAV_FILE, sr=ap.sample_rate)[:5 * ap.sample_rate]
      >>> energy = ap.compute_energy(wav)
    """
    x = stft(y=y, **kwargs)
    mag, _ = magphase(x)
    energy = np.sqrt(np.sum(mag**2, axis=0))
    return energy


### Audio Processing ###
def find_endpoint(
    *,
    wav: np.ndarray = None,
    trim_db: float = -40,
    sample_rate: int = None,
    min_silence_sec=0.8,
    gain: float = None,
    base: int = None,
    **kwargs,
) -> int:
    """Find the last point without silence at the end of a audio signal.

    Args:
        wav (np.ndarray): Audio signal.
        threshold_db (int, optional): Silence threshold in decibels. Defaults to -40.
        min_silence_sec (float, optional): Ignore silences that are shorter then this in secs. Defaults to 0.8.
        gian (float, optional): Gain to be used to convert trim_db to trim_amp. Defaults to None.
        base (int, optional): Base of the logarithm used to convert trim_db to trim_amp. Defaults to 10.

    Returns:
        int: Last point without silence.
    """
    window_length = int(sample_rate * min_silence_sec)
    hop_length = int(window_length / 4)
    threshold = db_to_amp(x=-trim_db, gain=gain, base=base)
    for x in range(hop_length, len(wav) - window_length, hop_length):
        if np.max(wav[x : x + window_length]) < threshold:
            return x + hop_length
    return len(wav)


def trim_silence(
    *,
    wav: np.ndarray = None,
    sample_rate: int = None,
    trim_db: float = None,
    win_length: int = None,
    hop_length: int = None,
    **kwargs,
) -> np.ndarray:
    """Trim silent parts with a threshold and 0.01 sec margin"""
    margin = int(sample_rate * 0.01)
    wav = wav[margin:-margin]
    return librosa.effects.trim(wav, top_db=trim_db, frame_length=win_length, hop_length=hop_length)[0]


def volume_norm(*, x: np.ndarray = None, coef: float = 0.95, **kwargs) -> np.ndarray:
    """Normalize the volume of an audio signal.

    Args:
        x (np.ndarray): Raw waveform.
        coef (float): Coefficient to rescale the maximum value. Defaults to 0.95.

    Returns:
        np.ndarray: Volume normalized waveform.
    """
    return x / abs(x).max() * coef


def rms_norm(*, wav: np.ndarray = None, db_level: float = -27.0, **kwargs) -> np.ndarray:
    r = 10 ** (db_level / 20)
    a = np.sqrt((len(wav) * (r**2)) / np.sum(wav**2))
    return wav * a


def rms_volume_norm(*, x: np.ndarray, db_level: float = -27.0, **kwargs) -> np.ndarray:
    """Normalize the volume based on RMS of the signal.

    Args:
        x (np.ndarray): Raw waveform.
        db_level (float): Target dB level in RMS. Defaults to -27.0.

    Returns:
        np.ndarray: RMS normalized waveform.
    """
    assert -99 <= db_level <= 0, " [!] db_level should be between -99 and 0"
    wav = rms_norm(wav=x, db_level=db_level)
    return wav


def load_wav(*, filename: str, sample_rate: int = None, resample: bool = False, **kwargs) -> np.ndarray:
    """Read a wav file using Librosa and optionally resample, silence trim, volume normalize.

    Resampling slows down loading the file significantly. Therefore it is recommended to resample the file before.

    Args:
        filename (str): Path to the wav file.
        sr (int, optional): Sampling rate for resampling. Defaults to None.
        resample (bool, optional): Resample the audio file when loading. Slows down the I/O time. Defaults to False.

    Returns:
        np.ndarray: Loaded waveform.
    """
    if resample:
        # loading with resampling. It is significantly slower.
        x, _ = librosa.load(filename, sr=sample_rate)
    else:
        # SF is faster than librosa for loading files
        x, _ = sf.read(filename)
    return x


def save_wav(*, wav: np.ndarray, path: str, sample_rate: int = None, **kwargs) -> None:
    """Save float waveform to a file using Scipy.

    Args:
        wav (np.ndarray): Waveform with float values in range [-1, 1] to save.
        path (str): Path to a output file.
        sr (int, optional): Sampling rate used for saving to the file. Defaults to None.
    """
    wav_norm = wav * (32767 / max(0.01, np.max(np.abs(wav))))
    scipy.io.wavfile.write(path, sample_rate, wav_norm.astype(np.int16))


def mulaw_encode(*, wav: np.ndarray, mulaw_qc: int, **kwargs) -> np.ndarray:
    mu = 2**mulaw_qc - 1
    signal = np.sign(wav) * np.log(1 + mu * np.abs(wav)) / np.log(1.0 + mu)
    signal = (signal + 1) / 2 * mu + 0.5
    return np.floor(
        signal,
    )


def mulaw_decode(*, wav, mulaw_qc: int, **kwargs) -> np.ndarray:
    """Recovers waveform from quantized values."""
    mu = 2**mulaw_qc - 1
    x = np.sign(wav) / mu * ((1 + mu) ** np.abs(wav) - 1)
    return x


def encode_16bits(*, x: np.ndarray, **kwargs) -> np.ndarray:
    return np.clip(x * 2**15, -(2**15), 2**15 - 1).astype(np.int16)


def quantize(*, x: np.ndarray, quantize_bits: int, **kwargs) -> np.ndarray:
    """Quantize a waveform to a given number of bits.

    Args:
        x (np.ndarray): Waveform to quantize. Must be normalized into the range `[-1, 1]`.
        quantize_bits (int): Number of quantization bits.

    Returns:
        np.ndarray: Quantized waveform.
    """
    return (x + 1.0) * (2**quantize_bits - 1) / 2


def dequantize(*, x, quantize_bits, **kwargs) -> np.ndarray:
    """Dequantize a waveform from the given number of bits."""
    return 2 * x / (2**quantize_bits - 1) - 1


================================================
FILE: TTS/utils/audio/processor.py
================================================
from typing import Dict, Tuple

import librosa
import numpy as np
import scipy.io.wavfile
import scipy.signal
import soundfile as sf

from TTS.tts.utils.helpers import StandardScaler
from TTS.utils.audio.numpy_transforms import compute_f0

# pylint: disable=too-many-public-methods


class AudioProcessor(object):
    """Audio Processor for TTS.

    Note:
        All the class arguments are set to default values to enable a flexible initialization
        of the class with the model config. They are not meaningful for all the arguments.

    Args:
        sample_rate (int, optional):
            target audio sampling rate. Defaults to None.

        resample (bool, optional):
            enable/disable resampling of the audio clips when the target sampling rate does not match the original sampling rate. Defaults to False.

        num_mels (int, optional):
            number of melspectrogram dimensions. Defaults to None.

        log_func (int, optional):
            log exponent used for converting spectrogram aplitude to DB.

        min_level_db (int, optional):
            minimum db threshold for the computed melspectrograms. Defaults to None.

        frame_shift_ms (int, optional):
            milliseconds of frames between STFT columns. Defaults to None.

        frame_length_ms (int, optional):
            milliseconds of STFT window length. Defaults to None.

        hop_length (int, optional):
            number of frames between STFT columns. Used if ```frame_shift_ms``` is None. Defaults to None.

        win_length (int, optional):
            STFT window length. Used if ```frame_length_ms``` is None. Defaults to None.

        ref_level_db (int, optional):
            reference DB level to avoid background noise. In general <20DB corresponds to the air noise. Defaults to None.

        fft_size (int, optional):
            FFT window size for STFT. Defaults to 1024.

        power (int, optional):
            Exponent value applied to the spectrogram before GriffinLim. Defaults to None.

        preemphasis (float, optional):
            Preemphasis coefficient. Preemphasis is disabled if == 0.0. Defaults to 0.0.

        signal_norm (bool, optional):
            enable/disable signal normalization. Defaults to None.

        symmetric_norm (bool, optional):
            enable/disable symmetric normalization. If set True normalization is performed in the range [-k, k] else [0, k], Defaults to None.

        max_norm (float, optional):
            ```k``` defining the normalization range. Defaults to None.

        mel_fmin (int, optional):
            minimum filter frequency for computing melspectrograms. Defaults to None.

        mel_fmax (int, optional):
            maximum filter frequency for computing melspectrograms. Defaults to None.

        pitch_fmin (int, optional):
            minimum filter frequency for computing pitch. Defaults to None.

        pitch_fmax (int, optional):
            maximum filter frequency for computing pitch. Defaults to None.

        spec_gain (int, optional):
            gain applied when converting amplitude to DB. Defaults to 20.

        stft_pad_mode (str, optional):
            Padding mode for STFT. Defaults to 'reflect'.

        clip_norm (bool, optional):
            enable/disable clipping the our of range values in the normalized audio signal. Defaults to True.

        griffin_lim_iters (int, optional):
            Number of GriffinLim iterations. Defaults to None.

        do_trim_silence (bool, optional):
            enable/disable silence trimming when loading the audio signal. Defaults to False.

        trim_db (int, optional):
            DB threshold used for silence trimming. Defaults to 60.

        do_sound_norm (bool, optional):
            enable/disable signal normalization. Defaults to False.

        do_amp_to_db_linear (bool, optional):
            enable/disable amplitude to dB conversion of linear spectrograms. Defaults to True.

        do_amp_to_db_mel (bool, optional):
            enable/disable amplitude to dB conversion of mel spectrograms. Defaults to True.

        do_rms_norm (bool, optional):
            enable/disable RMS volume normalization when loading an audio file. Defaults to False.

        db_level (int, optional):
            dB level used for rms normalization. The range is -99 to 0. Defaults to None.

        stats_path (str, optional):
            Path to the computed stats file. Defaults to None.

        verbose (bool, optional):
            enable/disable logging. Defaults to True.

    """

    def __init__(
        self,
        sample_rate=None,
        resample=False,
        num_mels=None,
        log_func="np.log10",
        min_level_db=None,
        frame_shift_ms=None,
        frame_length_ms=None,
        hop_length=None,
        win_length=None,
        ref_level_db=None,
        fft_size=1024,
        power=None,
        preemphasis=0.0,
        signal_norm=None,
        symmetric_norm=None,
        max_norm=None,
        mel_fmin=None,
        mel_fmax=None,
        pitch_fmax=None,
        pitch_fmin=None,
        spec_gain=20,
        stft_pad_mode="reflect",
        clip_norm=True,
        griffin_lim_iters=None,
        do_trim_silence=False,
        trim_db=60,
        do_sound_norm=False,
        do_amp_to_db_linear=True,
        do_amp_to_db_mel=True,
        do_rms_norm=False,
        db_level=None,
        stats_path=None,
        verbose=True,
        **_,
    ):
        # setup class attributed
        self.sample_rate = sample_rate
        self.resample = resample
        self.num_mels = num_mels
        self.log_func = log_func
        self.min_level_db = min_level_db or 0
        self.frame_shift_ms = frame_shift_ms
        self.frame_length_ms = frame_length_ms
        self.ref_level_db = ref_level_db
        self.fft_size = fft_size
        self.power = power
        self.preemphasis = preemphasis
        self.griffin_lim_iters = griffin_lim_iters
        self.signal_norm = signal_norm
        self.symmetric_norm = symmetric_norm
        self.mel_fmin = mel_fmin or 0
        self.mel_fmax = mel_fmax
        self.pitch_fmin = pitch_fmin
        self.pitch_fmax = pitch_fmax
        self.spec_gain = float(spec_gain)
        self.stft_pad_mode = stft_pad_mode
        self.max_norm = 1.0 if max_norm is None else float(max_norm)
        self.clip_norm = clip_norm
        self.do_trim_silence = do_trim_silence
        self.trim_db = trim_db
        self.do_sound_norm = do_sound_norm
        self.do_amp_to_db_linear = do_amp_to_db_linear
        self.do_amp_to_db_mel = do_amp_to_db_mel
        self.do_rms_norm = do_rms_norm
        self.db_level = db_level
        self.stats_path = stats_path
        # setup exp_func for db to amp conversion
        if log_func == "np.log":
            self.base = np.e
        elif log_func == "np.log10":
            self.base = 10
        else:
            raise ValueError(" [!] unknown `log_func` value.")
        # setup stft parameters
        if hop_length is None:
            # compute stft parameters from given time values
            self.hop_length, self.win_length = self._stft_parameters()
        else:
            # use stft parameters from config file
            self.hop_length = hop_length
            self.win_length = win_length
        assert min_level_db != 0.0, " [!] min_level_db is 0"
        assert (
            self.win_length <= self.fft_size
        ), f" [!] win_length cannot be larger than fft_size - {self.win_length} vs {self.fft_size}"
        members = vars(self)
        if verbose:
            print(" > Setting up Audio Processor...")
            for key, value in members.items():
                print(" | > {}:{}".format(key, value))
        # create spectrogram utils
        self.mel_basis = self._build_mel_basis()
        self.inv_mel_basis = np.linalg.pinv(self._build_mel_basis())
        # setup scaler
        if stats_path and signal_norm:
            mel_mean, mel_std, linear_mean, linear_std, _ = self.load_stats(stats_path)
            self.setup_scaler(mel_mean, mel_std, linear_mean, linear_std)
            self.signal_norm = True
            self.max_norm = None
            self.clip_norm = None
            self.symmetric_norm = None

    @staticmethod
    def init_from_config(config: "Coqpit", verbose=True):
        if "audio" in config:
            return AudioProcessor(verbose=verbose, **config.audio)
        return AudioProcessor(verbose=verbose, **config)

    ### setting up the parameters ###
    def _build_mel_basis(
        self,
    ) -> np.ndarray:
        """Build melspectrogram basis.

        Returns:
            np.ndarray: melspectrogram basis.
        """
        if self.mel_fmax is not None:
            assert self.mel_fmax <= self.sample_rate // 2
        return librosa.filters.mel(
            self.sample_rate, self.fft_size, n_mels=self.num_mels, fmin=self.mel_fmin, fmax=self.mel_fmax
        )

    def _stft_parameters(
        self,
    ) -> Tuple[int, int]:
        """Compute the real STFT parameters from the time values.

        Returns:
            Tuple[int, int]: hop length and window length for STFT.
        """
        factor = self.frame_length_ms / self.frame_shift_ms
        assert (factor).is_integer(), " [!] frame_shift_ms should divide frame_length_ms"
        hop_length = int(self.frame_shift_ms / 1000.0 * self.sample_rate)
        win_length = int(hop_length * factor)
        return hop_length, win_length

    ### normalization ###
    def normalize(self, S: np.ndarray) -> np.ndarray:
        """Normalize values into `[0, self.max_norm]` or `[-self.max_norm, self.max_norm]`

        Args:
            S (np.ndarray): Spectrogram to normalize.

        Raises:
            RuntimeError: Mean and variance is computed from incompatible parameters.

        Returns:
            np.ndarray: Normalized spectrogram.
        """
        # pylint: disable=no-else-return
        S = S.copy()
        if self.signal_norm:
            # mean-var scaling
            if hasattr(self, "mel_scaler"):
                if S.shape[0] == self.num_mels:
                    return self.mel_scaler.transform(S.T).T
                elif S.shape[0] == self.fft_size / 2:
                    return self.linear_scaler.transform(S.T).T
                else:
                    raise RuntimeError(" [!] Mean-Var stats does not match the given feature dimensions.")
            # range normalization
            S -= self.ref_level_db  # discard certain range of DB assuming it is air noise
            S_norm = (S - self.min_level_db) / (-self.min_level_db)
            if self.symmetric_norm:
                S_norm = ((2 * self.max_norm) * S_norm) - self.max_norm
                if self.clip_norm:
                    S_norm = np.clip(
                        S_norm, -self.max_norm, self.max_norm  # pylint: disable=invalid-unary-operand-type
                    )
                return S_norm
            else:
                S_norm = self.max_norm * S_norm
                if self.clip_norm:
                    S_norm = np.clip(S_norm, 0, self.max_norm)
                return S_norm
        else:
            return S

    def denormalize(self, S: np.ndarray) -> np.ndarray:
        """Denormalize spectrogram values.

        Args:
            S (np.ndarray): Spectrogram to denormalize.

        Raises:
            RuntimeError: Mean and variance are incompatible.

        Returns:
            np.ndarray: Denormalized spectrogram.
        """
        # pylint: disable=no-else-return
        S_denorm = S.copy()
        if self.signal_norm:
            # mean-var scaling
            if hasattr(self, "mel_scaler"):
                if S_denorm.shape[0] == self.num_mels:
                    return self.mel_scaler.inverse_transform(S_denorm.T).T
                elif S_denorm.shape[0] == self.fft_size / 2:
                    return self.linear_scaler.inverse_transform(S_denorm.T).T
                else:
                    raise RuntimeError(" [!] Mean-Var stats does not match the given feature dimensions.")
            if self.symmetric_norm:
                if self.clip_norm:
                    S_denorm = np.clip(
                        S_denorm, -self.max_norm, self.max_norm  # pylint: disable=invalid-unary-operand-type
                    )
                S_denorm = ((S_denorm + self.max_norm) * -self.min_level_db / (2 * self.max_norm)) + self.min_level_db
                return S_denorm + self.ref_level_db
            else:
                if self.clip_norm:
                    S_denorm = np.clip(S_denorm, 0, self.max_norm)
                S_denorm = (S_denorm * -self.min_level_db / self.max_norm) + self.min_level_db
                return S_denorm + self.ref_level_db
        else:
            return S_denorm

    ### Mean-STD scaling ###
    def load_stats(self, stats_path: str) -> Tuple[np.array, np.array, np.array, np.array, Dict]:
        """Loading mean and variance statistics from a `npy` file.

        Args:
            stats_path (str): Path to the `npy` file containing

        Returns:
            Tuple[np.array, np.array, np.array, np.array, Dict]: loaded statistics and the config used to
                compute them.
        """
        stats = np.load(stats_path, allow_pickle=True).item()  # pylint: disable=unexpected-keyword-arg
        mel_mean = stats["mel_mean"]
        mel_std = stats["mel_std"]
        linear_mean = stats["linear_mean"]
        linear_std = stats["linear_std"]
        stats_config = stats["audio_config"]
        # check all audio parameters used for computing stats
        skip_parameters = ["griffin_lim_iters", "stats_path", "do_trim_silence", "ref_level_db", "power"]
        for key in stats_config.keys():
            if key in skip_parameters:
                continue
            if key not in ["sample_rate", "trim_db"]:
                assert (
                    stats_config[key] == self.__dict__[key]
                ), f" [!] Audio param {key} does not match the value used for computing mean-var stats. {stats_config[key]} vs {self.__dict__[key]}"
        return mel_mean, mel_std, linear_mean, linear_std, stats_config

    # pylint: disable=attribute-defined-outside-init
    def setup_scaler(
        self, mel_mean: np.ndarray, mel_std: np.ndarray, linear_mean: np.ndarray, linear_std: np.ndarray
    ) -> None:
        """Initialize scaler objects used in mean-std normalization.

        Args:
            mel_mean (np.ndarray): Mean for melspectrograms.
            mel_std (np.ndarray): STD for melspectrograms.
            linear_mean (np.ndarray): Mean for full scale spectrograms.
            linear_std (np.ndarray): STD for full scale spectrograms.
        """
        self.mel_scaler = StandardScaler()
        self.mel_scaler.set_stats(mel_mean, mel_std)
        self.linear_scaler = StandardScaler()
        self.linear_scaler.set_stats(linear_mean, linear_std)

    ### DB and AMP conversion ###
    # pylint: disable=no-self-use
    def _amp_to_db(self, x: np.ndarray) -> np.ndarray:
        """Convert amplitude values to decibels.

        Args:
            x (np.ndarray): Amplitude spectrogram.

        Returns:
            np.ndarray: Decibels spectrogram.
        """
        return self.spec_gain * _log(np.maximum(1e-5, x), self.base)

    # pylint: disable=no-self-use
    def _db_to_amp(self, x: np.ndarray) -> np.ndarray:
        """Convert decibels spectrogram to amplitude spectrogram.

        Args:
            x (np.ndarray): Decibels spectrogram.

        Returns:
            np.ndarray: Amplitude spectrogram.
        """
        return _exp(x / self.spec_gain, self.base)

    ### Preemphasis ###
    def apply_preemphasis(self, x: np.ndarray) -> np.ndarray:
        """Apply pre-emphasis to the audio signal. Useful to reduce the correlation between neighbouring signal values.

        Args:
            x (np.ndarray): Audio signal.

        Raises:
            RuntimeError: Preemphasis coeff is set to 0.

        Returns:
            np.ndarray: Decorrelated audio signal.
        """
        if self.preemphasis == 0:
            raise RuntimeError(" [!] Preemphasis is set 0.0.")
        return scipy.signal.lfilter([1, -self.preemphasis], [1], x)

    def apply_inv_preemphasis(self, x: np.ndarray) -> np.ndarray:
        """Reverse pre-emphasis."""
        if self.preemphasis == 0:
            raise RuntimeError(" [!] Preemphasis is set 0.0.")
        return scipy.signal.lfilter([1], [1, -self.preemphasis], x)

    ### SPECTROGRAMs ###
    def _linear_to_mel(self, spectrogram: np.ndarray) -> np.ndarray:
        """Project a full scale spectrogram to a melspectrogram.

        Args:
            spectrogram (np.ndarray): Full scale spectrogram.

        Returns:
            np.ndarray: Melspectrogram
        """
        return np.dot(self.mel_basis, spectrogram)

    def _mel_to_linear(self, mel_spec: np.ndarray) -> np.ndarray:
        """Convert a melspectrogram to full scale spectrogram."""
        return np.maximum(1e-10, np.dot(self.inv_mel_basis, mel_spec))

    def spectrogram(self, y: np.ndarray) -> np.ndarray:
        """Compute a spectrogram from a waveform.

        Args:
            y (np.ndarray): Waveform.

        Returns:
            np.ndarray: Spectrogram.
        """
        if self.preemphasis != 0:
            D = self._stft(self.apply_preemphasis(y))
        else:
            D = self._stft(y)
        if self.do_amp_to_db_linear:
            S = self._amp_to_db(np.abs(D))
        else:
            S = np.abs(D)
        return self.normalize(S).astype(np.float32)

    def melspectrogram(self, y: np.ndarray) -> np.ndarray:
        """Compute a melspectrogram from a waveform."""
        if self.preemphasis != 0:
            D = self._stft(self.apply_preemphasis(y))
        else:
            D = self._stft(y)
        if self.do_amp_to_db_mel:
            S = self._amp_to_db(self._linear_to_mel(np.abs(D)))
        else:
            S = self._linear_to_mel(np.abs(D))
        return self.normalize(S).astype(np.float32)

    def inv_spectrogram(self, spectrogram: np.ndarray) -> np.ndarray:
        """Convert a spectrogram to a waveform using Griffi-Lim vocoder."""
        S = self.denormalize(spectrogram)
        S = self._db_to_amp(S)
        # Reconstruct phase
        if self.preemphasis != 0:
            return self.apply_inv_preemphasis(self._griffin_lim(S**self.power))
        return self._griffin_lim(S**self.power)

    def inv_melspectrogram(self, mel_spectrogram: np.ndarray) -> np.ndarray:
        """Convert a melspectrogram to a waveform using Griffi-Lim vocoder."""
        D = self.denormalize(mel_spectrogram)
        S = self._db_to_amp(D)
        S = self._mel_to_linear(S)  # Convert back to linear
        if self.preemphasis != 0:
            return self.apply_inv_preemphasis(self._griffin_lim(S**self.power))
        return self._griffin_lim(S**self.power)

    def out_linear_to_mel(self, linear_spec: np.ndarray) -> np.ndarray:
        """Convert a full scale linear spectrogram output of a network to a melspectrogram.

        Args:
            linear_spec (np.ndarray): Normalized full scale linear spectrogram.

        Returns:
            np.ndarray: Normalized melspectrogram.
        """
        S = self.denormalize(linear_spec)
        S = self._db_to_amp(S)
        S = self._linear_to_mel(np.abs(S))
        S = self._amp_to_db(S)
        mel = self.normalize(S)
        return mel

    ### STFT and ISTFT ###
    def _stft(self, y: np.ndarray) -> np.ndarray:
        """Librosa STFT wrapper.

        Args:
            y (np.ndarray): Audio signal.

        Returns:
            np.ndarray: Complex number array.
        """
        return librosa.stft(
            y=y,
            n_fft=self.fft_size,
            hop_length=self.hop_length,
            win_length=self.win_length,
            pad_mode=self.stft_pad_mode,
            window="hann",
            center=True,
        )

    def _istft(self, y: np.ndarray) -> np.ndarray:
        """Librosa iSTFT wrapper."""
        return librosa.istft(y, hop_length=self.hop_length, win_length=self.win_length)

    def _griffin_lim(self, S):
        angles = np.exp(2j * np.pi * np.random.rand(*S.shape))
        S_complex = np.abs(S).astype(np.complex)
        y = self._istft(S_complex * angles)
        if not np.isfinite(y).all():
            print(" [!] Waveform is not finite everywhere. Skipping the GL.")
            return np.array([0.0])
        for _ in range(self.griffin_lim_iters):
            angles = np.exp(1j * np.angle(self._stft(y)))
            y = self._istft(S_complex * angles)
        return y

    def compute_stft_paddings(self, x, pad_sides=1):
        """Compute paddings used by Librosa's STFT. Compute right padding (final frame) or both sides padding
        (first and final frames)"""
        assert pad_sides in (1, 2)
        pad = (x.shape[0] // self.hop_length + 1) * self.hop_length - x.shape[0]
        if pad_sides == 1:
            return 0, pad
        return pad // 2, pad // 2 + pad % 2

    def compute_f0(self, x: np.ndarray) -> np.ndarray:
        """Compute pitch (f0) of a waveform using the same parameters used for computing melspectrogram.

        Args:
            x (np.ndarray): Waveform.

        Returns:
            np.ndarray: Pitch.

        Examples:
            >>> WAV_FILE = filename = librosa.util.example_audio_file()
            >>> from TTS.config import BaseAudioConfig
            >>> from TTS.utils.audio import AudioProcessor
            >>> conf = BaseAudioConfig(pitch_fmax=640, pitch_fmin=1)
            >>> ap = AudioProcessor(**conf)
            >>> wav = ap.load_wav(WAV_FILE, sr=ap.sample_rate)[:5 * ap.sample_rate]
            >>> pitch = ap.compute_f0(wav)
        """
        assert self.pitch_fmax is not None, " [!] Set `pitch_fmax` before caling `compute_f0`."
        assert self.pitch_fmin is not None, " [!] Set `pitch_fmin` before caling `compute_f0`."
        # align F0 length to the spectrogram length
        if len(x) % self.hop_length == 0:
            x = np.pad(x, (0, self.hop_length // 2), mode=self.stft_pad_mode)

        f0 = compute_f0(
            x=x,
            pitch_fmax=self.pitch_fmax,
            pitch_fmin=self.pitch_fmin,
            hop_length=self.hop_length,
            win_length=self.win_length,
            sample_rate=self.sample_rate,
            stft_pad_mode=self.stft_pad_mode,
            center=True,
        )

        return f0

    ### Audio Processing ###
    def find_endpoint(self, wav: np.ndarray, min_silence_sec=0.8) -> int:
        """Find the last point without silence at the end of a audio signal.

        Args:
            wav (np.ndarray): Audio signal.
            threshold_db (int, optional): Silence threshold in decibels. Defaults to -40.
            min_silence_sec (float, optional): Ignore silences that are shorter then this in secs. Defaults to 0.8.

        Returns:
            int: Last point without silence.
        """
        window_length = int(self.sample_rate * min_silence_sec)
        hop_length = int(window_length / 4)
        threshold = self._db_to_amp(-self.trim_db)
        for x in range(hop_length, len(wav) - window_length, hop_length):
            if np.max(wav[x : x + window_length]) < threshold:
                return x + hop_length
        return len(wav)

    def trim_silence(self, wav):
        """Trim silent parts with a threshold and 0.01 sec margin"""
        margin = int(self.sample_rate * 0.01)
        wav = wav[margin:-margin]
        return librosa.effects.trim(wav, top_db=self.trim_db, frame_length=self.win_length, hop_length=self.hop_length)[
            0
        ]

    @staticmethod
    def sound_norm(x: np.ndarray) -> np.ndarray:
        """Normalize the volume of an audio signal.

        Args:
            x (np.ndarray): Raw waveform.

        Returns:
            np.ndarray: Volume normalized waveform.
        """
        return x / abs(x).max() * 0.95

    @staticmethod
    def _rms_norm(wav, db_level=-27):
        r = 10 ** (db_level / 20)
        a = np.sqrt((len(wav) * (r**2)) / np.sum(wav**2))
        return wav * a

    def rms_volume_norm(self, x: np.ndarray, db_level: float = None) -> np.ndarray:
        """Normalize the volume based on RMS of the signal.

        Args:
            x (np.ndarray): Raw waveform.

        Returns:
            np.ndarray: RMS normalized waveform.
        """
        if db_level is None:
            db_level = self.db_level
        assert -99 <= db_level <= 0, " [!] db_level should be between -99 and 0"
        wav = self._rms_norm(x, db_level)
        return wav

    ### save and load ###
    def load_wav(self, filename: str, sr: int = None) -> np.ndarray:
        """Read a wav file using Librosa and optionally resample, silence trim, volume normalize.

        Resampling slows down loading the file significantly. Therefore it is recommended to resample the file before.

        Args:
            filename (str): Path to the wav file.
            sr (int, optional): Sampling rate for resampling. Defaults to None.

        Returns:
            np.ndarray: Loaded waveform.
        """
        if self.resample:
            # loading with resampling. It is significantly slower.
            x, sr = librosa.load(filename, sr=self.sample_rate)
        elif sr is None:
            # SF is faster than librosa for loading files
            x, sr = sf.read(filename)
            assert self.sample_rate == sr, "%s vs %s" % (self.sample_rate, sr)
        else:
            x, sr = librosa.load(filename, sr=sr)
        if self.do_trim_silence:
            try:
                x = self.trim_silence(x)
            except ValueError:
                print(f" [!] File cannot be trimmed for silence - {filename}")
        if self.do_sound_norm:
            x = self.sound_norm(x)
        if self.do_rms_norm:
            x = self.rms_volume_norm(x, self.db_level)
        return x

    def save_wav(self, wav: np.ndarray, path: str, sr: int = None) -> None:
        """Save a waveform to a file using Scipy.

        Args:
            wav (np.ndarray): Waveform to save.
            path (str): Path to a output file.
            sr (int, optional): Sampling rate used for saving to the file. Defaults to None.
        """
        if self.do_rms_norm:
            wav_norm = self.rms_volume_norm(wav, self.db_level) * 32767
        else:
            wav_norm = wav * (32767 / max(0.01, np.max(np.abs(wav))))

        scipy.io.wavfile.write(path, sr if sr else self.sample_rate, wav_norm.astype(np.int16))

    def get_duration(self, filename: str) -> float:
        """Get the duration of a wav file using Librosa.

        Args:
            filename (str): Path to the wav file.
        """
        return librosa.get_duration(filename)

    @staticmethod
    def mulaw_encode(wav: np.ndarray, qc: int) -> np.ndarray:
        mu = 2**qc - 1
        # wav_abs = np.minimum(np.abs(wav), 1.0)
        signal = np.sign(wav) * np.log(1 + mu * np.abs(wav)) / np.log(1.0 + mu)
        # Quantize signal to the specified number of levels.
        signal = (signal + 1) / 2 * mu + 0.5
        return np.floor(
            signal,
        )

    @staticmethod
    def mulaw_decode(wav, qc):
        """Recovers waveform from quantized values."""
        mu = 2**qc - 1
        x = np.sign(wav) / mu * ((1 + mu) ** np.abs(wav) - 1)
        return x

    @staticmethod
    def encode_16bits(x):
        return np.clip(x * 2**15, -(2**15), 2**15 - 1).astype(np.int16)

    @staticmethod
    def quantize(x: np.ndarray, bits: int) -> np.ndarray:
        """Quantize a waveform to a given number of bits.

        Args:
            x (np.ndarray): Waveform to quantize. Must be normalized into the range `[-1, 1]`.
            bits (int): Number of quantization bits.

        Returns:
            np.ndarray: Quantized waveform.
        """
        return (x + 1.0) * (2**bits - 1) / 2

    @staticmethod
    def dequantize(x, bits):
        """Dequantize a waveform from the given number of bits."""
        return 2 * x / (2**bits - 1) - 1


def _log(x, base):
    if base == 10:
        return np.log10(x)
    return np.log(x)


def _exp(x, base):
    if base == 10:
        return np.power(10, x)
    return np.exp(x)


================================================
FILE: TTS/utils/audio/torch_transforms.py
================================================
import librosa
import torch
from torch import nn


class TorchSTFT(nn.Module):  # pylint: disable=abstract-method
    """Some of the audio processing funtions using Torch for faster batch processing.

    Args:

        n_fft (int):
            FFT window size for STFT.

        hop_length (int):
            number of frames between STFT columns.

        win_length (int, optional):
            STFT window length.

        pad_wav (bool, optional):
            If True pad the audio with (n_fft - hop_length) / 2). Defaults to False.

        window (str, optional):
            The name of a function to create a window tensor that is applied/multiplied to each frame/window. Defaults to "hann_window"

        sample_rate (int, optional):
            target audio sampling rate. Defaults to None.

        mel_fmin (int, optional):
            minimum filter frequency for computing melspectrograms. Defaults to None.

        mel_fmax (int, optional):
            maximum filter frequency for computing melspectrograms. Defaults to None.

        n_mels (int, optional):
            number of melspectrogram dimensions. Defaults to None.

        use_mel (bool, optional):
            If True compute the melspectrograms otherwise. Defaults to False.

        do_amp_to_db_linear (bool, optional):
            enable/disable amplitude to dB conversion of linear spectrograms. Defaults to False.

        spec_gain (float, optional):
            gain applied when converting amplitude to DB. Defaults to 1.0.

        power (float, optional):
            Exponent for the magnitude spectrogram, e.g., 1 for energy, 2 for power, etc.  Defaults to None.

        use_htk (bool, optional):
            Use HTK formula in mel filter instead of Slaney.

        mel_norm (None, 'slaney', or number, optional):
            If 'slaney', divide the triangular mel weights by the width of the mel band
            (area normalization).

            If numeric, use `librosa.util.normalize` to normalize each filter by to unit l_p norm.
            See `librosa.util.normalize` for a full description of supported norm values
            (including `+-np.inf`).

            Otherwise, leave all the triangles aiming for a peak value of 1.0. Defaults to "slaney".
    """

    def __init__(
        self,
        n_fft,
        hop_length,
        win_length,
        pad_wav=False,
        window="hann_window",
        sample_rate=None,
        mel_fmin=0,
        mel_fmax=None,
        n_mels=80,
        use_mel=False,
        do_amp_to_db=False,
        spec_gain=1.0,
        power=None,
        use_htk=False,
        mel_norm="slaney",
    ):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.win_length = win_length
        self.pad_wav = pad_wav
        self.sample_rate = sample_rate
        self.mel_fmin = mel_fmin
        self.mel_fmax = mel_fmax
        self.n_mels = n_mels
        self.use_mel = use_mel
        self.do_amp_to_db = do_amp_to_db
        self.spec_gain = spec_gain
        self.power = power
        self.use_htk = use_htk
        self.mel_norm = mel_norm
        self.window = nn.Parameter(getattr(torch, window)(win_length), requires_grad=False)
        self.mel_basis = None
        if use_mel:
            self._build_mel_basis()

    def __call__(self, x):
        """Compute spectrogram frames by torch based stft.

        Args:
            x (Tensor): input waveform

        Returns:
            Tensor: spectrogram frames.

        Shapes:
            x: [B x T] or [:math:`[B, 1, T]`]
        """
        if x.ndim == 2:
            x = x.unsqueeze(1)
        if self.pad_wav:
            padding = int((self.n_fft - self.hop_length) / 2)
            x = torch.nn.functional.pad(x, (padding, padding), mode="reflect")
        # B x D x T x 2
        o = torch.stft(
            x.squeeze(1),
            self.n_fft,
            self.hop_length,
            self.win_length,
            self.window,
            center=True,
            pad_mode="reflect",  # compatible with audio.py
            normalized=False,
            onesided=True,
            return_complex=False,
        )
        M = o[:, :, :, 0]
        P = o[:, :, :, 1]
        S = torch.sqrt(torch.clamp(M**2 + P**2, min=1e-8))

        if self.power is not None:
            S = S**self.power

        if self.use_mel:
            S = torch.matmul(self.mel_basis.to(x), S)
        if self.do_amp_to_db:
            S = self._amp_to_db(S, spec_gain=self.spec_gain)
        return S

    def _build_mel_basis(self):
        mel_basis = librosa.filters.mel(
            self.sample_rate,
            self.n_fft,
            n_mels=self.n_mels,
            fmin=self.mel_fmin,
            fmax=self.mel_fmax,
            htk=self.use_htk,
            norm=self.mel_norm,
        )
        self.mel_basis = torch.from_numpy(mel_basis).float()

    @staticmethod
    def _amp_to_db(x, spec_gain=1.0):
        return torch.log(torch.clamp(x, min=1e-5) * spec_gain)

    @staticmethod
    def _db_to_amp(x, spec_gain=1.0):
        return torch.exp(x) / spec_gain


================================================
FILE: TTS/utils/callbacks.py
================================================
class TrainerCallback:
    @staticmethod
    def on_init_start(trainer) -> None:
        if hasattr(trainer.model, "module"):
            if hasattr(trainer.model.module, "on_init_start"):
                trainer.model.module.on_init_start(trainer)
        else:
            if hasattr(trainer.model, "on_init_start"):
                trainer.model.on_init_start(trainer)

        if hasattr(trainer.criterion, "on_init_start"):
            trainer.criterion.on_init_start(trainer)

        if hasattr(trainer.optimizer, "on_init_start"):
            trainer.optimizer.on_init_start(trainer)

    @staticmethod
    def on_init_end(trainer) -> None:
        if hasattr(trainer.model, "module"):
            if hasattr(trainer.model.module, "on_init_end"):
                trainer.model.module.on_init_end(trainer)
        else:
            if hasattr(trainer.model, "on_init_end"):
                trainer.model.on_init_end(trainer)

        if hasattr(trainer.criterion, "on_init_end"):
            trainer.criterion.on_init_end(trainer)

        if hasattr(trainer.optimizer, "on_init_end"):
            trainer.optimizer.on_init_end(trainer)

    @staticmethod
    def on_epoch_start(trainer) -> None:
        if hasattr(trainer.model, "module"):
            if hasattr(trainer.model.module, "on_epoch_start"):
                trainer.model.module.on_epoch_start(trainer)
        else:
            if hasattr(trainer.model, "on_epoch_start"):
                trainer.model.on_epoch_start(trainer)

        if hasattr(trainer.criterion, "on_epoch_start"):
            trainer.criterion.on_epoch_start(trainer)

        if hasattr(trainer.optimizer, "on_epoch_start"):
            trainer.optimizer.on_epoch_start(trainer)

    @staticmethod
    def on_epoch_end(trainer) -> None:
        if hasattr(trainer.model, "module"):
            if hasattr(trainer.model.module, "on_epoch_end"):
                trainer.model.module.on_epoch_end(trainer)
        else:
            if hasattr(trainer.model, "on_epoch_end"):
                trainer.model.on_epoch_end(trainer)

        if hasattr(trainer.criterion, "on_epoch_end"):
            trainer.criterion.on_epoch_end(trainer)

        if hasattr(trainer.optimizer, "on_epoch_end"):
            trainer.optimizer.on_epoch_end(trainer)

    @staticmethod
    def on_train_step_start(trainer) -> None:
        if hasattr(trainer.model, "module"):
            if hasattr(trainer.model.module, "on_train_step_start"):
                trainer.model.module.on_train_step_start(trainer)
        else:
            if hasattr(trainer.model, "on_train_step_start"):
                trainer.model.on_train_step_start(trainer)

        if hasattr(trainer.criterion, "on_train_step_start"):
            trainer.criterion.on_train_step_start(trainer)

        if hasattr(trainer.optimizer, "on_train_step_start"):
            trainer.optimizer.on_train_step_start(trainer)

    @staticmethod
    def on_train_step_end(trainer) -> None:
        if hasattr(trainer.model, "module"):
            if hasattr(trainer.model.module, "on_train_step_end"):
                trainer.model.module.on_train_step_end(trainer)
        else:
            if hasattr(trainer.model, "on_train_step_end"):
                trainer.model.on_train_step_end(trainer)

        if hasattr(trainer.criterion, "on_train_step_end"):
            trainer.criterion.on_train_step_end(trainer)

        if hasattr(trainer.optimizer, "on_train_step_end"):
            trainer.optimizer.on_train_step_end(trainer)

    @staticmethod
    def on_keyboard_interrupt(trainer) -> None:
        if hasattr(trainer.model, "module"):
            if hasattr(trainer.model.module, "on_keyboard_interrupt"):
                trainer.model.module.on_keyboard_interrupt(trainer)
        else:
            if hasattr(trainer.model, "on_keyboard_interrupt"):
                trainer.model.on_keyboard_interrupt(trainer)

        if hasattr(trainer.criterion, "on_keyboard_interrupt"):
            trainer.criterion.on_keyboard_interrupt(trainer)

        if hasattr(trainer.optimizer, "on_keyboard_interrupt"):
            trainer.optimizer.on_keyboard_interrupt(trainer)


================================================
FILE: TTS/utils/capacitron_optimizer.py
================================================
from typing import Generator

from trainer.trainer_utils import get_optimizer


class CapacitronOptimizer:
    """Double optimizer class for the Capacitron model."""

    def __init__(self, config: dict, model_params: Generator) -> None:
        self.primary_params, self.secondary_params = self.split_model_parameters(model_params)

        optimizer_names = list(config.optimizer_params.keys())
        optimizer_parameters = list(config.optimizer_params.values())

        self.primary_optimizer = get_optimizer(
            optimizer_names[0],
            optimizer_parameters[0],
            config.lr,
            parameters=self.primary_params,
        )

        self.secondary_optimizer = get_optimizer(
            optimizer_names[1],
            self.extract_optimizer_parameters(optimizer_parameters[1]),
            optimizer_parameters[1]["lr"],
            parameters=self.secondary_params,
        )

        self.param_groups = self.primary_optimizer.param_groups

    def first_step(self):
        self.secondary_optimizer.step()
        self.secondary_optimizer.zero_grad()
        self.primary_optimizer.zero_grad()

    def step(self):
        # Update param groups to display the correct learning rate
        self.param_groups = self.primary_optimizer.param_groups
        self.primary_optimizer.step()

    def zero_grad(self, set_to_none=False):
        self.primary_optimizer.zero_grad(set_to_none)
        self.secondary_optimizer.zero_grad(set_to_none)

    def load_state_dict(self, state_dict):
        self.primary_optimizer.load_state_dict(state_dict[0])
        self.secondary_optimizer.load_state_dict(state_dict[1])

    def state_dict(self):
        return [self.primary_optimizer.state_dict(), self.secondary_optimizer.state_dict()]

    @staticmethod
    def split_model_parameters(model_params: Generator) -> list:
        primary_params = []
        secondary_params = []
        for name, param in model_params:
            if param.requires_grad:
                if name == "capacitron_vae_layer.beta":
                    secondary_params.append(param)
                else:
                    primary_params.append(param)
        return [iter(primary_params), iter(secondary_params)]

    @staticmethod
    def extract_optimizer_parameters(params: dict) -> dict:
        """Extract parameters that are not the learning rate"""
        return {k: v for k, v in params.items() if k != "lr"}


================================================
FILE: TTS/utils/distribute.py
================================================
# edited from https://github.com/fastai/imagenet-fast/blob/master/imagenet_nv/distributed.py
import torch
import torch.distributed as dist


def reduce_tensor(tensor, num_gpus):
    rt = tensor.clone()
    dist.all_reduce(rt, op=dist.reduce_op.SUM)
    rt /= num_gpus
    return rt


def init_distributed(rank, num_gpus, group_name, dist_backend, dist_url):
    assert torch.cuda.is_available(), "Distributed mode requires CUDA."

    # Set cuda device so everything is done on the right GPU.
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Initialize distributed communication
    dist.init_process_group(dist_backend, init_method=dist_url, world_size=num_gpus, rank=rank, group_name=group_name)


================================================
FILE: TTS/utils/download.py
================================================
# Adapted from https://github.com/pytorch/audio/

import hashlib
import logging
import os
import tarfile
import urllib
import urllib.request
import zipfile
from os.path import expanduser
from typing import Any, Iterable, List, Optional

from torch.utils.model_zoo import tqdm


def stream_url(
    url: str, start_byte: Optional[int] = None, block_size: int = 32 * 1024, progress_bar: bool = True
) -> Iterable:
    """Stream url by chunk

    Args:
        url (str): Url.
        start_byte (int or None, optional): Start streaming at that point (Default: ``None``).
        block_size (int, optional): Size of chunks to stream (Default: ``32 * 1024``).
        progress_bar (bool, optional): Display a progress bar (Default: ``True``).
    """

    # If we already have the whole file, there is no need to download it again
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as response:
        url_size = int(response.info().get("Content-Length", -1))
    if url_size == start_byte:
        return

    req = urllib.request.Request(url)
    if start_byte:
        req.headers["Range"] = "bytes={}-".format(start_byte)

    with urllib.request.urlopen(req) as upointer, tqdm(
        unit="B",
        unit_scale=True,
        unit_divisor=1024,
        total=url_size,
        disable=not progress_bar,
    ) as pbar:
        num_bytes = 0
        while True:
            chunk = upointer.read(block_size)
            if not chunk:
                break
            yield chunk
            num_bytes += len(chunk)
            pbar.update(len(chunk))


def download_url(
    url: str,
    download_folder: str,
    filename: Optional[str] = None,
    hash_value: Optional[str] = None,
    hash_type: str = "sha256",
    progress_bar: bool = True,
    resume: bool = False,
) -> None:
    """Download file to disk.

    Args:
        url (str): Url.
        download_folder (str): Folder to download file.
        filename (str or None, optional): Name of downloaded file. If None, it is inferred from the url
            (Default: ``None``).
        hash_value (str or None, optional): Hash for url (Default: ``None``).
        hash_type (str, optional): Hash type, among "sha256" and "md5" (Default: ``"sha256"``).
        progress_bar (bool, optional): Display a progress bar (Default: ``True``).
        resume (bool, optional): Enable resuming download (Default: ``False``).
    """

    req = urllib.request.Request(url, method="HEAD")
    req_info = urllib.request.urlopen(req).info()  # pylint: disable=consider-using-with

    # Detect filename
    filename = filename or req_info.get_filename() or os.path.basename(url)
    filepath = os.path.join(download_folder, filename)
    if resume and os.path.exists(filepath):
        mode = "ab"
        local_size: Optional[int] = os.path.getsize(filepath)

    elif not resume and os.path.exists(filepath):
        raise RuntimeError("{} already exists. Delete the file manually and retry.".format(filepath))
    else:
        mode = "wb"
        local_size = None

    if hash_value and local_size == int(req_info.get("Content-Length", -1)):
        with open(filepath, "rb") as file_obj:
            if validate_file(file_obj, hash_value, hash_type):
                return
        raise RuntimeError("The hash of {} does not match. Delete the file manually and retry.".format(filepath))

    with open(filepath, mode) as fpointer:
        for chunk in stream_url(url, start_byte=local_size, progress_bar=progress_bar):
            fpointer.write(chunk)

    with open(filepath, "rb") as file_obj:
        if hash_value and not validate_file(file_obj, hash_value, hash_type):
            raise RuntimeError("The hash of {} does not match. Delete the file manually and retry.".format(filepath))


def validate_file(file_obj: Any, hash_value: str, hash_type: str = "sha256") -> bool:
    """Validate a given file object with its hash.

    Args:
        file_obj: File object to read from.
        hash_value (str): Hash for url.
        hash_type (str, optional): Hash type, among "sha256" and "md5" (Default: ``"sha256"``).

    Returns:
        bool: return True if its a valid file, else False.
    """

    if hash_type == "sha256":
        hash_func = hashlib.sha256()
    elif hash_type == "md5":
        hash_func = hashlib.md5()
    else:
        raise ValueError

    while True:
        # Read by chunk to avoid filling memory
        chunk = file_obj.read(1024**2)
        if not chunk:
            break
        hash_func.update(chunk)

    return hash_func.hexdigest() == hash_value


def extract_archive(from_path: str, to_path: Optional[str] = None, overwrite: bool = False) -> List[str]:
    """Extract archive.
    Args:
        from_path (str): the path of the archive.
        to_path (str or None, optional): the root path of the extraced files (directory of from_path)
            (Default: ``None``)
        overwrite (bool, optional): overwrite existing files (Default: ``False``)

    Returns:
        list: List of paths to extracted files even if not overwritten.
    """

    if to_path is None:
        to_path = os.path.dirname(from_path)

    try:
        with tarfile.open(from_path, "r") as tar:
            logging.info("Opened tar file %s.", from_path)
            files = []
            for file_ in tar:  # type: Any
                file_path = os.path.join(to_path, file_.name)
                if file_.isfile():
                    files.append(file_path)
                    if os.path.exists(file_path):
                        logging.info("%s already extracted.", file_path)
                        if not overwrite:
                            continue
                tar.extract(file_, to_path)
            return files
    except tarfile.ReadError:
        pass

    try:
        with zipfile.ZipFile(from_path, "r") as zfile:
            logging.info("Opened zip file %s.", from_path)
            files = zfile.namelist()
            for file_ in files:
                file_path = os.path.join(to_path, file_)
                if os.path.exists(file_path):
                    logging.info("%s already extracted.", file_path)
                    if not overwrite:
                        continue
                zfile.extract(file_, to_path)
        return files
    except zipfile.BadZipFile:
        pass

    raise NotImplementedError(" > [!] only supports tar.gz, tgz, and zip achives.")


def download_kaggle_dataset(dataset_path: str, dataset_name: str, output_path: str):
    """Download dataset from kaggle.
    Args:
        dataset_path (str):
        This the kaggle link to the dataset. for example vctk is 'mfekadu/english-multispeaker-corpus-for-voice-cloning'
        dataset_name (str): Name of the folder the dataset will be saved in.
        output_path (str): Path of the location you want the dataset folder to be saved to.
    """
    data_path = os.path.join(output_path, dataset_name)
    try:
        import kaggle  # pylint: disable=import-outside-toplevel

        kaggle.api.authenticate()
        print(f"""\nDownloading {dataset_name}...""")
        kaggle.api.dataset_download_files(dataset_path, path=data_path, unzip=True)
    except OSError:
        print(
            f"""[!] in order to download kaggle datasets, you need to have a kaggle api token stored in your {os.path.join(expanduser('~'), '.kaggle/kaggle.json')}"""
        )


================================================
FILE: TTS/utils/downloaders.py
================================================
import os
from typing import Optional

from TTS.utils.download import download_kaggle_dataset, download_url, extract_archive


def download_ljspeech(path: str):
    """Download and extract LJSpeech dataset

    Args:
        path (str): path to the directory where the dataset will be stored.
    """
    os.makedirs(path, exist_ok=True)
    url = "https://data.keithito.com/data/speech/LJSpeech-1.1.tar.bz2"
    download_url(url, path)
    basename = os.path.basename(url)
    archive = os.path.join(path, basename)
    print(" > Extracting archive file...")
    extract_archive(archive)


def download_vctk(path: str, use_kaggle: Optional[bool] = False):
    """Download and extract VCTK dataset.

    Args:
        path (str): path to the directory where the dataset will be stored.

        use_kaggle (bool, optional): Downloads vctk dataset from kaggle. Is generally faster. Defaults to False.
    """
    if use_kaggle:
        download_kaggle_dataset("mfekadu/english-multispeaker-corpus-for-voice-cloning", "VCTK", path)
    else:
        os.makedirs(path, exist_ok=True)
        url = "https://datashare.ed.ac.uk/bitstream/handle/10283/3443/VCTK-Corpus-0.92.zip"
        download_url(url, path)
        basename = os.path.basename(url)
        archive = os.path.join(path, basename)
        print(" > Extracting archive file...")
        extract_archive(archive)


def download_tweb(path: str):
    """Download and extract Tweb dataset

    Args:
        path (str): Path to the directory where the dataset will be stored.
    """
    download_kaggle_dataset("bryanpark/the-world-english-bible-speech-dataset", "TWEB", path)


def download_libri_tts(path: str, subset: Optional[str] = "all"):
    """Download and extract libri tts dataset.

    Args:
        path (str): Path to the directory where the dataset will be stored.

        subset (str, optional): Name of the subset to download. If you only want to download a certain
        portion specify it here. Defaults to 'all'.
    """

    subset_dict = {
        "libri-tts-clean-100": "http://www.openslr.org/resources/60/train-clean-100.tar.gz",
        "libri-tts-clean-360": "http://www.openslr.org/resources/60/train-clean-360.tar.gz",
        "libri-tts-other-500": "http://www.openslr.org/resources/60/train-other-500.tar.gz",
        "libri-tts-dev-clean": "http://www.openslr.org/resources/60/dev-clean.tar.gz",
        "libri-tts-dev-other": "http://www.openslr.org/resources/60/dev-other.tar.gz",
        "libri-tts-test-clean": "http://www.openslr.org/resources/60/test-clean.tar.gz",
        "libri-tts-test-other": "http://www.openslr.org/resources/60/test-other.tar.gz",
    }

    os.makedirs(path, exist_ok=True)
    if subset == "all":
        for sub, val in subset_dict.items():
            print(f" > Downloading {sub}...")
            download_url(val, path)
            basename = os.path.basename(val)
            archive = os.path.join(path, basename)
            print(" > Extracting archive file...")
            extract_archive(archive)
        print(" > All subsets downloaded")
    else:
        url = subset_dict[subset]
        download_url(url, path)
        basename = os.path.basename(url)
        archive = os.path.join(path, basename)
        print(" > Extracting archive file...")
        extract_archive(archive)


def download_thorsten_de(path: str):
    """Download and extract Thorsten german male voice dataset.

    Args:
        path (str): Path to the directory where the dataset will be stored.
    """
    os.makedirs(path, exist_ok=True)
    url = "https://www.openslr.org/resources/95/thorsten-de_v02.tgz"
    download_url(url, path)
    basename = os.path.basename(url)
    archive = os.path.join(path, basename)
    print(" > Extracting archive file...")
    extract_archive(archive)


def download_mailabs(path: str, language: str = "english"):
    """Download and extract Mailabs dataset.

    Args:
        path (str): Path to the directory where the dataset will be stored.

        language (str): Language subset to download. Defaults to english.
    """
    language_dict = {
        "english": "https://data.solak.de/data/Training/stt_tts/en_US.tgz",
        "german": "https://data.solak.de/data/Training/stt_tts/de_DE.tgz",
        "french": "https://data.solak.de/data/Training/stt_tts/fr_FR.tgz",
        "italian": "https://data.solak.de/data/Training/stt_tts/it_IT.tgz",
        "spanish": "https://data.solak.de/data/Training/stt_tts/es_ES.tgz",
    }
    os.makedirs(path, exist_ok=True)
    url = language_dict[language]
    download_url(url, path)
    basename = os.path.basename(url)
    archive = os.path.join(path, basename)
    print(" > Extracting archive file...")
    extract_archive(archive)


================================================
FILE: TTS/utils/generic_utils.py
================================================
# -*- coding: utf-8 -*-
import datetime
import importlib
import os
import re
import subprocess
import sys
from pathlib import Path
from typing import Dict

import fsspec
import torch


def to_cuda(x: torch.Tensor) -> torch.Tensor:
    if x is None:
        return None
    if torch.is_tensor(x):
        x = x.contiguous()
        if torch.cuda.is_available():
            x = x.cuda(non_blocking=True)
    return x


def get_cuda():
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    return use_cuda, device


def get_git_branch():
    try:
        out = subprocess.check_output(["git", "branch"]).decode("utf8")
        current = next(line for line in out.split("\n") if line.startswith("*"))
        current.replace("* ", "")
    except subprocess.CalledProcessError:
        current = "inside_docker"
    except FileNotFoundError:
        current = "unknown"
    return current


def get_commit_hash():
    """https://stackoverflow.com/questions/14989858/get-the-current-git-hash-in-a-python-script"""
    # try:
    #     subprocess.check_output(['git', 'diff-index', '--quiet',
    #                              'HEAD'])  # Verify client is clean
    # except:
    #     raise RuntimeError(
    #         " !! Commit before training to get the commit hash.")
    try:
        commit = subprocess.check_output(["git", "rev-parse", "--short", "HEAD"]).decode().strip()
    # Not copying .git folder into docker container
    except (subprocess.CalledProcessError, FileNotFoundError):
        commit = "0000000"
    return commit


def get_experiment_folder_path(root_path, model_name):
    """Get an experiment folder path with the current date and time"""
    date_str = datetime.datetime.now().strftime("%B-%d-%Y_%I+%M%p")
    commit_hash = get_commit_hash()
    output_folder = os.path.join(root_path, model_name + "-" + date_str + "-" + commit_hash)
    return output_folder


def remove_experiment_folder(experiment_path):
    """Check folder if there is a checkpoint, otherwise remove the folder"""
    fs = fsspec.get_mapper(experiment_path).fs
    checkpoint_files = fs.glob(experiment_path + "/*.pth")
    if not checkpoint_files:
        if fs.exists(experiment_path):
            fs.rm(experiment_path, recursive=True)
            print(" ! Run is removed from {}".format(experiment_path))
    else:
        print(" ! Run is kept in {}".format(experiment_path))


def count_parameters(model):
    r"""Count number of trainable parameters in a network"""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def to_camel(text):
    text = text.capitalize()
    text = re.sub(r"(?!^)_([a-zA-Z])", lambda m: m.group(1).upper(), text)
    text = text.replace("Tts", "TTS")
    text = text.replace("vc", "VC")
    return text


def find_module(module_path: str, module_name: str) -> object:
    module_name = module_name.lower()
    module = importlib.import_module(module_path + "." + module_name)
    class_name = to_camel(module_name)
    return getattr(module, class_name)


def import_class(module_path: str) -> object:
    """Import a class from a module path.

    Args:
        module_path (str): The module path of the class.

    Returns:
        object: The imported class.
    """
    class_name = module_path.split(".")[-1]
    module_path = ".".join(module_path.split(".")[:-1])
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def get_import_path(obj: object) -> str:
    """Get the import path of a class.

    Args:
        obj (object): The class object.

    Returns:
        str: The import path of the class.
    """
    return ".".join([type(obj).__module__, type(obj).__name__])


def get_user_data_dir(appname):
    if sys.platform == "win32":
        import winreg  # pylint: disable=import-outside-toplevel

        key = winreg.OpenKey(
            winreg.HKEY_CURRENT_USER, r"Software\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders"
        )
        dir_, _ = winreg.QueryValueEx(key, "Local AppData")
        ans = Path(dir_).resolve(strict=False)
    elif sys.platform == "darwin":
        ans = Path("~/Library/Application Support/").expanduser()
    else:
        ans = Path.home().joinpath(".local/share")
    return ans.joinpath(appname)


def set_init_dict(model_dict, checkpoint_state, c):
    # Partial initialization: if there is a mismatch with new and old layer, it is skipped.
    for k, v in checkpoint_state.items():
        if k not in model_dict:
            print(" | > Layer missing in the model definition: {}".format(k))
    # 1. filter out unnecessary keys
    pretrained_dict = {k: v for k, v in checkpoint_state.items() if k in model_dict}
    # 2. filter out different size layers
    pretrained_dict = {k: v for k, v in pretrained_dict.items() if v.numel() == model_dict[k].numel()}
    # 3. skip reinit layers
    if c.has("reinit_layers") and c.reinit_layers is not None:
        for reinit_layer_name in c.reinit_layers:
            pretrained_dict = {k: v for k, v in pretrained_dict.items() if reinit_layer_name not in k}
    # 4. overwrite entries in the existing state dict
    model_dict.update(pretrained_dict)
    print(" | > {} / {} layers are restored.".format(len(pretrained_dict), len(model_dict)))
    return model_dict


def format_aux_input(def_args: Dict, kwargs: Dict) -> Dict:
    """Format kwargs to hande auxilary inputs to models.

    Args:
        def_args (Dict): A dictionary of argument names and their default values if not defined in `kwargs`.
        kwargs (Dict): A `dict` or `kwargs` that includes auxilary inputs to the model.

    Returns:
        Dict: arguments with formatted auxilary inputs.
    """
    kwargs = kwargs.copy()
    for name in def_args:
        if name not in kwargs or kwargs[name] is None:
            kwargs[name] = def_args[name]
    return kwargs


class KeepAverage:
    def __init__(self):
        self.avg_values = {}
        self.iters = {}

    def __getitem__(self, key):
        return self.avg_values[key]

    def items(self):
        return self.avg_values.items()

    def add_value(self, name, init_val=0, init_iter=0):
        self.avg_values[name] = init_val
        self.iters[name] = init_iter

    def update_value(self, name, value, weighted_avg=False):
        if name not in self.avg_values:
            # add value if not exist before
            self.add_value(name, init_val=value)
        else:
            # else update existing value
            if weighted_avg:
                self.avg_values[name] = 0.99 * self.avg_values[name] + 0.01 * value
                self.iters[name] += 1
            else:
                self.avg_values[name] = self.avg_values[name] * self.iters[name] + value
                self.iters[name] += 1
                self.avg_values[name] /= self.iters[name]

    def add_values(self, name_dict):
        for key, value in name_dict.items():
            self.add_value(key, init_val=value)

    def update_values(self, value_dict):
        for key, value in value_dict.items():
            self.update_value(key, value)


================================================
FILE: TTS/utils/io.py
================================================
import datetime
import json
import os
import pickle as pickle_tts
import shutil
from typing import Any, Callable, Dict, Union

import fsspec
import torch
from coqpit import Coqpit

from TTS.utils.generic_utils import get_user_data_dir


class RenamingUnpickler(pickle_tts.Unpickler):
    """Overload default pickler to solve module renaming problem"""

    def find_class(self, module, name):
        return super().find_class(module.replace("mozilla_voice_tts", "TTS"), name)


class AttrDict(dict):
    """A custom dict which converts dict keys
    to class attributes"""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self


def copy_model_files(config: Coqpit, out_path, new_fields=None):
    """Copy config.json and other model files to training folder and add
    new fields.

    Args:
        config (Coqpit): Coqpit config defining the training run.
        out_path (str): output path to copy the file.
        new_fields (dict): new fileds to be added or edited
            in the config file.
    """
    copy_config_path = os.path.join(out_path, "config.json")
    # add extra information fields
    if new_fields:
        config.update(new_fields, allow_new=True)
    # TODO: Revert to config.save_json() once Coqpit supports arbitrary paths.
    with fsspec.open(copy_config_path, "w", encoding="utf8") as f:
        json.dump(config.to_dict(), f, indent=4)

    # copy model stats file if available
    if config.audio.stats_path is not None:
        copy_stats_path = os.path.join(out_path, "scale_stats.npy")
        filesystem = fsspec.get_mapper(copy_stats_path).fs
        if not filesystem.exists(copy_stats_path):
            with fsspec.open(config.audio.stats_path, "rb") as source_file:
                with fsspec.open(copy_stats_path, "wb") as target_file:
                    shutil.copyfileobj(source_file, target_file)


def load_fsspec(
    path: str,
    map_location: Union[str, Callable, torch.device, Dict[Union[str, torch.device], Union[str, torch.device]]] = None,
    cache: bool = True,
    **kwargs,
) -> Any:
    """Like torch.load but can load from other locations (e.g. s3:// , gs://).

    Args:
        path: Any path or url supported by fsspec.
        map_location: torch.device or str.
        cache: If True, cache a remote file locally for subsequent calls. It is cached under `get_user_data_dir()/tts_cache`. Defaults to True.
        **kwargs: Keyword arguments forwarded to torch.load.

    Returns:
        Object stored in path.
    """
    is_local = os.path.isdir(path) or os.path.isfile(path)
    if cache and not is_local:
        with fsspec.open(
            f"filecache::{path}",
            filecache={"cache_storage": str(get_user_data_dir("tts_cache"))},
            mode="rb",
        ) as f:
            return torch.load(f, map_location=map_location, **kwargs)
    else:
        with fsspec.open(path, "rb") as f:
            return torch.load(f, map_location=map_location, **kwargs)


def load_checkpoint(
    model, checkpoint_path, use_cuda=False, eval=False, cache=False
):  # pylint: disable=redefined-builtin
    try:
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
    except ModuleNotFoundError:
        pickle_tts.Unpickler = RenamingUnpickler
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), pickle_module=pickle_tts, cache=cache)
    model.load_state_dict(state["model"])
    if use_cuda:
        model.cuda()
    if eval:
        model.eval()
    return model, state


def save_fsspec(state: Any, path: str, **kwargs):
    """Like torch.save but can save to other locations (e.g. s3:// , gs://).

    Args:
        state: State object to save
        path: Any path or url supported by fsspec.
        **kwargs: Keyword arguments forwarded to torch.save.
    """
    with fsspec.open(path, "wb") as f:
        torch.save(state, f, **kwargs)


def save_model(config, model, optimizer, scaler, current_step, epoch, output_path, **kwargs):
    if hasattr(model, "module"):
        model_state = model.module.state_dict()
    else:
        model_state = model.state_dict()
    if isinstance(optimizer, list):
        optimizer_state = [optim.state_dict() for optim in optimizer]
    elif optimizer.__class__.__name__ == "CapacitronOptimizer":
        optimizer_state = [optimizer.primary_optimizer.state_dict(), optimizer.secondary_optimizer.state_dict()]
    else:
        optimizer_state = optimizer.state_dict() if optimizer is not None else None

    if isinstance(scaler, list):
        scaler_state = [s.state_dict() for s in scaler]
    else:
        scaler_state = scaler.state_dict() if scaler is not None else None

    if isinstance(config, Coqpit):
        config = config.to_dict()

    state = {
        "config": config,
        "model": model_state,
        "optimizer": optimizer_state,
        "scaler": scaler_state,
        "step": current_step,
        "epoch": epoch,
        "date": datetime.date.today().strftime("%B %d, %Y"),
    }
    state.update(kwargs)
    save_fsspec(state, output_path)


def save_checkpoint(
    config,
    model,
    optimizer,
    scaler,
    current_step,
    epoch,
    output_folder,
    **kwargs,
):
    file_name = "checkpoint_{}.pth".format(current_step)
    checkpoint_path = os.path.join(output_folder, file_name)
    print("\n > CHECKPOINT : {}".format(checkpoint_path))
    save_model(
        config,
        model,
        optimizer,
        scaler,
        current_step,
        epoch,
        checkpoint_path,
        **kwargs,
    )


def save_best_model(
    current_loss,
    best_loss,
    config,
    model,
    optimizer,
    scaler,
    current_step,
    epoch,
    out_path,
    keep_all_best=False,
    keep_after=10000,
    **kwargs,
):
    if current_loss < best_loss:
        best_model_name = f"best_model_{current_step}.pth"
        checkpoint_path = os.path.join(out_path, best_model_name)
        print(" > BEST MODEL : {}".format(checkpoint_path))
        save_model(
            config,
            model,
            optimizer,
            scaler,
            current_step,
            epoch,
            checkpoint_path,
            model_loss=current_loss,
            **kwargs,
        )
        fs = fsspec.get_mapper(out_path).fs
        # only delete previous if current is saved successfully
        if not keep_all_best or (current_step < keep_after):
            model_names = fs.glob(os.path.join(out_path, "best_model*.pth"))
            for model_name in model_names:
                if os.path.basename(model_name) != best_model_name:
                    fs.rm(model_name)
        # create a shortcut which always points to the currently best model
        shortcut_name = "best_model.pth"
        shortcut_path = os.path.join(out_path, shortcut_name)
        fs.copy(checkpoint_path, shortcut_path)
        best_loss = current_loss
    return best_loss


================================================
FILE: TTS/utils/manage.py
================================================
import json
import os
import zipfile
from pathlib import Path
from shutil import copyfile, rmtree
from typing import Dict, Tuple

import requests
from tqdm import tqdm

from TTS.config import load_config
from TTS.utils.generic_utils import get_user_data_dir

LICENSE_URLS = {
    "cc by-nc-nd 4.0": "https://creativecommons.org/licenses/by-nc-nd/4.0/",
    "mpl": "https://www.mozilla.org/en-US/MPL/2.0/",
    "mpl2": "https://www.mozilla.org/en-US/MPL/2.0/",
    "mpl 2.0": "https://www.mozilla.org/en-US/MPL/2.0/",
    "mit": "https://choosealicense.com/licenses/mit/",
    "apache 2.0": "https://choosealicense.com/licenses/apache-2.0/",
    "apache2": "https://choosealicense.com/licenses/apache-2.0/",
    "cc-by-sa 4.0": "https://creativecommons.org/licenses/by-sa/4.0/",
}


class ModelManager(object):
    """Manage TTS models defined in .models.json.
    It provides an interface to list and download
    models defines in '.model.json'

    Models are downloaded under '.TTS' folder in the user's
    home path.

    Args:
        models_file (str): path to .model.json file. Defaults to None.
        output_prefix (str): prefix to `tts` to download models. Defaults to None
        progress_bar (bool): print a progress bar when donwloading a file. Defaults to False.
        verbose (bool): print info. Defaults to True.
    """

    def __init__(self, models_file=None, output_prefix=None, progress_bar=False, verbose=True):
        super().__init__()
        self.progress_bar = progress_bar
        self.verbose = verbose
        if output_prefix is None:
            self.output_prefix = get_user_data_dir("tts")
        else:
            self.output_prefix = os.path.join(output_prefix, "tts")
        self.models_dict = None
        if models_file is not None:
            self.read_models_file(models_file)
        else:
            # try the default location
            path = Path(__file__).parent / "../.models.json"
            self.read_models_file(path)

    def read_models_file(self, file_path):
        """Read .models.json as a dict

        Args:
            file_path (str): path to .models.json.
        """
        with open(file_path, "r", encoding="utf-8") as json_file:
            self.models_dict = json.load(json_file)

    def _list_models(self, model_type, model_count=0):
        if self.verbose:
            print(" Name format: type/language/dataset/model")
        model_list = []
        for lang in self.models_dict[model_type]:
            for dataset in self.models_dict[model_type][lang]:
                for model in self.models_dict[model_type][lang][dataset]:
                    model_full_name = f"{model_type}--{lang}--{dataset}--{model}"
                    output_path = os.path.join(self.output_prefix, model_full_name)
                    if self.verbose:
                        if os.path.exists(output_path):
                            print(f" {model_count}: {model_type}/{lang}/{dataset}/{model} [already downloaded]")
                        else:
                            print(f" {model_count}: {model_type}/{lang}/{dataset}/{model}")
                    model_list.append(f"{model_type}/{lang}/{dataset}/{model}")
                    model_count += 1
        return model_list

    def _list_for_model_type(self, model_type):
        models_name_list = []
        model_count = 1
        model_type = "tts_models"
        models_name_list.extend(self._list_models(model_type, model_count))
        return models_name_list

    def list_models(self):
        models_name_list = []
        model_count = 1
        for model_type in self.models_dict:
            model_list = self._list_models(model_type, model_count)
            models_name_list.extend(model_list)
        return models_name_list

    def model_info_by_idx(self, model_query):
        """Print the description of the model from .models.json file using model_idx

        Args:
            model_query (str): <model_tye>/<model_idx>
        """
        model_name_list = []
        model_type, model_query_idx = model_query.split("/")
        try:
            model_query_idx = int(model_query_idx)
            if model_query_idx <= 0:
                print("> model_query_idx should be a positive integer!")
                return
        except:
            print("> model_query_idx should be an integer!")
            return
        model_count = 0
        if model_type in self.models_dict:
            for lang in self.models_dict[model_type]:
                for dataset in self.models_dict[model_type][lang]:
                    for model in self.models_dict[model_type][lang][dataset]:
                        model_name_list.append(f"{model_type}/{lang}/{dataset}/{model}")
                        model_count += 1
        else:
            print(f"> model_type {model_type} does not exist in the list.")
            return
        if model_query_idx > model_count:
            print(f"model query idx exceeds the number of available models [{model_count}] ")
        else:
            model_type, lang, dataset, model = model_name_list[model_query_idx - 1].split("/")
            print(f"> model type : {model_type}")
            print(f"> language supported : {lang}")
            print(f"> dataset used : {dataset}")
            print(f"> model name : {model}")
            if "description" in self.models_dict[model_type][lang][dataset][model]:
                print(f"> description : {self.models_dict[model_type][lang][dataset][model]['description']}")
            else:
                print("> description : coming soon")
            if "default_vocoder" in self.models_dict[model_type][lang][dataset][model]:
                print(f"> default_vocoder : {self.models_dict[model_type][lang][dataset][model]['default_vocoder']}")

    def model_info_by_full_name(self, model_query_name):
        """Print the description of the model from .models.json file using model_full_name

        Args:
            model_query_name (str): Format is <model_type>/<language>/<dataset>/<model_name>
        """
        model_type, lang, dataset, model = model_query_name.split("/")
        if model_type in self.models_dict:
            if lang in self.models_dict[model_type]:
                if dataset in self.models_dict[model_type][lang]:
                    if model in self.models_dict[model_type][lang][dataset]:
                        print(f"> model type : {model_type}")
                        print(f"> language supported : {lang}")
                        print(f"> dataset used : {dataset}")
                        print(f"> model name : {model}")
                        if "description" in self.models_dict[model_type][lang][dataset][model]:
                            print(
                                f"> description : {self.models_dict[model_type][lang][dataset][model]['description']}"
                            )
                        else:
                            print("> description : coming soon")
                        if "default_vocoder" in self.models_dict[model_type][lang][dataset][model]:
                            print(
                                f"> default_vocoder : {self.models_dict[model_type][lang][dataset][model]['default_vocoder']}"
                            )
                    else:
                        print(f"> model {model} does not exist for {model_type}/{lang}/{dataset}.")
                else:
                    print(f"> dataset {dataset} does not exist for {model_type}/{lang}.")
            else:
                print(f"> lang {lang} does not exist for {model_type}.")
        else:
            print(f"> model_type {model_type} does not exist in the list.")

    def list_tts_models(self):
        """Print all `TTS` models and return a list of model names

        Format is `language/dataset/model`
        """
        return self._list_for_model_type("tts_models")

    def list_vocoder_models(self):
        """Print all the `vocoder` models and return a list of model names

        Format is `language/dataset/model`
        """
        return self._list_for_model_type("vocoder_models")

    def list_vc_models(self):
        """Print all the voice conversion models and return a list of model names

        Format is `language/dataset/model`
        """
        return self._list_for_model_type("voice_conversion_models")

    def list_langs(self):
        """Print all the available languages"""
        print(" Name format: type/language")
        for model_type in self.models_dict:
            for lang in self.models_dict[model_type]:
                print(f" >: {model_type}/{lang} ")

    def list_datasets(self):
        """Print all the datasets"""
        print(" Name format: type/language/dataset")
        for model_type in self.models_dict:
            for lang in self.models_dict[model_type]:
                for dataset in self.models_dict[model_type][lang]:
                    print(f" >: {model_type}/{lang}/{dataset}")

    @staticmethod
    def print_model_license(model_item: Dict):
        """Print the license of a model

        Args:
            model_item (dict): model item in the models.json
        """
        if "license" in model_item and model_item["license"].strip() != "":
            print(f" > Model's license - {model_item['license']}")
            if model_item["license"].lower() in LICENSE_URLS:
                print(f" > Check {LICENSE_URLS[model_item['license'].lower()]} for more info.")
            else:
                print(" > Check https://opensource.org/licenses for more info.")
        else:
            print(" > Model's license - No license information available")

    def download_model(self, model_name):
        """Download model files given the full model name.
        Model name is in the format
            'type/language/dataset/model'
            e.g. 'tts_model/en/ljspeech/tacotron'

        Every model must have the following files:
            - *.pth : pytorch model checkpoint file.
            - config.json : model config file.
            - scale_stats.npy (if exist): scale values for preprocessing.

        Args:
            model_name (str): model name as explained above.
        """
        # fetch model info from the dict
        model_type, lang, dataset, model = model_name.split("/")
        model_full_name = f"{model_type}--{lang}--{dataset}--{model}"
        model_item = self.models_dict[model_type][lang][dataset][model]
        model_item["model_type"] = model_type
        # set the model specific output path
        output_path = os.path.join(self.output_prefix, model_full_name)
        if os.path.exists(output_path):
            print(f" > {model_name} is already downloaded.")
        else:
            os.makedirs(output_path, exist_ok=True)
            print(f" > Downloading model to {output_path}")
            # download from github release
            self._download_zip_file(model_item["github_rls_url"], output_path, self.progress_bar)
            self.print_model_license(model_item=model_item)
        # find downloaded files
        output_model_path, output_config_path = self._find_files(output_path)
        # update paths in the config.json
        self._update_paths(output_path, output_config_path)
        return output_model_path, output_config_path, model_item

    @staticmethod
    def _find_files(output_path: str) -> Tuple[str, str]:
        """Find the model and config files in the output path

        Args:
            output_path (str): path to the model files

        Returns:
            Tuple[str, str]: path to the model file and config file
        """
        model_file = None
        config_file = None
        for file_name in os.listdir(output_path):
            if file_name in ["model_file.pth", "model_file.pth.tar", "model.pth"]:
                model_file = os.path.join(output_path, file_name)
            elif file_name == "config.json":
                config_file = os.path.join(output_path, file_name)
        if model_file is None:
            raise ValueError(" [!] Model file not found in the output path")
        if config_file is None:
            raise ValueError(" [!] Config file not found in the output path")
        return model_file, config_file

    @staticmethod
    def _find_speaker_encoder(output_path: str) -> str:
        """Find the speaker encoder file in the output path

        Args:
            output_path (str): path to the model files

        Returns:
            str: path to the speaker encoder file
        """
        speaker_encoder_file = None
        for file_name in os.listdir(output_path):
            if file_name in ["model_se.pth", "model_se.pth.tar"]:
                speaker_encoder_file = os.path.join(output_path, file_name)
        return speaker_encoder_file

    def _update_paths(self, output_path: str, config_path: str) -> None:
        """Update paths for certain files in config.json after download.

        Args:
            output_path (str): local path the model is downloaded to.
            config_path (str): local config.json path.
        """
        output_stats_path = os.path.join(output_path, "scale_stats.npy")
        output_d_vector_file_path = os.path.join(output_path, "speakers.json")
        output_d_vector_file_pth_path = os.path.join(output_path, "speakers.pth")
        output_speaker_ids_file_path = os.path.join(output_path, "speaker_ids.json")
        output_speaker_ids_file_pth_path = os.path.join(output_path, "speaker_ids.pth")
        speaker_encoder_config_path = os.path.join(output_path, "config_se.json")
        speaker_encoder_model_path = self._find_speaker_encoder(output_path)

        # update the scale_path.npy file path in the model config.json
        self._update_path("audio.stats_path", output_stats_path, config_path)

        # update the speakers.json file path in the model config.json to the current path
        self._update_path("d_vector_file", output_d_vector_file_path, config_path)
        self._update_path("d_vector_file", output_d_vector_file_pth_path, config_path)
        self._update_path("model_args.d_vector_file", output_d_vector_file_path, config_path)
        self._update_path("model_args.d_vector_file", output_d_vector_file_pth_path, config_path)

        # update the speaker_ids.json file path in the model config.json to the current path
        self._update_path("speakers_file", output_speaker_ids_file_path, config_path)
        self._update_path("speakers_file", output_speaker_ids_file_pth_path, config_path)
        self._update_path("model_args.speakers_file", output_speaker_ids_file_path, config_path)
        self._update_path("model_args.speakers_file", output_speaker_ids_file_pth_path, config_path)

        # update the speaker_encoder file path in the model config.json to the current path
        self._update_path("speaker_encoder_model_path", speaker_encoder_model_path, config_path)
        self._update_path("model_args.speaker_encoder_model_path", speaker_encoder_model_path, config_path)
        self._update_path("speaker_encoder_config_path", speaker_encoder_config_path, config_path)
        self._update_path("model_args.speaker_encoder_config_path", speaker_encoder_config_path, config_path)

    @staticmethod
    def _update_path(field_name, new_path, config_path):
        """Update the path in the model config.json for the current environment after download"""
        if new_path and os.path.exists(new_path):
            config = load_config(config_path)
            field_names = field_name.split(".")
            if len(field_names) > 1:
                # field name points to a sub-level field
                sub_conf = config
                for fd in field_names[:-1]:
                    if fd in sub_conf:
                        sub_conf = sub_conf[fd]
                    else:
                        return
                if isinstance(sub_conf[field_names[-1]], list):
                    sub_conf[field_names[-1]] = [new_path]
                else:
                    sub_conf[field_names[-1]] = new_path
            else:
                # field name points to a top-level field
                if not field_name in config:
                    return
                if isinstance(config[field_name], list):
                    config[field_name] = [new_path]
                else:
                    config[field_name] = new_path
            config.save_json(config_path)

    @staticmethod
    def _download_zip_file(file_url, output_folder, progress_bar):
        """Download the github releases"""
        # download the file
        r = requests.get(file_url, stream=True)
        # extract the file
        try:
            total_size_in_bytes = int(r.headers.get("content-length", 0))
            block_size = 1024  # 1 Kibibyte
            if progress_bar:
                progress_bar = tqdm(total=total_size_in_bytes, unit="iB", unit_scale=True)
            temp_zip_name = os.path.join(output_folder, file_url.split("/")[-1])
            with open(temp_zip_name, "wb") as file:
                for data in r.iter_content(block_size):
                    if progress_bar:
                        progress_bar.update(len(data))
                    file.write(data)
            with zipfile.ZipFile(temp_zip_name) as z:
                z.extractall(output_folder)
            os.remove(temp_zip_name)  # delete zip after extract
        except zipfile.BadZipFile:
            print(f" > Error: Bad zip file - {file_url}")
            raise zipfile.BadZipFile  # pylint: disable=raise-missing-from
        # move the files to the outer path
        for file_path in z.namelist()[1:]:
            src_path = os.path.join(output_folder, file_path)
            dst_path = os.path.join(output_folder, os.path.basename(file_path))
            if src_path != dst_path:
                copyfile(src_path, dst_path)
        # remove the extracted folder
        rmtree(os.path.join(output_folder, z.namelist()[0]))

    @staticmethod
    def _check_dict_key(my_dict, key):
        if key in my_dict.keys() and my_dict[key] is not None:
            if not isinstance(key, str):
                return True
            if isinstance(key, str) and len(my_dict[key]) > 0:
                return True
        return False


================================================
FILE: TTS/utils/radam.py
================================================
# modified from https://github.com/LiyuanLucasLiu/RAdam

import math

import torch
from torch.optim.optimizer import Optimizer


class RAdam(Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0, degenerated_to_sgd=True):
        if lr < 0.0:
            raise ValueError("Invalid learning rate: {}".format(lr))
        if eps < 0.0:
            raise ValueError("Invalid epsilon value: {}".format(eps))
        if not 0.0 <= betas[0] < 1.0:
            raise ValueError("Invalid beta parameter at index 0: {}".format(betas[0]))
        if not 0.0 <= betas[1] < 1.0:
            raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))

        self.degenerated_to_sgd = degenerated_to_sgd
        if isinstance(params, (list, tuple)) and len(params) > 0 and isinstance(params[0], dict):
            for param in params:
                if "betas" in param and (param["betas"][0] != betas[0] or param["betas"][1] != betas[1]):
                    param["buffer"] = [[None, None, None] for _ in range(10)]
        defaults = dict(
            lr=lr, betas=betas, eps=eps, weight_decay=weight_decay, buffer=[[None, None, None] for _ in range(10)]
        )
        super().__init__(params, defaults)

    def __setstate__(self, state):  # pylint: disable=useless-super-delegation
        super().__setstate__(state)

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()

        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                grad = p.grad.data.float()
                if grad.is_sparse:
                    raise RuntimeError("RAdam does not support sparse gradients")

                p_data_fp32 = p.data.float()

                state = self.state[p]

                if len(state) == 0:
                    state["step"] = 0
                    state["exp_avg"] = torch.zeros_like(p_data_fp32)
                    state["exp_avg_sq"] = torch.zeros_like(p_data_fp32)
                else:
                    state["exp_avg"] = state["exp_avg"].type_as(p_data_fp32)
                    state["exp_avg_sq"] = state["exp_avg_sq"].type_as(p_data_fp32)

                exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
                beta1, beta2 = group["betas"]

                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)

                state["step"] += 1
                buffered = group["buffer"][int(state["step"] % 10)]
                if state["step"] == buffered[0]:
                    N_sma, step_size = buffered[1], buffered[2]
                else:
                    buffered[0] = state["step"]
                    beta2_t = beta2 ** state["step"]
                    N_sma_max = 2 / (1 - beta2) - 1
                    N_sma = N_sma_max - 2 * state["step"] * beta2_t / (1 - beta2_t)
                    buffered[1] = N_sma

                    # more conservative since it's an approximated value
                    if N_sma >= 5:
                        step_size = math.sqrt(
                            (1 - beta2_t)
                            * (N_sma - 4)
                            / (N_sma_max - 4)
                            * (N_sma - 2)
                            / N_sma
                            * N_sma_max
                            / (N_sma_max - 2)
                        ) / (1 - beta1 ** state["step"])
                    elif self.degenerated_to_sgd:
                        step_size = 1.0 / (1 - beta1 ** state["step"])
                    else:
                        step_size = -1
                    buffered[2] = step_size

                # more conservative since it's an approximated value
                if N_sma >= 5:
                    if group["weight_decay"] != 0:
                        p_data_fp32.add_(p_data_fp32, alpha=-group["weight_decay"] * group["lr"])
                    denom = exp_avg_sq.sqrt().add_(group["eps"])
                    p_data_fp32.addcdiv_(exp_avg, denom, value=-step_size * group["lr"])
                    p.data.copy_(p_data_fp32)
                elif step_size > 0:
                    if group["weight_decay"] != 0:
                        p_data_fp32.add_(p_data_fp32, alpha=-group["weight_decay"] * group["lr"])
                    p_data_fp32.add_(exp_avg, alpha=-step_size * group["lr"])
                    p.data.copy_(p_data_fp32)

        return loss


================================================
FILE: TTS/utils/samplers.py
================================================
import math
import random
from typing import Callable, List, Union

from torch.utils.data.sampler import BatchSampler, Sampler, SubsetRandomSampler


class SubsetSampler(Sampler):
    """
    Samples elements sequentially from a given list of indices.

    Args:
        indices (list): a sequence of indices
    """

    def __init__(self, indices):
        super().__init__(indices)
        self.indices = indices

    def __iter__(self):
        return (self.indices[i] for i in range(len(self.indices)))

    def __len__(self):
        return len(self.indices)


class PerfectBatchSampler(Sampler):
    """
    Samples a mini-batch of indices for a balanced class batching

    Args:
        dataset_items(list): dataset items to sample from.
        classes (list): list of classes of dataset_items to sample from.
        batch_size (int): total number of samples to be sampled in a mini-batch.
        num_gpus (int): number of GPU in the data parallel mode.
        shuffle (bool): if True, samples randomly, otherwise samples sequentially.
        drop_last (bool): if True, drops last incomplete batch.
    """

    def __init__(
        self,
        dataset_items,
        classes,
        batch_size,
        num_classes_in_batch,
        num_gpus=1,
        shuffle=True,
        drop_last=False,
        label_key="class_name",
    ):
        super().__init__(dataset_items)
        assert (
            batch_size % (num_classes_in_batch * num_gpus) == 0
        ), "Batch size must be divisible by number of classes times the number of data parallel devices (if enabled)."

        label_indices = {}
        for idx, item in enumerate(dataset_items):
            label = item[label_key]
            if label not in label_indices.keys():
                label_indices[label] = [idx]
            else:
                label_indices[label].append(idx)

        if shuffle:
            self._samplers = [SubsetRandomSampler(label_indices[key]) for key in classes]
        else:
            self._samplers = [SubsetSampler(label_indices[key]) for key in classes]

        self._batch_size = batch_size
        self._drop_last = drop_last
        self._dp_devices = num_gpus
        self._num_classes_in_batch = num_classes_in_batch

    def __iter__(self):
        batch = []
        if self._num_classes_in_batch != len(self._samplers):
            valid_samplers_idx = random.sample(range(len(self._samplers)), self._num_classes_in_batch)
        else:
            valid_samplers_idx = None

        iters = [iter(s) for s in self._samplers]
        done = False

        while True:
            b = []
            for i, it in enumerate(iters):
                if valid_samplers_idx is not None and i not in valid_samplers_idx:
                    continue
                idx = next(it, None)
                if idx is None:
                    done = True
                    break
                b.append(idx)
            if done:
                break
            batch += b
            if len(batch) == self._batch_size:
                yield batch
                batch = []
                if valid_samplers_idx is not None:
                    valid_samplers_idx = random.sample(range(len(self._samplers)), self._num_classes_in_batch)

        if not self._drop_last:
            if len(batch) > 0:
                groups = len(batch) // self._num_classes_in_batch
                if groups % self._dp_devices == 0:
                    yield batch
                else:
                    batch = batch[: (groups // self._dp_devices) * self._dp_devices * self._num_classes_in_batch]
                    if len(batch) > 0:
                        yield batch

    def __len__(self):
        class_batch_size = self._batch_size // self._num_classes_in_batch
        return min(((len(s) + class_batch_size - 1) // class_batch_size) for s in self._samplers)


def identity(x):
    return x


class SortedSampler(Sampler):
    """Samples elements sequentially, always in the same order.

    Taken from https://github.com/PetrochukM/PyTorch-NLP

    Args:
        data (iterable): Iterable data.
        sort_key (callable): Specifies a function of one argument that is used to extract a
            numerical comparison key from each list element.

    Example:
        >>> list(SortedSampler(range(10), sort_key=lambda i: -i))
        [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

    """

    def __init__(self, data, sort_key: Callable = identity):
        super().__init__(data)
        self.data = data
        self.sort_key = sort_key
        zip_ = [(i, self.sort_key(row)) for i, row in enumerate(self.data)]
        zip_ = sorted(zip_, key=lambda r: r[1])
        self.sorted_indexes = [item[0] for item in zip_]

    def __iter__(self):
        return iter(self.sorted_indexes)

    def __len__(self):
        return len(self.data)


class BucketBatchSampler(BatchSampler):
    """Bucket batch sampler

    Adapted from https://github.com/PetrochukM/PyTorch-NLP

    Args:
        sampler (torch.data.utils.sampler.Sampler):
        batch_size (int): Size of mini-batch.
        drop_last (bool): If `True` the sampler will drop the last batch if its size would be less
            than `batch_size`.
        data (list): List of data samples.
        sort_key (callable, optional): Callable to specify a comparison key for sorting.
        bucket_size_multiplier (int, optional): Buckets are of size
            `batch_size * bucket_size_multiplier`.

    Example:
        >>> sampler = WeightedRandomSampler(weights, len(weights))
        >>> sampler = BucketBatchSampler(sampler, data=data_items, batch_size=32, drop_last=True)
    """

    def __init__(
        self,
        sampler,
        data,
        batch_size,
        drop_last,
        sort_key: Union[Callable, List] = identity,
        bucket_size_multiplier=100,
    ):
        super().__init__(sampler, batch_size, drop_last)
        self.data = data
        self.sort_key = sort_key
        _bucket_size = batch_size * bucket_size_multiplier
        if hasattr(sampler, "__len__"):
            _bucket_size = min(_bucket_size, len(sampler))
        self.bucket_sampler = BatchSampler(sampler, _bucket_size, False)

    def __iter__(self):
        for idxs in self.bucket_sampler:
            bucket_data = [self.data[idx] for idx in idxs]
            sorted_sampler = SortedSampler(bucket_data, self.sort_key)
            for batch_idx in SubsetRandomSampler(list(BatchSampler(sorted_sampler, self.batch_size, self.drop_last))):
                sorted_idxs = [idxs[i] for i in batch_idx]
                yield sorted_idxs

    def __len__(self):
        if self.drop_last:
            return len(self.sampler) // self.batch_size
        return math.ceil(len(self.sampler) / self.batch_size)


================================================
FILE: TTS/utils/synthesizer.py
================================================
import time
from typing import List

import numpy as np
import pysbd
import torch

from TTS.config import load_config
from TTS.tts.models import setup_model as setup_tts_model

# pylint: disable=unused-wildcard-import
# pylint: disable=wildcard-import
from TTS.tts.utils.synthesis import synthesis, transfer_voice, trim_silence
from TTS.utils.audio import AudioProcessor
from TTS.utils.audio.numpy_transforms import save_wav
from TTS.vc.models import setup_model as setup_vc_model
from TTS.vocoder.models import setup_model as setup_vocoder_model
from TTS.vocoder.utils.generic_utils import interpolate_vocoder_input


class Synthesizer(object):
    def __init__(
        self,
        tts_checkpoint: str = "",
        tts_config_path: str = "",
        tts_speakers_file: str = "",
        tts_languages_file: str = "",
        vocoder_checkpoint: str = "",
        vocoder_config: str = "",
        encoder_checkpoint: str = "",
        encoder_config: str = "",
        vc_checkpoint: str = "",
        vc_config: str = "",
        use_cuda: bool = False,
    ) -> None:
        """General 🐸 TTS interface for inference. It takes a tts and a vocoder
        model and synthesize speech from the provided text.

        The text is divided into a list of sentences using `pysbd` and synthesize
        speech on each sentence separately.

        If you have certain special characters in your text, you need to handle
        them before providing the text to Synthesizer.

        TODO: set the segmenter based on the source language

        Args:
            tts_checkpoint (str, optional): path to the tts model file.
            tts_config_path (str, optional): path to the tts config file.
            vocoder_checkpoint (str, optional): path to the vocoder model file. Defaults to None.
            vocoder_config (str, optional): path to the vocoder config file. Defaults to None.
            encoder_checkpoint (str, optional): path to the speaker encoder model file. Defaults to `""`,
            encoder_config (str, optional): path to the speaker encoder config file. Defaults to `""`,
            vc_checkpoint (str, optional): path to the voice conversion model file. Defaults to `""`,
            vc_config (str, optional): path to the voice conversion config file. Defaults to `""`,
            use_cuda (bool, optional): enable/disable cuda. Defaults to False.
        """
        self.tts_checkpoint = tts_checkpoint
        self.tts_config_path = tts_config_path
        self.tts_speakers_file = tts_speakers_file
        self.tts_languages_file = tts_languages_file
        self.vocoder_checkpoint = vocoder_checkpoint
        self.vocoder_config = vocoder_config
        self.encoder_checkpoint = encoder_checkpoint
        self.encoder_config = encoder_config
        self.vc_checkpoint = vc_checkpoint
        self.vc_config = vc_config
        self.use_cuda = use_cuda

        self.tts_model = None
        self.vocoder_model = None
        self.vc_model = None
        self.speaker_manager = None
        self.tts_speakers = {}
        self.language_manager = None
        self.num_languages = 0
        self.tts_languages = {}
        self.d_vector_dim = 0
        self.seg = self._get_segmenter("en")
        self.use_cuda = use_cuda

        if self.use_cuda:
            assert torch.cuda.is_available(), "CUDA is not availabe on this machine."

        if tts_checkpoint:
            self._load_tts(tts_checkpoint, tts_config_path, use_cuda)
            self.output_sample_rate = self.tts_config.audio["sample_rate"]

        if vocoder_checkpoint:
            self._load_vocoder(vocoder_checkpoint, vocoder_config, use_cuda)
            self.output_sample_rate = self.vocoder_config.audio["sample_rate"]

        if vc_checkpoint:
            self._load_vc(vc_checkpoint, vc_config, use_cuda)
            self.output_sample_rate = self.vc_config.audio["output_sample_rate"]

    @staticmethod
    def _get_segmenter(lang: str):
        """get the sentence segmenter for the given language.

        Args:
            lang (str): target language code.

        Returns:
            [type]: [description]
        """
        return pysbd.Segmenter(language=lang, clean=True)

    def _load_vc(self, vc_checkpoint: str, vc_config_path: str, use_cuda: bool) -> None:
        """Load the voice conversion model.

        1. Load the model config.
        2. Init the model from the config.
        3. Load the model weights.
        4. Move the model to the GPU if CUDA is enabled.

        Args:
            vc_checkpoint (str): path to the model checkpoint.
            tts_config_path (str): path to the model config file.
            use_cuda (bool): enable/disable CUDA use.
        """
        # pylint: disable=global-statement
        self.vc_config = load_config(vc_config_path)
        self.vc_model = setup_vc_model(config=self.vc_config)
        self.vc_model.load_checkpoint(self.vc_config, vc_checkpoint)
        if use_cuda:
            self.vc_model.cuda()

    def _load_tts(self, tts_checkpoint: str, tts_config_path: str, use_cuda: bool) -> None:
        """Load the TTS model.

        1. Load the model config.
        2. Init the model from the config.
        3. Load the model weights.
        4. Move the model to the GPU if CUDA is enabled.
        5. Init the speaker manager in the model.

        Args:
            tts_checkpoint (str): path to the model checkpoint.
            tts_config_path (str): path to the model config file.
            use_cuda (bool): enable/disable CUDA use.
        """
        # pylint: disable=global-statement
        self.tts_config = load_config(tts_config_path)
        if self.tts_config["use_phonemes"] and self.tts_config["phonemizer"] is None:
            raise ValueError("Phonemizer is not defined in the TTS config.")

        self.tts_model = setup_tts_model(config=self.tts_config)

        if not self.encoder_checkpoint:
            self._set_speaker_encoder_paths_from_tts_config()

        self.tts_model.load_checkpoint(self.tts_config, tts_checkpoint, eval=True)
        if use_cuda:
            self.tts_model.cuda()

        if self.encoder_checkpoint and hasattr(self.tts_model, "speaker_manager"):
            self.tts_model.speaker_manager.init_encoder(self.encoder_checkpoint, self.encoder_config, use_cuda)

    def _set_speaker_encoder_paths_from_tts_config(self):
        """Set the encoder paths from the tts model config for models with speaker encoders."""
        if hasattr(self.tts_config, "model_args") and hasattr(
            self.tts_config.model_args, "speaker_encoder_config_path"
        ):
            self.encoder_checkpoint = self.tts_config.model_args.speaker_encoder_model_path
            self.encoder_config = self.tts_config.model_args.speaker_encoder_config_path

    def _load_vocoder(self, model_file: str, model_config: str, use_cuda: bool) -> None:
        """Load the vocoder model.

        1. Load the vocoder config.
        2. Init the AudioProcessor for the vocoder.
        3. Init the vocoder model from the config.
        4. Move the model to the GPU if CUDA is enabled.

        Args:
            model_file (str): path to the model checkpoint.
            model_config (str): path to the model config file.
            use_cuda (bool): enable/disable CUDA use.
        """
        self.vocoder_config = load_config(model_config)
        self.vocoder_ap = AudioProcessor(verbose=False, **self.vocoder_config.audio)
        self.vocoder_model = setup_vocoder_model(self.vocoder_config)
        self.vocoder_model.load_checkpoint(self.vocoder_config, model_file, eval=True)
        if use_cuda:
            self.vocoder_model.cuda()

    def split_into_sentences(self, text) -> List[str]:
        """Split give text into sentences.

        Args:
            text (str): input text in string format.

        Returns:
            List[str]: list of sentences.
        """
        return self.seg.segment(text)

    def save_wav(self, wav: List[int], path: str) -> None:
        """Save the waveform as a file.

        Args:
            wav (List[int]): waveform as a list of values.
            path (str): output path to save the waveform.
        """
        wav = np.array(wav)
        save_wav(wav=wav, path=path, sample_rate=self.output_sample_rate)

    def voice_conversion(self, source_wav: str, target_wav: str) -> List[int]:
        output_wav = self.vc_model.voice_conversion(source_wav, target_wav)
        return output_wav

    def tts(
        self,
        text: str = "",
        speaker_name: str = "",
        language_name: str = "",
        speaker_wav=None,
        style_wav=None,
        style_text=None,
        reference_wav=None,
        reference_speaker_name=None,
    ) -> List[int]:
        """🐸 TTS magic. Run all the models and generate speech.

        Args:
            text (str): input text.
            speaker_name (str, optional): spekaer id for multi-speaker models. Defaults to "".
            language_name (str, optional): language id for multi-language models. Defaults to "".
            speaker_wav (Union[str, List[str]], optional): path to the speaker wav for voice cloning. Defaults to None.
            style_wav ([type], optional): style waveform for GST. Defaults to None.
            style_text ([type], optional): transcription of style_wav for Capacitron. Defaults to None.
            reference_wav ([type], optional): reference waveform for voice conversion. Defaults to None.
            reference_speaker_name ([type], optional): spekaer id of reference waveform. Defaults to None.
        Returns:
            List[int]: [description]
        """
        start_time = time.time()
        wavs = []

        if not text and not reference_wav:
            raise ValueError(
                "You need to define either `text` (for sythesis) or a `reference_wav` (for voice conversion) to use the Coqui TTS API."
            )

        if text:
            sens = self.split_into_sentences(text)
            print(" > Text splitted to sentences.")
            print(sens)

        # handle multi-speaker
        speaker_embedding = None
        speaker_id = None
        if self.tts_speakers_file or hasattr(self.tts_model.speaker_manager, "name_to_id"):
            # handle Neon models with single speaker.
            if len(self.tts_model.speaker_manager.name_to_id) == 1:
                speaker_id = list(self.tts_model.speaker_manager.name_to_id.values())[0]

            elif speaker_name and isinstance(speaker_name, str):
                if self.tts_config.use_d_vector_file:
                    # get the average speaker embedding from the saved d_vectors.
                    speaker_embedding = self.tts_model.speaker_manager.get_mean_embedding(
                        speaker_name, num_samples=None, randomize=False
                    )
                    speaker_embedding = np.array(speaker_embedding)[None, :]  # [1 x embedding_dim]
                else:
                    # get speaker idx from the speaker name
                    speaker_id = self.tts_model.speaker_manager.name_to_id[speaker_name]

            elif not speaker_name and not speaker_wav:
                raise ValueError(
                    " [!] Look like you use a multi-speaker model. "
                    "You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model."
                )
            else:
                speaker_embedding = None
        else:
            if speaker_name:
                raise ValueError(
                    f" [!] Missing speakers.json file path for selecting speaker {speaker_name}."
                    "Define path for speaker.json if it is a multi-speaker model or remove defined speaker idx. "
                )

        # handle multi-lingual
        language_id = None
        if self.tts_languages_file or (
            hasattr(self.tts_model, "language_manager") and self.tts_model.language_manager is not None
        ):
            if len(self.tts_model.language_manager.name_to_id) == 1:
                language_id = list(self.tts_model.language_manager.name_to_id.values())[0]

            elif language_name and isinstance(language_name, str):
                language_id = self.tts_model.language_manager.name_to_id[language_name]

            elif not language_name:
                raise ValueError(
                    " [!] Look like you use a multi-lingual model. "
                    "You need to define either a `language_name` or a `style_wav` to use a multi-lingual model."
                )

            else:
                raise ValueError(
                    f" [!] Missing language_ids.json file path for selecting language {language_name}."
                    "Define path for language_ids.json if it is a multi-lingual model or remove defined language idx. "
                )

        # compute a new d_vector from the given clip.
        if speaker_wav is not None:
            speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(speaker_wav)

        use_gl = self.vocoder_model is None

        if not reference_wav:
            for sen in sens:
                # synthesize voice
                outputs = synthesis(
                    model=self.tts_model,
                    text=sen,
                    CONFIG=self.tts_config,
                    use_cuda=self.use_cuda,
                    speaker_id=speaker_id,
                    style_wav=style_wav,
                    style_text=style_text,
                    use_griffin_lim=use_gl,
                    d_vector=speaker_embedding,
                    language_id=language_id,
                )
                waveform = outputs["wav"]
                mel_postnet_spec = outputs["outputs"]["model_outputs"][0].detach().cpu().numpy()
                if not use_gl:
                    # denormalize tts output based on tts audio config
                    mel_postnet_spec = self.tts_model.ap.denormalize(mel_postnet_spec.T).T
                    device_type = "cuda" if self.use_cuda else "cpu"
                    # renormalize spectrogram based on vocoder config
                    vocoder_input = self.vocoder_ap.normalize(mel_postnet_spec.T)
                    # compute scale factor for possible sample rate mismatch
                    scale_factor = [
                        1,
                        self.vocoder_config["audio"]["sample_rate"] / self.tts_model.ap.sample_rate,
                    ]
                    if scale_factor[1] != 1:
                        print(" > interpolating tts model output.")
                        vocoder_input = interpolate_vocoder_input(scale_factor, vocoder_input)
                    else:
                        vocoder_input = torch.tensor(vocoder_input).unsqueeze(0)  # pylint: disable=not-callable
                    # run vocoder model
                    # [1, T, C]
                    waveform = self.vocoder_model.inference(vocoder_input.to(device_type))
                if self.use_cuda and not use_gl:
                    waveform = waveform.cpu()
                if not use_gl:
                    waveform = waveform.numpy()
                waveform = waveform.squeeze()

                # trim silence
                if "do_trim_silence" in self.tts_config.audio and self.tts_config.audio["do_trim_silence"]:
                    waveform = trim_silence(waveform, self.tts_model.ap)

                wavs += list(waveform)
                wavs += [0] * 10000
        else:
            # get the speaker embedding or speaker id for the reference wav file
            reference_speaker_embedding = None
            reference_speaker_id = None
            if self.tts_speakers_file or hasattr(self.tts_model.speaker_manager, "name_to_id"):
                if reference_speaker_name and isinstance(reference_speaker_name, str):
                    if self.tts_config.use_d_vector_file:
                        # get the speaker embedding from the saved d_vectors.
                        reference_speaker_embedding = self.tts_model.speaker_manager.get_embeddings_by_name(
                            reference_speaker_name
                        )[0]
                        reference_speaker_embedding = np.array(reference_speaker_embedding)[
                            None, :
                        ]  # [1 x embedding_dim]
                    else:
                        # get speaker idx from the speaker name
                        reference_speaker_id = self.tts_model.speaker_manager.name_to_id[reference_speaker_name]
                else:
                    reference_speaker_embedding = self.tts_model.speaker_manager.compute_embedding_from_clip(
                        reference_wav
                    )
            outputs = transfer_voice(
                model=self.tts_model,
                CONFIG=self.tts_config,
                use_cuda=self.use_cuda,
                reference_wav=reference_wav,
                speaker_id=speaker_id,
                d_vector=speaker_embedding,
                use_griffin_lim=use_gl,
                reference_speaker_id=reference_speaker_id,
                reference_d_vector=reference_speaker_embedding,
            )
            waveform = outputs
            if not use_gl:
                mel_postnet_spec = outputs[0].detach().cpu().numpy()
                # denormalize tts output based on tts audio config
                mel_postnet_spec = self.tts_model.ap.denormalize(mel_postnet_spec.T).T
                device_type = "cuda" if self.use_cuda else "cpu"
                # renormalize spectrogram based on vocoder config
                vocoder_input = self.vocoder_ap.normalize(mel_postnet_spec.T)
                # compute scale factor for possible sample rate mismatch
                scale_factor = [
                    1,
                    self.vocoder_config["audio"]["sample_rate"] / self.tts_model.ap.sample_rate,
                ]
                if scale_factor[1] != 1:
                    print(" > interpolating tts model output.")
                    vocoder_input = interpolate_vocoder_input(scale_factor, vocoder_input)
                else:
                    vocoder_input = torch.tensor(vocoder_input).unsqueeze(0)  # pylint: disable=not-callable
                # run vocoder model
                # [1, T, C]
                waveform = self.vocoder_model.inference(vocoder_input.to(device_type))
            if self.use_cuda:
                waveform = waveform.cpu()
            if not use_gl:
                waveform = waveform.numpy()
            wavs = waveform.squeeze()

        # compute stats
        process_time = time.time() - start_time
        audio_time = len(wavs) / self.tts_config.audio["sample_rate"]
        print(f" > Processing time: {process_time}")
        print(f" > Real-time factor: {process_time / audio_time}")
        return wavs


================================================
FILE: TTS/utils/training.py
================================================
import numpy as np
import torch


def check_update(model, grad_clip, ignore_stopnet=False, amp_opt_params=None):
    r"""Check model gradient against unexpected jumps and failures"""
    skip_flag = False
    if ignore_stopnet:
        if not amp_opt_params:
            grad_norm = torch.nn.utils.clip_grad_norm_(
                [param for name, param in model.named_parameters() if "stopnet" not in name], grad_clip
            )
        else:
            grad_norm = torch.nn.utils.clip_grad_norm_(amp_opt_params, grad_clip)
    else:
        if not amp_opt_params:
            grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
        else:
            grad_norm = torch.nn.utils.clip_grad_norm_(amp_opt_params, grad_clip)

    # compatibility with different torch versions
    if isinstance(grad_norm, float):
        if np.isinf(grad_norm):
            print(" | > Gradient is INF !!")
            skip_flag = True
    else:
        if torch.isinf(grad_norm):
            print(" | > Gradient is INF !!")
            skip_flag = True
    return grad_norm, skip_flag


def gradual_training_scheduler(global_step, config):
    """Setup the gradual training schedule wrt number
    of active GPUs"""
    num_gpus = torch.cuda.device_count()
    if num_gpus == 0:
        num_gpus = 1
    new_values = None
    # we set the scheduling wrt num_gpus
    for values in config.gradual_training:
        if global_step * num_gpus >= values[0]:
            new_values = values
    return new_values[1], new_values[2]


================================================
FILE: TTS/utils/vad.py
================================================
import soundfile as sf
import torch
import torchaudio


def read_audio(path):
    wav, sr = torchaudio.load(path)

    if wav.size(0) > 1:
        wav = wav.mean(dim=0, keepdim=True)

    return wav.squeeze(0), sr


def resample_wav(wav, sr, new_sr):
    wav = wav.unsqueeze(0)
    transform = torchaudio.transforms.Resample(orig_freq=sr, new_freq=new_sr)
    wav = transform(wav)
    return wav.squeeze(0)


def map_timestamps_to_new_sr(vad_sr, new_sr, timestamps, just_begging_end=False):
    factor = new_sr / vad_sr
    new_timestamps = []
    if just_begging_end and timestamps:
        # get just the start and end timestamps
        new_dict = {"start": int(timestamps[0]["start"] * factor), "end": int(timestamps[-1]["end"] * factor)}
        new_timestamps.append(new_dict)
    else:
        for ts in timestamps:
            # map to the new SR
            new_dict = {"start": int(ts["start"] * factor), "end": int(ts["end"] * factor)}
            new_timestamps.append(new_dict)

    return new_timestamps


def get_vad_model_and_utils(use_cuda=False):
    model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad", force_reload=True, onnx=False)
    if use_cuda:
        model = model.cuda()

    get_speech_timestamps, save_audio, _, _, collect_chunks = utils
    return model, get_speech_timestamps, save_audio, collect_chunks


def remove_silence(
    model_and_utils, audio_path, out_path, vad_sample_rate=8000, trim_just_beginning_and_end=True, use_cuda=False
):
    # get the VAD model and utils functions
    model, get_speech_timestamps, _, collect_chunks = model_and_utils

    # read ground truth wav and resample the audio for the VAD
    wav, gt_sample_rate = read_audio(audio_path)

    # if needed, resample the audio for the VAD model
    if gt_sample_rate != vad_sample_rate:
        wav_vad = resample_wav(wav, gt_sample_rate, vad_sample_rate)
    else:
        wav_vad = wav

    if use_cuda:
        wav_vad = wav_vad.cuda()

    # get speech timestamps from full audio file
    speech_timestamps = get_speech_timestamps(wav_vad, model, sampling_rate=vad_sample_rate, window_size_samples=768)

    # map the current speech_timestamps to the sample rate of the ground truth audio
    new_speech_timestamps = map_timestamps_to_new_sr(
        vad_sample_rate, gt_sample_rate, speech_timestamps, trim_just_beginning_and_end
    )

    # if have speech timestamps else save the wav
    if new_speech_timestamps:
        wav = collect_chunks(new_speech_timestamps, wav)
        is_speech = True
    else:
        print(f"> The file {audio_path} probably does not have speech please check it !!")
        is_speech = False

    # save audio
    sf.write(out_path, wav, gt_sample_rate, subtype="PCM_16")
    return out_path, is_speech


================================================
FILE: TTS/vc/configs/__init__.py
================================================


================================================
FILE: TTS/vc/configs/freevc_config.py
================================================
from dataclasses import dataclass, field
from typing import List

from TTS.vc.configs.shared_configs import BaseVCConfig
from TTS.vc.models.freevc import FreeVCArgs, FreeVCAudioConfig, FreeVCConfig


================================================
FILE: TTS/vc/configs/shared_configs.py
================================================
from dataclasses import asdict, dataclass, field
from typing import Dict, List

from coqpit import Coqpit, check_argument

from TTS.config import BaseAudioConfig, BaseDatasetConfig, BaseTrainingConfig


@dataclass
class BaseVCConfig(BaseTrainingConfig):
    """Shared parameters among all the tts models.

    Args:

        audio (BaseAudioConfig):
            Audio processor config object instance.

        batch_group_size (int):
            Size of the batch groups used for bucketing. By default, the dataloader orders samples by the sequence
            length for a more efficient and stable training. If `batch_group_size > 1` then it performs bucketing to
            prevent using the same batches for each epoch.

        loss_masking (bool):
            enable / disable masking loss values against padded segments of samples in a batch.

        min_text_len (int):
            Minimum length of input text to be used. All shorter samples will be ignored. Defaults to 0.

        max_text_len (int):
            Maximum length of input text to be used. All longer samples will be ignored. Defaults to float("inf").

        min_audio_len (int):
            Minimum length of input audio to be used. All shorter samples will be ignored. Defaults to 0.

        max_audio_len (int):
            Maximum length of input audio to be used. All longer samples will be ignored. The maximum length in the
            dataset defines the VRAM used in the training. Hence, pay attention to this value if you encounter an
            OOM error in training. Defaults to float("inf").

        compute_f0 (int):
            (Not in use yet).

        compute_energy (int):
            (Not in use yet).

        compute_linear_spec (bool):
            If True data loader computes and returns linear spectrograms alongside the other data.

        precompute_num_workers (int):
            Number of workers to precompute features. Defaults to 0.

        use_noise_augment (bool):
            Augment the input audio with random noise.

        start_by_longest (bool):
            If True, the data loader will start loading the longest batch first. It is useful for checking OOM issues.
            Defaults to False.

        shuffle (bool):
            If True, the data loader will shuffle the dataset when there is not sampler defined. Defaults to True.

        drop_last (bool):
            If True, the data loader will drop the last batch if it is not complete. It helps to prevent
            issues that emerge from the partial batch statistics. Defaults to True.

        add_blank (bool):
            Add blank characters between each other two characters. It improves performance for some models at expense
            of slower run-time due to the longer input sequence.

        datasets (List[BaseDatasetConfig]):
            List of datasets used for training. If multiple datasets are provided, they are merged and used together
            for training.

        optimizer (str):
            Optimizer used for the training. Set one from `torch.optim.Optimizer` or `TTS.utils.training`.
            Defaults to ``.

        optimizer_params (dict):
            Optimizer kwargs. Defaults to `{"betas": [0.8, 0.99], "weight_decay": 0.0}`

        lr_scheduler (str):
            Learning rate scheduler for the training. Use one from `torch.optim.Scheduler` schedulers or
            `TTS.utils.training`. Defaults to ``.

        lr_scheduler_params (dict):
            Parameters for the generator learning rate scheduler. Defaults to `{"warmup": 4000}`.

        test_sentences (List[str]):
            List of sentences to be used at testing. Defaults to '[]'

        eval_split_max_size (int):
            Number maximum of samples to be used for evaluation in proportion split. Defaults to None (Disabled).

        eval_split_size (float):
            If between 0.0 and 1.0 represents the proportion of the dataset to include in the evaluation set.
            If > 1, represents the absolute number of evaluation samples. Defaults to 0.01 (1%).

        use_speaker_weighted_sampler (bool):
            Enable / Disable the batch balancer by speaker. Defaults to ```False```.

        speaker_weighted_sampler_alpha (float):
            Number that control the influence of the speaker sampler weights. Defaults to ```1.0```.

        use_language_weighted_sampler (bool):
            Enable / Disable the batch balancer by language. Defaults to ```False```.

        language_weighted_sampler_alpha (float):
            Number that control the influence of the language sampler weights. Defaults to ```1.0```.

        use_length_weighted_sampler (bool):
            Enable / Disable the batch balancer by audio length. If enabled the dataset will be divided
            into 10 buckets considering the min and max audio of the dataset. The sampler weights will be
            computed forcing to have the same quantity of data for each bucket in each training batch. Defaults to ```False```.

        length_weighted_sampler_alpha (float):
            Number that control the influence of the length sampler weights. Defaults to ```1.0```.
    """

    audio: BaseAudioConfig = field(default_factory=BaseAudioConfig)
    # training params
    batch_group_size: int = 0
    loss_masking: bool = None
    # dataloading
    min_audio_len: int = 1
    max_audio_len: int = float("inf")
    min_text_len: int = 1
    max_text_len: int = float("inf")
    compute_f0: bool = False
    compute_energy: bool = False
    compute_linear_spec: bool = False
    precompute_num_workers: int = 0
    use_noise_augment: bool = False
    start_by_longest: bool = False
    shuffle: bool = False
    drop_last: bool = False
    # dataset
    datasets: List[BaseDatasetConfig] = field(default_factory=lambda: [BaseDatasetConfig()])
    # optimizer
    optimizer: str = "radam"
    optimizer_params: dict = None
    # scheduler
    lr_scheduler: str = None
    lr_scheduler_params: dict = field(default_factory=lambda: {})
    # testing
    test_sentences: List[str] = field(default_factory=lambda: [])
    # evaluation
    eval_split_max_size: int = None
    eval_split_size: float = 0.01
    # weighted samplers
    use_speaker_weighted_sampler: bool = False
    speaker_weighted_sampler_alpha: float = 1.0
    use_language_weighted_sampler: bool = False
    language_weighted_sampler_alpha: float = 1.0
    use_length_weighted_sampler: bool = False
    length_weighted_sampler_alpha: float = 1.0


================================================
FILE: TTS/vc/models/__init__.py
================================================
import importlib
import re
from typing import Dict, List, Union


def to_camel(text):
    text = text.capitalize()
    return re.sub(r"(?!^)_([a-zA-Z])", lambda m: m.group(1).upper(), text)


def setup_model(config: "Coqpit", samples: Union[List[List], List[Dict]] = None) -> "BaseVC":
    print(" > Using model: {}".format(config.model))
    # fetch the right model implementation.
    if "model" in config and config["model"].lower() == "freevc":
        MyModel = importlib.import_module("TTS.vc.models.freevc").FreeVC
        model = MyModel.init_from_config(config, samples)
    return model


================================================
FILE: TTS/vc/models/base_vc.py
================================================
import os
import random
from typing import Dict, List, Tuple, Union

import torch
import torch.distributed as dist
from coqpit import Coqpit
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data.sampler import WeightedRandomSampler
from trainer.torch import DistributedSampler, DistributedSamplerWrapper

from TTS.model import BaseTrainerModel
from TTS.tts.datasets.dataset import TTSDataset
from TTS.tts.utils.data import get_length_balancer_weights
from TTS.tts.utils.languages import LanguageManager, get_language_balancer_weights
from TTS.tts.utils.speakers import SpeakerManager, get_speaker_balancer_weights
from TTS.tts.utils.synthesis import synthesis
from TTS.tts.utils.visual import plot_alignment, plot_spectrogram

# pylint: skip-file


class BaseVC(BaseTrainerModel):
    """Base `vc` class. Every new `vc` model must inherit this.

    It defines common `vc` specific functions on top of `Model` implementation.
    """

    MODEL_TYPE = "vc"

    def __init__(
        self,
        config: Coqpit,
        ap: "AudioProcessor",
        speaker_manager: SpeakerManager = None,
        language_manager: LanguageManager = None,
    ):
        super().__init__()
        self.config = config
        self.ap = ap
        self.speaker_manager = speaker_manager
        self.language_manager = language_manager
        self._set_model_args(config)

    def _set_model_args(self, config: Coqpit):
        """Setup model args based on the config type (`ModelConfig` or `ModelArgs`).

        `ModelArgs` has all the fields reuqired to initialize the model architecture.

        `ModelConfig` has all the fields required for training, inference and containes `ModelArgs`.

        If the config is for training with a name like "*Config", then the model args are embeded in the
        config.model_args

        If the config is for the model with a name like "*Args", then we assign the directly.
        """
        # don't use isintance not to import recursively
        if "Config" in config.__class__.__name__:
            self.config = config
            self.args = config.model_args
        elif "Args" in config.__class__.__name__:
            self.args = config
        else:
            raise ValueError("config must be either a *Config or *Args")

    def init_multispeaker(self, config: Coqpit, data: List = None):
        """Initialize a speaker embedding layer if needen and define expected embedding channel size for defining
        `in_channels` size of the connected layers.

        This implementation yields 3 possible outcomes:

        1. If `config.use_speaker_embedding` and `config.use_d_vector_file are False, do nothing.
        2. If `config.use_d_vector_file` is True, set expected embedding channel size to `config.d_vector_dim` or 512.
        3. If `config.use_speaker_embedding`, initialize a speaker embedding layer with channel size of
        `config.d_vector_dim` or 512.

        You can override this function for new models.

        Args:
            config (Coqpit): Model configuration.
        """
        # set number of speakers
        if self.speaker_manager is not None:
            self.num_speakers = self.speaker_manager.num_speakers
        elif hasattr(config, "num_speakers"):
            self.num_speakers = config.num_speakers

        # set ultimate speaker embedding size
        if config.use_speaker_embedding or config.use_d_vector_file:
            self.embedded_speaker_dim = (
                config.d_vector_dim if "d_vector_dim" in config and config.d_vector_dim is not None else 512
            )
        # init speaker embedding layer
        if config.use_speaker_embedding and not config.use_d_vector_file:
            print(" > Init speaker_embedding layer.")
            self.speaker_embedding = nn.Embedding(self.num_speakers, self.embedded_speaker_dim)
            self.speaker_embedding.weight.data.normal_(0, 0.3)

    def get_aux_input(self, **kwargs) -> Dict:
        """Prepare and return `aux_input` used by `forward()`"""
        return {"speaker_id": None, "style_wav": None, "d_vector": None, "language_id": None}

    def get_aux_input_from_test_sentences(self, sentence_info):
        if hasattr(self.config, "model_args"):
            config = self.config.model_args
        else:
            config = self.config

        # extract speaker and language info
        text, speaker_name, style_wav, language_name = None, None, None, None

        if isinstance(sentence_info, list):
            if len(sentence_info) == 1:
                text = sentence_info[0]
            elif len(sentence_info) == 2:
                text, speaker_name = sentence_info
            elif len(sentence_info) == 3:
                text, speaker_name, style_wav = sentence_info
            elif len(sentence_info) == 4:
                text, speaker_name, style_wav, language_name = sentence_info
        else:
            text = sentence_info

        # get speaker  id/d_vector
        speaker_id, d_vector, language_id = None, None, None
        if self.speaker_manager is not None:
            if config.use_d_vector_file:
                if speaker_name is None:
                    d_vector = self.speaker_manager.get_random_embedding()
                else:
                    d_vector = self.speaker_manager.get_d_vector_by_name(speaker_name)
            elif config.use_speaker_embedding:
                if speaker_name is None:
                    speaker_id = self.speaker_manager.get_random_id()
                else:
                    speaker_id = self.speaker_manager.name_to_id[speaker_name]

        # get language id
        if self.language_manager is not None and config.use_language_embedding and language_name is not None:
            language_id = self.language_manager.name_to_id[language_name]

        return {
            "text": text,
            "speaker_id": speaker_id,
            "style_wav": style_wav,
            "d_vector": d_vector,
            "language_id": language_id,
        }

    def format_batch(self, batch: Dict) -> Dict:
        """Generic batch formatting for `VCDataset`.

        You must override this if you use a custom dataset.

        Args:
            batch (Dict): [description]

        Returns:
            Dict: [description]
        """
        # setup input batch
        text_input = batch["token_id"]
        text_lengths = batch["token_id_lengths"]
        speaker_names = batch["speaker_names"]
        linear_input = batch["linear"]
        mel_input = batch["mel"]
        mel_lengths = batch["mel_lengths"]
        stop_targets = batch["stop_targets"]
        item_idx = batch["item_idxs"]
        d_vectors = batch["d_vectors"]
        speaker_ids = batch["speaker_ids"]
        attn_mask = batch["attns"]
        waveform = batch["waveform"]
        pitch = batch["pitch"]
        energy = batch["energy"]
        language_ids = batch["language_ids"]
        max_text_length = torch.max(text_lengths.float())
        max_spec_length = torch.max(mel_lengths.float())

        # compute durations from attention masks
        durations = None
        if attn_mask is not None:
            durations = torch.zeros(attn_mask.shape[0], attn_mask.shape[2])
            for idx, am in enumerate(attn_mask):
                # compute raw durations
                c_idxs = am[:, : text_lengths[idx], : mel_lengths[idx]].max(1)[1]
                # c_idxs, counts = torch.unique_consecutive(c_idxs, return_counts=True)
                c_idxs, counts = torch.unique(c_idxs, return_counts=True)
                dur = torch.ones([text_lengths[idx]]).to(counts.dtype)
                dur[c_idxs] = counts
                # smooth the durations and set any 0 duration to 1
                # by cutting off from the largest duration indeces.
                extra_frames = dur.sum() - mel_lengths[idx]
                largest_idxs = torch.argsort(-dur)[:extra_frames]
                dur[largest_idxs] -= 1
                assert (
                    dur.sum() == mel_lengths[idx]
                ), f" [!] total duration {dur.sum()} vs spectrogram length {mel_lengths[idx]}"
                durations[idx, : text_lengths[idx]] = dur

        # set stop targets wrt reduction factor
        stop_targets = stop_targets.view(text_input.shape[0], stop_targets.size(1) // self.config.r, -1)
        stop_targets = (stop_targets.sum(2) > 0.0).unsqueeze(2).float().squeeze(2)
        stop_target_lengths = torch.divide(mel_lengths, self.config.r).ceil_()

        return {
            "text_input": text_input,
            "text_lengths": text_lengths,
            "speaker_names": speaker_names,
            "mel_input": mel_input,
            "mel_lengths": mel_lengths,
            "linear_input": linear_input,
            "stop_targets": stop_targets,
            "stop_target_lengths": stop_target_lengths,
            "attn_mask": attn_mask,
            "durations": durations,
            "speaker_ids": speaker_ids,
            "d_vectors": d_vectors,
            "max_text_length": float(max_text_length),
            "max_spec_length": float(max_spec_length),
            "item_idx": item_idx,
            "waveform": waveform,
            "pitch": pitch,
            "energy": energy,
            "language_ids": language_ids,
            "audio_unique_names": batch["audio_unique_names"],
        }

    def get_sampler(self, config: Coqpit, dataset: TTSDataset, num_gpus=1):
        weights = None
        data_items = dataset.samples

        if getattr(config, "use_language_weighted_sampler", False):
            alpha = getattr(config, "language_weighted_sampler_alpha", 1.0)
            print(" > Using Language weighted sampler with alpha:", alpha)
            weights = get_language_balancer_weights(data_items) * alpha

        if getattr(config, "use_speaker_weighted_sampler", False):
            alpha = getattr(config, "speaker_weighted_sampler_alpha", 1.0)
            print(" > Using Speaker weighted sampler with alpha:", alpha)
            if weights is not None:
                weights += get_speaker_balancer_weights(data_items) * alpha
            else:
                weights = get_speaker_balancer_weights(data_items) * alpha

        if getattr(config, "use_length_weighted_sampler", False):
            alpha = getattr(config, "length_weighted_sampler_alpha", 1.0)
            print(" > Using Length weighted sampler with alpha:", alpha)
            if weights is not None:
                weights += get_length_balancer_weights(data_items) * alpha
            else:
                weights = get_length_balancer_weights(data_items) * alpha

        if weights is not None:
            sampler = WeightedRandomSampler(weights, len(weights))
        else:
            sampler = None

        # sampler for DDP
        if sampler is None:
            sampler = DistributedSampler(dataset) if num_gpus > 1 else None
        else:  # If a sampler is already defined use this sampler and DDP sampler together
            sampler = DistributedSamplerWrapper(sampler) if num_gpus > 1 else sampler

        return sampler

    def get_data_loader(
        self,
        config: Coqpit,
        assets: Dict,
        is_eval: bool,
        samples: Union[List[Dict], List[List]],
        verbose: bool,
        num_gpus: int,
        rank: int = None,
    ) -> "DataLoader":
        if is_eval and not config.run_eval:
            loader = None
        else:
            # setup multi-speaker attributes
            if self.speaker_manager is not None:
                if hasattr(config, "model_args"):
                    speaker_id_mapping = (
                        self.speaker_manager.name_to_id if config.model_args.use_speaker_embedding else None
                    )
                    d_vector_mapping = self.speaker_manager.embeddings if config.model_args.use_d_vector_file else None
                    config.use_d_vector_file = config.model_args.use_d_vector_file
                else:
                    speaker_id_mapping = self.speaker_manager.name_to_id if config.use_speaker_embedding else None
                    d_vector_mapping = self.speaker_manager.embeddings if config.use_d_vector_file else None
            else:
                speaker_id_mapping = None
                d_vector_mapping = None

            # setup multi-lingual attributes
            if self.language_manager is not None:
                language_id_mapping = self.language_manager.name_to_id if self.args.use_language_embedding else None
            else:
                language_id_mapping = None

            # init dataloader
            dataset = TTSDataset(
                outputs_per_step=config.r if "r" in config else 1,
                compute_linear_spec=config.model.lower() == "tacotron" or config.compute_linear_spec,
                compute_f0=config.get("compute_f0", False),
                f0_cache_path=config.get("f0_cache_path", None),
                compute_energy=config.get("compute_energy", False),
                energy_cache_path=config.get("energy_cache_path", None),
                samples=samples,
                ap=self.ap,
                return_wav=config.return_wav if "return_wav" in config else False,
                batch_group_size=0 if is_eval else config.batch_group_size * config.batch_size,
                min_text_len=config.min_text_len,
                max_text_len=config.max_text_len,
                min_audio_len=config.min_audio_len,
                max_audio_len=config.max_audio_len,
                phoneme_cache_path=config.phoneme_cache_path,
                precompute_num_workers=config.precompute_num_workers,
                use_noise_augment=False if is_eval else config.use_noise_augment,
                verbose=verbose,
                speaker_id_mapping=speaker_id_mapping,
                d_vector_mapping=d_vector_mapping if config.use_d_vector_file else None,
                tokenizer=None,
                start_by_longest=config.start_by_longest,
                language_id_mapping=language_id_mapping,
            )

            # wait all the DDP process to be ready
            if num_gpus > 1:
                dist.barrier()

            # sort input sequences from short to long
            dataset.preprocess_samples()

            # get samplers
            sampler = self.get_sampler(config, dataset, num_gpus)

            loader = DataLoader(
                dataset,
                batch_size=config.eval_batch_size if is_eval else config.batch_size,
                shuffle=config.shuffle if sampler is None else False,  # if there is no other sampler
                collate_fn=dataset.collate_fn,
                drop_last=config.drop_last,  # setting this False might cause issues in AMP training.
                sampler=sampler,
                num_workers=config.num_eval_loader_workers if is_eval else config.num_loader_workers,
                pin_memory=False,
            )
        return loader

    def _get_test_aux_input(
        self,
    ) -> Dict:
        d_vector = None
        if self.config.use_d_vector_file:
            d_vector = [self.speaker_manager.embeddings[name]["embedding"] for name in self.speaker_manager.embeddings]
            d_vector = (random.sample(sorted(d_vector), 1),)

        aux_inputs = {
            "speaker_id": None
            if not self.config.use_speaker_embedding
            else random.sample(sorted(self.speaker_manager.name_to_id.values()), 1),
            "d_vector": d_vector,
            "style_wav": None,  # TODO: handle GST style input
        }
        return aux_inputs

    def test_run(self, assets: Dict) -> Tuple[Dict, Dict]:
        """Generic test run for `vc` models used by `Trainer`.

        You can override this for a different behaviour.

        Args:
            assets (dict): A dict of training assets. For `vc` models, it must include `{'audio_processor': ap}`.

        Returns:
            Tuple[Dict, Dict]: Test figures and audios to be projected to Tensorboard.
        """
        print(" | > Synthesizing test sentences.")
        test_audios = {}
        test_figures = {}
        test_sentences = self.config.test_sentences
        aux_inputs = self._get_test_aux_input()
        for idx, sen in enumerate(test_sentences):
            if isinstance(sen, list):
                aux_inputs = self.get_aux_input_from_test_sentences(sen)
                sen = aux_inputs["text"]
            outputs_dict = synthesis(
                self,
                sen,
                self.config,
                "cuda" in str(next(self.parameters()).device),
                speaker_id=aux_inputs["speaker_id"],
                d_vector=aux_inputs["d_vector"],
                style_wav=aux_inputs["style_wav"],
                use_griffin_lim=True,
                do_trim_silence=False,
            )
            test_audios["{}-audio".format(idx)] = outputs_dict["wav"]
            test_figures["{}-prediction".format(idx)] = plot_spectrogram(
                outputs_dict["outputs"]["model_outputs"], self.ap, output_fig=False
            )
            test_figures["{}-alignment".format(idx)] = plot_alignment(
                outputs_dict["outputs"]["alignments"], output_fig=False
            )
        return test_figures, test_audios

    def on_init_start(self, trainer):
        """Save the speaker.pth and language_ids.json at the beginning of the training. Also update both paths."""
        if self.speaker_manager is not None:
            output_path = os.path.join(trainer.output_path, "speakers.pth")
            self.speaker_manager.save_ids_to_file(output_path)
            trainer.config.speakers_file = output_path
            # some models don't have `model_args` set
            if hasattr(trainer.config, "model_args"):
                trainer.config.model_args.speakers_file = output_path
            trainer.config.save_json(os.path.join(trainer.output_path, "config.json"))
            print(f" > `speakers.pth` is saved to {output_path}.")
            print(" > `speakers_file` is updated in the config.json.")

        if self.language_manager is not None:
            output_path = os.path.join(trainer.output_path, "language_ids.json")
            self.language_manager.save_ids_to_file(output_path)
            trainer.config.language_ids_file = output_path
            if hasattr(trainer.config, "model_args"):
                trainer.config.model_args.language_ids_file = output_path
            trainer.config.save_json(os.path.join(trainer.output_path, "config.json"))
            print(f" > `language_ids.json` is saved to {output_path}.")
            print(" > `language_ids_file` is updated in the config.json.")


================================================
FILE: TTS/vc/models/freevc.py
================================================
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

import librosa
import numpy as np
import torch
from coqpit import Coqpit
from torch import nn
from torch.nn import AvgPool1d, Conv1d, Conv2d, ConvTranspose1d
from torch.nn import functional as F
from torch.nn.utils import remove_weight_norm, spectral_norm, weight_norm

import TTS.vc.modules.freevc.commons as commons
import TTS.vc.modules.freevc.modules as modules
from TTS.tts.utils.speakers import SpeakerManager
from TTS.utils.io import load_fsspec, save_checkpoint
from TTS.vc.configs.shared_configs import BaseVCConfig
from TTS.vc.models.base_vc import BaseVC
from TTS.vc.modules.freevc.commons import get_padding, init_weights
from TTS.vc.modules.freevc.mel_processing import mel_spectrogram_torch
from TTS.vc.modules.freevc.speaker_encoder.speaker_encoder import SpeakerEncoder as SpeakerEncoderEx
from TTS.vc.modules.freevc.wavlm import get_wavlm


class ResidualCouplingBlock(nn.Module):
    def __init__(self, channels, hidden_channels, kernel_size, dilation_rate, n_layers, n_flows=4, gin_channels=0):
        super().__init__()
        self.channels = channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.n_flows = n_flows
        self.gin_channels = gin_channels

        self.flows = nn.ModuleList()
        for i in range(n_flows):
            self.flows.append(
                modules.ResidualCouplingLayer(
                    channels,
                    hidden_channels,
                    kernel_size,
                    dilation_rate,
                    n_layers,
                    gin_channels=gin_channels,
                    mean_only=True,
                )
            )
            self.flows.append(modules.Flip())

    def forward(self, x, x_mask, g=None, reverse=False):
        if not reverse:
            for flow in self.flows:
                x, _ = flow(x, x_mask, g=g, reverse=reverse)
        else:
            for flow in reversed(self.flows):
                x = flow(x, x_mask, g=g, reverse=reverse)
        return x


class Encoder(nn.Module):
    def __init__(
        self, in_channels, out_channels, hidden_channels, kernel_size, dilation_rate, n_layers, gin_channels=0
    ):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.gin_channels = gin_channels

        self.pre = nn.Conv1d(in_channels, hidden_channels, 1)
        self.enc = modules.WN(hidden_channels, kernel_size, dilation_rate, n_layers, gin_channels=gin_channels)
        self.proj = nn.Conv1d(hidden_channels, out_channels * 2, 1)

    def forward(self, x, x_lengths, g=None):
        x_mask = torch.unsqueeze(commons.sequence_mask(x_lengths, x.size(2)), 1).to(x.dtype)
        x = self.pre(x) * x_mask
        x = self.enc(x, x_mask, g=g)
        stats = self.proj(x) * x_mask
        m, logs = torch.split(stats, self.out_channels, dim=1)
        z = (m + torch.randn_like(m) * torch.exp(logs)) * x_mask
        return z, m, logs, x_mask


class Generator(torch.nn.Module):
    def __init__(
        self,
        initial_channel,
        resblock,
        resblock_kernel_sizes,
        resblock_dilation_sizes,
        upsample_rates,
        upsample_initial_channel,
        upsample_kernel_sizes,
        gin_channels=0,
    ):
        super(Generator, self).__init__()
        self.num_kernels = len(resblock_kernel_sizes)
        self.num_upsamples = len(upsample_rates)
        self.conv_pre = Conv1d(initial_channel, upsample_initial_channel, 7, 1, padding=3)
        resblock = modules.ResBlock1 if resblock == "1" else modules.ResBlock2

        self.ups = nn.ModuleList()
        for i, (u, k) in enumerate(zip(upsample_rates, upsample_kernel_sizes)):
            self.ups.append(
                weight_norm(
                    ConvTranspose1d(
                        upsample_initial_channel // (2**i),
                        upsample_initial_channel // (2 ** (i + 1)),
                        k,
                        u,
                        padding=(k - u) // 2,
                    )
                )
            )

        self.resblocks = nn.ModuleList()
        for i in range(len(self.ups)):
            ch = upsample_initial_channel // (2 ** (i + 1))
            for j, (k, d) in enumerate(zip(resblock_kernel_sizes, resblock_dilation_sizes)):
                self.resblocks.append(resblock(ch, k, d))

        self.conv_post = Conv1d(ch, 1, 7, 1, padding=3, bias=False)
        self.ups.apply(init_weights)

        if gin_channels != 0:
            self.cond = nn.Conv1d(gin_channels, upsample_initial_channel, 1)

    def forward(self, x, g=None):
        x = self.conv_pre(x)
        if g is not None:
            x = x + self.cond(g)

        for i in range(self.num_upsamples):
            x = F.leaky_relu(x, modules.LRELU_SLOPE)
            x = self.ups[i](x)
            xs = None
            for j in range(self.num_kernels):
                if xs is None:
                    xs = self.resblocks[i * self.num_kernels + j](x)
                else:
                    xs += self.resblocks[i * self.num_kernels + j](x)
            x = xs / self.num_kernels
        x = F.leaky_relu(x)
        x = self.conv_post(x)
        x = torch.tanh(x)

        return x

    def remove_weight_norm(self):
        print("Removing weight norm...")
        for l in self.ups:
            remove_weight_norm(l)
        for l in self.resblocks:
            l.remove_weight_norm()


class DiscriminatorP(torch.nn.Module):
    def __init__(self, period, kernel_size=5, stride=3, use_spectral_norm=False):
        super(DiscriminatorP, self).__init__()
        self.period = period
        self.use_spectral_norm = use_spectral_norm
        norm_f = weight_norm if use_spectral_norm == False else spectral_norm
        self.convs = nn.ModuleList(
            [
                norm_f(Conv2d(1, 32, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(Conv2d(32, 128, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(Conv2d(128, 512, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(Conv2d(512, 1024, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(Conv2d(1024, 1024, (kernel_size, 1), 1, padding=(get_padding(kernel_size, 1), 0))),
            ]
        )
        self.conv_post = norm_f(Conv2d(1024, 1, (3, 1), 1, padding=(1, 0)))

    def forward(self, x):
        fmap = []

        # 1d to 2d
        b, c, t = x.shape
        if t % self.period != 0:  # pad first
            n_pad = self.period - (t % self.period)
            x = F.pad(x, (0, n_pad), "reflect")
            t = t + n_pad
        x = x.view(b, c, t // self.period, self.period)

        for l in self.convs:
            x = l(x)
            x = F.leaky_relu(x, modules.LRELU_SLOPE)
            fmap.append(x)
        x = self.conv_post(x)
        fmap.append(x)
        x = torch.flatten(x, 1, -1)

        return x, fmap


class DiscriminatorS(torch.nn.Module):
    def __init__(self, use_spectral_norm=False):
        super(DiscriminatorS, self).__init__()
        norm_f = weight_norm if use_spectral_norm == False else spectral_norm
        self.convs = nn.ModuleList(
            [
                norm_f(Conv1d(1, 16, 15, 1, padding=7)),
                norm_f(Conv1d(16, 64, 41, 4, groups=4, padding=20)),
                norm_f(Conv1d(64, 256, 41, 4, groups=16, padding=20)),
                norm_f(Conv1d(256, 1024, 41, 4, groups=64, padding=20)),
                norm_f(Conv1d(1024, 1024, 41, 4, groups=256, padding=20)),
                norm_f(Conv1d(1024, 1024, 5, 1, padding=2)),
            ]
        )
        self.conv_post = norm_f(Conv1d(1024, 1, 3, 1, padding=1))

    def forward(self, x):
        fmap = []

        for l in self.convs:
            x = l(x)
            x = F.leaky_relu(x, modules.LRELU_SLOPE)
            fmap.append(x)
        x = self.conv_post(x)
        fmap.append(x)
        x = torch.flatten(x, 1, -1)

        return x, fmap


class MultiPeriodDiscriminator(torch.nn.Module):
    def __init__(self, use_spectral_norm=False):
        super(MultiPeriodDiscriminator, self).__init__()
        periods = [2, 3, 5, 7, 11]

        discs = [DiscriminatorS(use_spectral_norm=use_spectral_norm)]
        discs = discs + [DiscriminatorP(i, use_spectral_norm=use_spectral_norm) for i in periods]
        self.discriminators = nn.ModuleList(discs)

    def forward(self, y, y_hat):
        y_d_rs = []
        y_d_gs = []
        fmap_rs = []
        fmap_gs = []
        for i, d in enumerate(self.discriminators):
            y_d_r, fmap_r = d(y)
            y_d_g, fmap_g = d(y_hat)
            y_d_rs.append(y_d_r)
            y_d_gs.append(y_d_g)
            fmap_rs.append(fmap_r)
            fmap_gs.append(fmap_g)

        return y_d_rs, y_d_gs, fmap_rs, fmap_gs


class SpeakerEncoder(torch.nn.Module):
    def __init__(self, mel_n_channels=80, model_num_layers=3, model_hidden_size=256, model_embedding_size=256):
        super(SpeakerEncoder, self).__init__()
        self.lstm = nn.LSTM(mel_n_channels, model_hidden_size, model_num_layers, batch_first=True)
        self.linear = nn.Linear(model_hidden_size, model_embedding_size)
        self.relu = nn.ReLU()

    def forward(self, mels):
        self.lstm.flatten_parameters()
        _, (hidden, _) = self.lstm(mels)
        embeds_raw = self.relu(self.linear(hidden[-1]))
        return embeds_raw / torch.norm(embeds_raw, dim=1, keepdim=True)

    def compute_partial_slices(self, total_frames, partial_frames, partial_hop):
        mel_slices = []
        for i in range(0, total_frames - partial_frames, partial_hop):
            mel_range = torch.arange(i, i + partial_frames)
            mel_slices.append(mel_range)

        return mel_slices

    def embed_utterance(self, mel, partial_frames=128, partial_hop=64):
        mel_len = mel.size(1)
        last_mel = mel[:, -partial_frames:]

        if mel_len > partial_frames:
            mel_slices = self.compute_partial_slices(mel_len, partial_frames, partial_hop)
            mels = list(mel[:, s] for s in mel_slices)
            mels.append(last_mel)
            mels = torch.stack(tuple(mels), 0).squeeze(1)

            with torch.no_grad():
                partial_embeds = self(mels)
            embed = torch.mean(partial_embeds, axis=0).unsqueeze(0)
            # embed = embed / torch.linalg.norm(embed, 2)
        else:
            with torch.no_grad():
                embed = self(last_mel)

        return embed


@dataclass
class FreeVCAudioConfig(Coqpit):
    """Audio configuration

    Args:
        max_wav_value (float):
            The maximum value of the waveform.

        input_sample_rate (int):
            The sampling rate of the input waveform.

        output_sample_rate (int):
            The sampling rate of the output waveform.

        filter_length (int):
            The length of the filter.

        hop_length (int):
            The hop length.

        win_length (int):
            The window length.

        n_mel_channels (int):
            The number of mel channels.

        mel_fmin (float):
            The minimum frequency of the mel filterbank.

        mel_fmax (Optional[float]):
            The maximum frequency of the mel filterbank.
    """

    max_wav_value: float = field(default=32768.0)
    input_sample_rate: int = field(default=16000)
    output_sample_rate: int = field(default=24000)
    filter_length: int = field(default=1280)
    hop_length: int = field(default=320)
    win_length: int = field(default=1280)
    n_mel_channels: int = field(default=80)
    mel_fmin: float = field(default=0.0)
    mel_fmax: Optional[float] = field(default=None)


@dataclass
class FreeVCArgs(Coqpit):
    """FreeVC model arguments

    Args:
        spec_channels (int):
            The number of channels in the spectrogram.

        inter_channels (int):
            The number of channels in the intermediate layers.

        hidden_channels (int):
            The number of channels in the hidden layers.

        filter_channels (int):
            The number of channels in the filter layers.

        n_heads (int):
            The number of attention heads.

        n_layers (int):
            The number of layers.

        kernel_size (int):
            The size of the kernel.

        p_dropout (float):
            The dropout probability.

        resblock (str):
            The type of residual block.

        resblock_kernel_sizes (List[int]):
            The kernel sizes for the residual blocks.

        resblock_dilation_sizes (List[List[int]]):
            The dilation sizes for the residual blocks.

        upsample_rates (List[int]):
            The upsample rates.

        upsample_initial_channel (int):
            The number of channels in the initial upsample layer.

        upsample_kernel_sizes (List[int]):
            The kernel sizes for the upsample layers.

        n_layers_q (int):
            The number of layers in the quantization network.

        use_spectral_norm (bool):
            Whether to use spectral normalization.

        gin_channels (int):
            The number of channels in the global conditioning vector.

        ssl_dim (int):
            The dimension of the self-supervised learning embedding.

        use_spk (bool):
            Whether to use external speaker encoder.
    """

    spec_channels: int = field(default=641)
    inter_channels: int = field(default=192)
    hidden_channels: int = field(default=192)
    filter_channels: int = field(default=768)
    n_heads: int = field(default=2)
    n_layers: int = field(default=6)
    kernel_size: int = field(default=3)
    p_dropout: float = field(default=0.1)
    resblock: str = field(default="1")
    resblock_kernel_sizes: List[int] = field(default_factory=lambda: [3, 7, 11])
    resblock_dilation_sizes: List[List[int]] = field(default_factory=lambda: [[1, 3, 5], [1, 3, 5], [1, 3, 5]])
    upsample_rates: List[int] = field(default_factory=lambda: [10, 8, 2, 2])
    upsample_initial_channel: int = field(default=512)
    upsample_kernel_sizes: List[int] = field(default_factory=lambda: [16, 16, 4, 4])
    n_layers_q: int = field(default=3)
    use_spectral_norm: bool = field(default=False)
    gin_channels: int = field(default=256)
    ssl_dim: int = field(default=1024)
    use_spk: bool = field(default=False)
    num_spks: int = field(default=0)
    segment_size: int = field(default=8960)


class FreeVC(BaseVC):
    """

    Papaer::
        https://arxiv.org/abs/2210.15418#

    Paper Abstract::
        Voice conversion (VC) can be achieved by first extracting source content information and target speaker
        information, and then reconstructing waveform with these information. However, current approaches normally
        either extract dirty content information with speaker information leaked in, or demand a large amount of
        annotated data for training. Besides, the quality of reconstructed waveform can be degraded by the
        mismatch between conversion model and vocoder. In this paper, we adopt the end-to-end framework of VITS for
        high-quality waveform reconstruction, and propose strategies for clean content information extraction without
        text annotation. We disentangle content information by imposing an information bottleneck to WavLM features,
        and propose the spectrogram-resize based data augmentation to improve the purity of extracted content
        information. Experimental results show that the proposed method outperforms the latest VC models trained with
        annotated data and has greater robustness.

    Original Code::
        https://github.com/OlaWod/FreeVC

    Examples:
        >>> from TTS.vc.configs.freevc_config import FreeVCConfig
        >>> from TTS.vc.models.freevc import FreeVC
        >>> config = FreeVCConfig()
        >>> model = FreeVC(config)
    """

    def __init__(self, config: Coqpit, speaker_manager: SpeakerManager = None):
        super().__init__(config, None, speaker_manager, None)

        self.init_multispeaker(config)

        self.spec_channels = self.args.spec_channels
        self.inter_channels = self.args.inter_channels
        self.hidden_channels = self.args.hidden_channels
        self.filter_channels = self.args.filter_channels
        self.n_heads = self.args.n_heads
        self.n_layers = self.args.n_layers
        self.kernel_size = self.args.kernel_size
        self.p_dropout = self.args.p_dropout
        self.resblock = self.args.resblock
        self.resblock_kernel_sizes = self.args.resblock_kernel_sizes
        self.resblock_dilation_sizes = self.args.resblock_dilation_sizes
        self.upsample_rates = self.args.upsample_rates
        self.upsample_initial_channel = self.args.upsample_initial_channel
        self.upsample_kernel_sizes = self.args.upsample_kernel_sizes
        self.segment_size = self.args.segment_size
        self.gin_channels = self.args.gin_channels
        self.ssl_dim = self.args.ssl_dim
        self.use_spk = self.args.use_spk

        self.enc_p = Encoder(self.args.ssl_dim, self.inter_channels, self.hidden_channels, 5, 1, 16)
        self.dec = Generator(
            self.inter_channels,
            self.resblock,
            self.resblock_kernel_sizes,
            self.resblock_dilation_sizes,
            self.upsample_rates,
            self.upsample_initial_channel,
            self.upsample_kernel_sizes,
            gin_channels=self.gin_channels,
        )
        self.enc_q = Encoder(
            self.spec_channels, self.inter_channels, self.hidden_channels, 5, 1, 16, gin_channels=self.gin_channels
        )
        self.flow = ResidualCouplingBlock(
            self.inter_channels, self.hidden_channels, 5, 1, 4, gin_channels=self.gin_channels
        )
        if not self.use_spk:
            self.enc_spk = SpeakerEncoder(model_hidden_size=self.gin_channels, model_embedding_size=self.gin_channels)
        else:
            self.load_pretrained_speaker_encoder()

        self.wavlm = get_wavlm()

    @property
    def device(self):
        return next(self.parameters()).device

    def load_pretrained_speaker_encoder(self):
        """Load pretrained speaker encoder model as mentioned in the paper."""
        print(" > Loading pretrained speaker encoder model ...")
        self.enc_spk_ex = SpeakerEncoderEx(
            "https://github.com/coqui-ai/TTS/releases/download/v0.13.0_models/speaker_encoder.pt"
        )

    def init_multispeaker(self, config: Coqpit):
        """Initialize multi-speaker modules of a model. A model can be trained either with a speaker embedding layer
        or with external `d_vectors` computed from a speaker encoder model.

        You must provide a `speaker_manager` at initialization to set up the multi-speaker modules.

        Args:
            config (Coqpit): Model configuration.
            data (List, optional): Dataset items to infer number of speakers. Defaults to None.
        """
        self.num_spks = self.args.num_spks
        if self.speaker_manager:
            self.num_spks = self.speaker_manager.num_spks

    def forward(
        self,
        c: torch.Tensor,
        spec: torch.Tensor,
        g: Optional[torch.Tensor] = None,
        mel: Optional[torch.Tensor] = None,
        c_lengths: Optional[torch.Tensor] = None,
        spec_lengths: Optional[torch.Tensor] = None,
    ) -> Tuple[
        torch.Tensor,
        torch.Tensor,
        torch.Tensor,
        Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor],
    ]:
        """
        Forward pass of the model.

        Args:
            c: WavLM features. Shape: (batch_size, c_seq_len).
            spec: The input spectrogram. Shape: (batch_size, spec_seq_len, spec_dim).
            g: The speaker embedding. Shape: (batch_size, spk_emb_dim).
            mel: The input mel-spectrogram for the speaker encoder. Shape: (batch_size, mel_seq_len, mel_dim).
            c_lengths: The lengths of the WavLM features. Shape: (batch_size,).
            spec_lengths: The lengths of the spectrogram. Shape: (batch_size,).

        Returns:
            o: The output spectrogram. Shape: (batch_size, spec_seq_len, spec_dim).
            ids_slice: The slice indices. Shape: (batch_size, num_slices).
            spec_mask: The spectrogram mask. Shape: (batch_size, spec_seq_len).
            (z, z_p, m_p, logs_p, m_q, logs_q): A tuple of latent variables.
        """

        # If c_lengths is None, set it to the length of the last dimension of c
        if c_lengths is None:
            c_lengths = (torch.ones(c.size(0)) * c.size(-1)).to(c.device)

        # If spec_lengths is None, set it to the length of the last dimension of spec
        if spec_lengths is None:
            spec_lengths = (torch.ones(spec.size(0)) * spec.size(-1)).to(spec.device)

        # If use_spk is False, compute g from mel using enc_spk
        g = None
        if not self.use_spk:
            g = self.enc_spk(mel).unsqueeze(-1)

        # Compute m_p, logs_p, z, m_q, logs_q, and spec_mask using enc_p and enc_q
        _, m_p, logs_p, _ = self.enc_p(c, c_lengths)
        z, m_q, logs_q, spec_mask = self.enc_q(spec.transpose(1, 2), spec_lengths, g=g)

        # Compute z_p using flow
        z_p = self.flow(z, spec_mask, g=g)

        # Randomly slice z and compute o using dec
        z_slice, ids_slice = commons.rand_slice_segments(z, spec_lengths, self.segment_size)
        o = self.dec(z_slice, g=g)

        return o, ids_slice, spec_mask, (z, z_p, m_p, logs_p, m_q, logs_q)

    @torch.no_grad()
    def inference(self, c, g=None, mel=None, c_lengths=None):
        """
        Inference pass of the model

        Args:
            c (torch.Tensor): Input tensor. Shape: (batch_size, c_seq_len).
            g (torch.Tensor): Speaker embedding tensor. Shape: (batch_size, spk_emb_dim).
            mel (torch.Tensor): Mel-spectrogram tensor. Shape: (batch_size, mel_seq_len, mel_dim).
            c_lengths (torch.Tensor): Lengths of the input tensor. Shape: (batch_size,).

        Returns:
            torch.Tensor: Output tensor.
        """
        if c_lengths == None:
            c_lengths = (torch.ones(c.size(0)) * c.size(-1)).to(c.device)
        if not self.use_spk:
            g = self.enc_spk.embed_utterance(mel)
            g = g.unsqueeze(-1)
        z_p, m_p, logs_p, c_mask = self.enc_p(c, c_lengths)
        z = self.flow(z_p, c_mask, g=g, reverse=True)
        o = self.dec(z * c_mask, g=g)
        return o

    def extract_wavlm_features(self, y):
        """Extract WavLM features from an audio tensor.

        Args:
            y (torch.Tensor): Audio tensor. Shape: (batch_size, audio_seq_len).
        """

        with torch.no_grad():
            c = self.wavlm.extract_features(y)[0]
        c = c.transpose(1, 2)
        return c

    def load_audio(self, wav):
        """Read and format the input audio."""
        if isinstance(wav, str):
            wav, _ = librosa.load(wav, sr=self.config.audio.input_sample_rate)
        if isinstance(wav, np.ndarray):
            wav = torch.from_numpy(wav).to(self.device)
        if isinstance(wav, torch.Tensor):
            wav = wav.to(self.device)
        if isinstance(wav, list):
            wav = torch.from_numpy(np.array(wav)).to(self.device)
        return wav.float()

    @torch.inference_mode()
    def voice_conversion(self, src, tgt):
        """
        Voice conversion pass of the model.

        Args:
            src (str or torch.Tensor): Source utterance.
            tgt (str or torch.Tensor): Target utterance.

        Returns:
            torch.Tensor: Output tensor.
        """

        wav_tgt = self.load_audio(tgt).cpu().numpy()
        wav_tgt, _ = librosa.effects.trim(wav_tgt, top_db=20)

        if self.config.model_args.use_spk:
            g_tgt = self.enc_spk_ex.embed_utterance(wav_tgt)
            g_tgt = torch.from_numpy(g_tgt)[None, :, None].to(self.device)
        else:
            wav_tgt = torch.from_numpy(wav_tgt).unsqueeze(0).to(self.device)
            mel_tgt = mel_spectrogram_torch(
                wav_tgt,
                self.config.audio.filter_length,
                self.config.audio.n_mel_channels,
                self.config.audio.input_sample_rate,
                self.config.audio.hop_length,
                self.config.audio.win_length,
                self.config.audio.mel_fmin,
                self.config.audio.mel_fmax,
            )
        # src
        wav_src = self.load_audio(src)
        c = self.extract_wavlm_features(wav_src[None, :])

        if self.config.model_args.use_spk:
            audio = self.inference(c, g=g_tgt)
        else:
            audio = self.inference(c, mel=mel_tgt.transpose(1, 2))
        audio = audio[0][0].data.cpu().float().numpy()
        return audio

    def eval_step():
        ...

    @staticmethod
    def init_from_config(config: "VitsConfig", samples: Union[List[List], List[Dict]] = None, verbose=True):
        model = FreeVC(config)
        return model

    def load_checkpoint(self, config, checkpoint_path, eval=False, strict=True, cache=False):
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"], strict=strict)
        if eval:
            self.eval()

    def train_step():
        ...


@dataclass
class FreeVCConfig(BaseVCConfig):
    """Defines parameters for FreeVC End2End TTS model.

    Args:
        model (str):
            Model name. Do not change unless you know what you are doing.

        model_args (FreeVCArgs):
            Model architecture arguments. Defaults to `FreeVCArgs()`.

        audio (FreeVCAudioConfig):
            Audio processing configuration. Defaults to `FreeVCAudioConfig()`.

        grad_clip (List):
            Gradient clipping thresholds for each optimizer. Defaults to `[1000.0, 1000.0]`.

        lr_gen (float):
            Initial learning rate for the generator. Defaults to 0.0002.

        lr_disc (float):
            Initial learning rate for the discriminator. Defaults to 0.0002.

        lr_scheduler_gen (str):
            Name of the learning rate scheduler for the generator. One of the `torch.optim.lr_scheduler.*`. Defaults to
            `ExponentialLR`.

        lr_scheduler_gen_params (dict):
            Parameters for the learning rate scheduler of the generator. Defaults to `{'gamma': 0.999875, "last_epoch":-1}`.

        lr_scheduler_disc (str):
            Name of the learning rate scheduler for the discriminator. One of the `torch.optim.lr_scheduler.*`. Defaults to
            `ExponentialLR`.

        lr_scheduler_disc_params (dict):
            Parameters for the learning rate scheduler of the discriminator. Defaults to `{'gamma': 0.999875, "last_epoch":-1}`.

        scheduler_after_epoch (bool):
            If true, step the schedulers after each epoch else after each step. Defaults to `False`.

        optimizer (str):
            Name of the optimizer to use with both the generator and the discriminator networks. One of the
            `torch.optim.*`. Defaults to `AdamW`.

        kl_loss_alpha (float):
            Loss weight for KL loss. Defaults to 1.0.

        disc_loss_alpha (float):
            Loss weight for the discriminator loss. Defaults to 1.0.

        gen_loss_alpha (float):
            Loss weight for the generator loss. Defaults to 1.0.

        feat_loss_alpha (float):
            Loss weight for the feature matching loss. Defaults to 1.0.

        mel_loss_alpha (float):
            Loss weight for the mel loss. Defaults to 45.0.

        return_wav (bool):
            If true, data loader returns the waveform as well as the other outputs. Do not change. Defaults to `True`.

        compute_linear_spec (bool):
            If true, the linear spectrogram is computed and returned alongside the mel output. Do not change. Defaults to `True`.

        use_weighted_sampler (bool):
            If true, use weighted sampler with bucketing for balancing samples between datasets used in training. Defaults to `False`.

        weighted_sampler_attrs (dict):
            Key retuned by the formatter to be used for weighted sampler. For example `{"root_path": 2.0, "speaker_name": 1.0}` sets sample probabilities
            by overweighting `root_path` by 2.0. Defaults to `{}`.

        weighted_sampler_multipliers (dict):
            Weight each unique value of a key returned by the formatter for weighted sampling.
            For example `{"root_path":{"/raid/datasets/libritts-clean-16khz-bwe-coqui_44khz/LibriTTS/train-clean-100/":1.0, "/raid/datasets/libritts-clean-16khz-bwe-coqui_44khz/LibriTTS/train-clean-360/": 0.5}`.
            It will sample instances from `train-clean-100` 2 times more than `train-clean-360`. Defaults to `{}`.

        r (int):
            Number of spectrogram frames to be generated at a time. Do not change. Defaults to `1`.

        add_blank (bool):
            If true, a blank token is added in between every character. Defaults to `True`.

        test_sentences (List[List]):
            List of sentences with speaker and language information to be used for testing.

        language_ids_file (str):
            Path to the language ids file.

        use_language_embedding (bool):
            If true, language embedding is used. Defaults to `False`.

    Note:
        Check :class:`TTS.tts.configs.shared_configs.BaseTTSConfig` for the inherited parameters.

    Example:

        >>> from TTS.tts.configs.freevc_config import FreeVCConfig
        >>> config = FreeVCConfig()
    """

    model: str = "freevc"
    # model specific params
    model_args: FreeVCArgs = FreeVCArgs()
    audio: FreeVCAudioConfig = FreeVCAudioConfig()

    # optimizer
    # TODO with training support

    # loss params
    # TODO with training support

    # data loader params
    return_wav: bool = True
    compute_linear_spec: bool = True

    # sampler params
    use_weighted_sampler: bool = False  # TODO: move it to the base config
    weighted_sampler_attrs: dict = field(default_factory=lambda: {})
    weighted_sampler_multipliers: dict = field(default_factory=lambda: {})

    # overrides
    r: int = 1  # DO NOT CHANGE
    add_blank: bool = True

    # multi-speaker settings
    # use speaker embedding layer
    num_speakers: int = 0
    speakers_file: str = None
    speaker_embedding_channels: int = 256

    # use d-vectors
    use_d_vector_file: bool = False
    d_vector_file: List[str] = None
    d_vector_dim: int = None

    def __post_init__(self):
        for key, val in self.model_args.items():
            if hasattr(self, key):
                self[key] = val


================================================
FILE: TTS/vc/modules/__init__.py
================================================


================================================
FILE: TTS/vc/modules/freevc/__init__.py
================================================


================================================
FILE: TTS/vc/modules/freevc/commons.py
================================================
import math

import numpy as np
import torch
from torch import nn
from torch.nn import functional as F


def init_weights(m, mean=0.0, std=0.01):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        m.weight.data.normal_(mean, std)


def get_padding(kernel_size, dilation=1):
    return int((kernel_size * dilation - dilation) / 2)


def convert_pad_shape(pad_shape):
    l = pad_shape[::-1]
    pad_shape = [item for sublist in l for item in sublist]
    return pad_shape


def intersperse(lst, item):
    result = [item] * (len(lst) * 2 + 1)
    result[1::2] = lst
    return result


def kl_divergence(m_p, logs_p, m_q, logs_q):
    """KL(P||Q)"""
    kl = (logs_q - logs_p) - 0.5
    kl += 0.5 * (torch.exp(2.0 * logs_p) + ((m_p - m_q) ** 2)) * torch.exp(-2.0 * logs_q)
    return kl


def rand_gumbel(shape):
    """Sample from the Gumbel distribution, protect from overflows."""
    uniform_samples = torch.rand(shape) * 0.99998 + 0.00001
    return -torch.log(-torch.log(uniform_samples))


def rand_gumbel_like(x):
    g = rand_gumbel(x.size()).to(dtype=x.dtype, device=x.device)
    return g


def slice_segments(x, ids_str, segment_size=4):
    ret = torch.zeros_like(x[:, :, :segment_size])
    for i in range(x.size(0)):
        idx_str = ids_str[i]
        idx_end = idx_str + segment_size
        ret[i] = x[i, :, idx_str:idx_end]
    return ret


def rand_slice_segments(x, x_lengths=None, segment_size=4):
    b, d, t = x.size()
    if x_lengths is None:
        x_lengths = t
    ids_str_max = x_lengths - segment_size + 1
    ids_str = (torch.rand([b]).to(device=x.device) * ids_str_max).to(dtype=torch.long)
    ret = slice_segments(x, ids_str, segment_size)
    return ret, ids_str


def rand_spec_segments(x, x_lengths=None, segment_size=4):
    b, d, t = x.size()
    if x_lengths is None:
        x_lengths = t
    ids_str_max = x_lengths - segment_size
    ids_str = (torch.rand([b]).to(device=x.device) * ids_str_max).to(dtype=torch.long)
    ret = slice_segments(x, ids_str, segment_size)
    return ret, ids_str


def get_timing_signal_1d(length, channels, min_timescale=1.0, max_timescale=1.0e4):
    position = torch.arange(length, dtype=torch.float)
    num_timescales = channels // 2
    log_timescale_increment = math.log(float(max_timescale) / float(min_timescale)) / (num_timescales - 1)
    inv_timescales = min_timescale * torch.exp(
        torch.arange(num_timescales, dtype=torch.float) * -log_timescale_increment
    )
    scaled_time = position.unsqueeze(0) * inv_timescales.unsqueeze(1)
    signal = torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], 0)
    signal = F.pad(signal, [0, 0, 0, channels % 2])
    signal = signal.view(1, channels, length)
    return signal


def add_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4):
    b, channels, length = x.size()
    signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
    return x + signal.to(dtype=x.dtype, device=x.device)


def cat_timing_signal_1d(x, min_timescale=1.0, max_timescale=1.0e4, axis=1):
    b, channels, length = x.size()
    signal = get_timing_signal_1d(length, channels, min_timescale, max_timescale)
    return torch.cat([x, signal.to(dtype=x.dtype, device=x.device)], axis)


def subsequent_mask(length):
    mask = torch.tril(torch.ones(length, length)).unsqueeze(0).unsqueeze(0)
    return mask


@torch.jit.script
def fused_add_tanh_sigmoid_multiply(input_a, input_b, n_channels):
    n_channels_int = n_channels[0]
    in_act = input_a + input_b
    t_act = torch.tanh(in_act[:, :n_channels_int, :])
    s_act = torch.sigmoid(in_act[:, n_channels_int:, :])
    acts = t_act * s_act
    return acts


def convert_pad_shape(pad_shape):
    l = pad_shape[::-1]
    pad_shape = [item for sublist in l for item in sublist]
    return pad_shape


def shift_1d(x):
    x = F.pad(x, convert_pad_shape([[0, 0], [0, 0], [1, 0]]))[:, :, :-1]
    return x


def sequence_mask(length, max_length=None):
    if max_length is None:
        max_length = length.max()
    x = torch.arange(max_length, dtype=length.dtype, device=length.device)
    return x.unsqueeze(0) < length.unsqueeze(1)


def generate_path(duration, mask):
    """
    duration: [b, 1, t_x]
    mask: [b, 1, t_y, t_x]
    """
    device = duration.device

    b, _, t_y, t_x = mask.shape
    cum_duration = torch.cumsum(duration, -1)

    cum_duration_flat = cum_duration.view(b * t_x)
    path = sequence_mask(cum_duration_flat, t_y).to(mask.dtype)
    path = path.view(b, t_x, t_y)
    path = path - F.pad(path, convert_pad_shape([[0, 0], [1, 0], [0, 0]]))[:, :-1]
    path = path.unsqueeze(1).transpose(2, 3) * mask
    return path


def clip_grad_value_(parameters, clip_value, norm_type=2):
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    parameters = list(filter(lambda p: p.grad is not None, parameters))
    norm_type = float(norm_type)
    if clip_value is not None:
        clip_value = float(clip_value)

    total_norm = 0
    for p in parameters:
        param_norm = p.grad.data.norm(norm_type)
        total_norm += param_norm.item() ** norm_type
        if clip_value is not None:
            p.grad.data.clamp_(min=-clip_value, max=clip_value)
    total_norm = total_norm ** (1.0 / norm_type)
    return total_norm


================================================
FILE: TTS/vc/modules/freevc/mel_processing.py
================================================
import torch
import torch.utils.data
from librosa.filters import mel as librosa_mel_fn

MAX_WAV_VALUE = 32768.0


def dynamic_range_compression_torch(x, C=1, clip_val=1e-5):
    """
    PARAMS
    ------
    C: compression factor
    """
    return torch.log(torch.clamp(x, min=clip_val) * C)


def dynamic_range_decompression_torch(x, C=1):
    """
    PARAMS
    ------
    C: compression factor used to compress
    """
    return torch.exp(x) / C


def spectral_normalize_torch(magnitudes):
    output = dynamic_range_compression_torch(magnitudes)
    return output


def spectral_de_normalize_torch(magnitudes):
    output = dynamic_range_decompression_torch(magnitudes)
    return output


mel_basis = {}
hann_window = {}


def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    if torch.min(y) < -1.0:
        print("min value is ", torch.min(y))
    if torch.max(y) > 1.0:
        print("max value is ", torch.max(y))

    global hann_window
    dtype_device = str(y.dtype) + "_" + str(y.device)
    wnsize_dtype_device = str(win_size) + "_" + dtype_device
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)

    y = torch.nn.functional.pad(
        y.unsqueeze(1), (int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)), mode="reflect"
    )
    y = y.squeeze(1)

    spec = torch.stft(
        y,
        n_fft,
        hop_length=hop_size,
        win_length=win_size,
        window=hann_window[wnsize_dtype_device],
        center=center,
        pad_mode="reflect",
        normalized=False,
        onesided=True,
        return_complex=False,
    )

    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
    return spec


def spec_to_mel_torch(spec, n_fft, num_mels, sampling_rate, fmin, fmax):
    global mel_basis
    dtype_device = str(spec.dtype) + "_" + str(spec.device)
    fmax_dtype_device = str(fmax) + "_" + dtype_device
    if fmax_dtype_device not in mel_basis:
        mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
        mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(dtype=spec.dtype, device=spec.device)
    spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
    spec = spectral_normalize_torch(spec)
    return spec


def mel_spectrogram_torch(y, n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax, center=False):
    if torch.min(y) < -1.0:
        print("min value is ", torch.min(y))
    if torch.max(y) > 1.0:
        print("max value is ", torch.max(y))

    global mel_basis, hann_window
    dtype_device = str(y.dtype) + "_" + str(y.device)
    fmax_dtype_device = str(fmax) + "_" + dtype_device
    wnsize_dtype_device = str(win_size) + "_" + dtype_device
    if fmax_dtype_device not in mel_basis:
        mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
        mel_basis[fmax_dtype_device] = torch.from_numpy(mel).to(dtype=y.dtype, device=y.device)
    if wnsize_dtype_device not in hann_window:
        hann_window[wnsize_dtype_device] = torch.hann_window(win_size).to(dtype=y.dtype, device=y.device)

    y = torch.nn.functional.pad(
        y.unsqueeze(1), (int((n_fft - hop_size) / 2), int((n_fft - hop_size) / 2)), mode="reflect"
    )
    y = y.squeeze(1)

    spec = torch.stft(
        y,
        n_fft,
        hop_length=hop_size,
        win_length=win_size,
        window=hann_window[wnsize_dtype_device],
        center=center,
        pad_mode="reflect",
        normalized=False,
        onesided=True,
        return_complex=False,
    )

    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)

    spec = torch.matmul(mel_basis[fmax_dtype_device], spec)
    spec = spectral_normalize_torch(spec)

    return spec


================================================
FILE: TTS/vc/modules/freevc/modules.py
================================================
import copy
import math

import numpy as np
import scipy
import torch
from torch import nn
from torch.nn import AvgPool1d, Conv1d, Conv2d, ConvTranspose1d
from torch.nn import functional as F
from torch.nn.utils import remove_weight_norm, weight_norm

import TTS.vc.modules.freevc.commons as commons
from TTS.vc.modules.freevc.commons import get_padding, init_weights

LRELU_SLOPE = 0.1


class LayerNorm(nn.Module):
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.channels = channels
        self.eps = eps

        self.gamma = nn.Parameter(torch.ones(channels))
        self.beta = nn.Parameter(torch.zeros(channels))

    def forward(self, x):
        x = x.transpose(1, -1)
        x = F.layer_norm(x, (self.channels,), self.gamma, self.beta, self.eps)
        return x.transpose(1, -1)


class ConvReluNorm(nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, kernel_size, n_layers, p_dropout):
        super().__init__()
        self.in_channels = in_channels
        self.hidden_channels = hidden_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.n_layers = n_layers
        self.p_dropout = p_dropout
        assert n_layers > 1, "Number of layers should be larger than 0."

        self.conv_layers = nn.ModuleList()
        self.norm_layers = nn.ModuleList()
        self.conv_layers.append(nn.Conv1d(in_channels, hidden_channels, kernel_size, padding=kernel_size // 2))
        self.norm_layers.append(LayerNorm(hidden_channels))
        self.relu_drop = nn.Sequential(nn.ReLU(), nn.Dropout(p_dropout))
        for _ in range(n_layers - 1):
            self.conv_layers.append(nn.Conv1d(hidden_channels, hidden_channels, kernel_size, padding=kernel_size // 2))
            self.norm_layers.append(LayerNorm(hidden_channels))
        self.proj = nn.Conv1d(hidden_channels, out_channels, 1)
        self.proj.weight.data.zero_()
        self.proj.bias.data.zero_()

    def forward(self, x, x_mask):
        x_org = x
        for i in range(self.n_layers):
            x = self.conv_layers[i](x * x_mask)
            x = self.norm_layers[i](x)
            x = self.relu_drop(x)
        x = x_org + self.proj(x)
        return x * x_mask


class DDSConv(nn.Module):
    """
    Dialted and Depth-Separable Convolution
    """

    def __init__(self, channels, kernel_size, n_layers, p_dropout=0.0):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        self.n_layers = n_layers
        self.p_dropout = p_dropout

        self.drop = nn.Dropout(p_dropout)
        self.convs_sep = nn.ModuleList()
        self.convs_1x1 = nn.ModuleList()
        self.norms_1 = nn.ModuleList()
        self.norms_2 = nn.ModuleList()
        for i in range(n_layers):
            dilation = kernel_size**i
            padding = (kernel_size * dilation - dilation) // 2
            self.convs_sep.append(
                nn.Conv1d(channels, channels, kernel_size, groups=channels, dilation=dilation, padding=padding)
            )
            self.convs_1x1.append(nn.Conv1d(channels, channels, 1))
            self.norms_1.append(LayerNorm(channels))
            self.norms_2.append(LayerNorm(channels))

    def forward(self, x, x_mask, g=None):
        if g is not None:
            x = x + g
        for i in range(self.n_layers):
            y = self.convs_sep[i](x * x_mask)
            y = self.norms_1[i](y)
            y = F.gelu(y)
            y = self.convs_1x1[i](y)
            y = self.norms_2[i](y)
            y = F.gelu(y)
            y = self.drop(y)
            x = x + y
        return x * x_mask


class WN(torch.nn.Module):
    def __init__(self, hidden_channels, kernel_size, dilation_rate, n_layers, gin_channels=0, p_dropout=0):
        super(WN, self).__init__()
        assert kernel_size % 2 == 1
        self.hidden_channels = hidden_channels
        self.kernel_size = (kernel_size,)
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.gin_channels = gin_channels
        self.p_dropout = p_dropout

        self.in_layers = torch.nn.ModuleList()
        self.res_skip_layers = torch.nn.ModuleList()
        self.drop = nn.Dropout(p_dropout)

        if gin_channels != 0:
            cond_layer = torch.nn.Conv1d(gin_channels, 2 * hidden_channels * n_layers, 1)
            self.cond_layer = torch.nn.utils.weight_norm(cond_layer, name="weight")

        for i in range(n_layers):
            dilation = dilation_rate**i
            padding = int((kernel_size * dilation - dilation) / 2)
            in_layer = torch.nn.Conv1d(
                hidden_channels, 2 * hidden_channels, kernel_size, dilation=dilation, padding=padding
            )
            in_layer = torch.nn.utils.weight_norm(in_layer, name="weight")
            self.in_layers.append(in_layer)

            # last one is not necessary
            if i < n_layers - 1:
                res_skip_channels = 2 * hidden_channels
            else:
                res_skip_channels = hidden_channels

            res_skip_layer = torch.nn.Conv1d(hidden_channels, res_skip_channels, 1)
            res_skip_layer = torch.nn.utils.weight_norm(res_skip_layer, name="weight")
            self.res_skip_layers.append(res_skip_layer)

    def forward(self, x, x_mask, g=None, **kwargs):
        output = torch.zeros_like(x)
        n_channels_tensor = torch.IntTensor([self.hidden_channels])

        if g is not None:
            g = self.cond_layer(g)

        for i in range(self.n_layers):
            x_in = self.in_layers[i](x)
            if g is not None:
                cond_offset = i * 2 * self.hidden_channels
                g_l = g[:, cond_offset : cond_offset + 2 * self.hidden_channels, :]
            else:
                g_l = torch.zeros_like(x_in)

            acts = commons.fused_add_tanh_sigmoid_multiply(x_in, g_l, n_channels_tensor)
            acts = self.drop(acts)

            res_skip_acts = self.res_skip_layers[i](acts)
            if i < self.n_layers - 1:
                res_acts = res_skip_acts[:, : self.hidden_channels, :]
                x = (x + res_acts) * x_mask
                output = output + res_skip_acts[:, self.hidden_channels :, :]
            else:
                output = output + res_skip_acts
        return output * x_mask

    def remove_weight_norm(self):
        if self.gin_channels != 0:
            torch.nn.utils.remove_weight_norm(self.cond_layer)
        for l in self.in_layers:
            torch.nn.utils.remove_weight_norm(l)
        for l in self.res_skip_layers:
            torch.nn.utils.remove_weight_norm(l)


class ResBlock1(torch.nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
        super(ResBlock1, self).__init__()
        self.convs1 = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[0],
                        padding=get_padding(kernel_size, dilation[0]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[1],
                        padding=get_padding(kernel_size, dilation[1]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[2],
                        padding=get_padding(kernel_size, dilation[2]),
                    )
                ),
            ]
        )
        self.convs1.apply(init_weights)

        self.convs2 = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))
                ),
                weight_norm(
                    Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))
                ),
                weight_norm(
                    Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))
                ),
            ]
        )
        self.convs2.apply(init_weights)

    def forward(self, x, x_mask=None):
        for c1, c2 in zip(self.convs1, self.convs2):
            xt = F.leaky_relu(x, LRELU_SLOPE)
            if x_mask is not None:
                xt = xt * x_mask
            xt = c1(xt)
            xt = F.leaky_relu(xt, LRELU_SLOPE)
            if x_mask is not None:
                xt = xt * x_mask
            xt = c2(xt)
            x = xt + x
        if x_mask is not None:
            x = x * x_mask
        return x

    def remove_weight_norm(self):
        for l in self.convs1:
            remove_weight_norm(l)
        for l in self.convs2:
            remove_weight_norm(l)


class ResBlock2(torch.nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=(1, 3)):
        super(ResBlock2, self).__init__()
        self.convs = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[0],
                        padding=get_padding(kernel_size, dilation[0]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[1],
                        padding=get_padding(kernel_size, dilation[1]),
                    )
                ),
            ]
        )
        self.convs.apply(init_weights)

    def forward(self, x, x_mask=None):
        for c in self.convs:
            xt = F.leaky_relu(x, LRELU_SLOPE)
            if x_mask is not None:
                xt = xt * x_mask
            xt = c(xt)
            x = xt + x
        if x_mask is not None:
            x = x * x_mask
        return x

    def remove_weight_norm(self):
        for l in self.convs:
            remove_weight_norm(l)


class Log(nn.Module):
    def forward(self, x, x_mask, reverse=False, **kwargs):
        if not reverse:
            y = torch.log(torch.clamp_min(x, 1e-5)) * x_mask
            logdet = torch.sum(-y, [1, 2])
            return y, logdet
        else:
            x = torch.exp(x) * x_mask
            return x


class Flip(nn.Module):
    def forward(self, x, *args, reverse=False, **kwargs):
        x = torch.flip(x, [1])
        if not reverse:
            logdet = torch.zeros(x.size(0)).to(dtype=x.dtype, device=x.device)
            return x, logdet
        else:
            return x


class ElementwiseAffine(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channels = channels
        self.m = nn.Parameter(torch.zeros(channels, 1))
        self.logs = nn.Parameter(torch.zeros(channels, 1))

    def forward(self, x, x_mask, reverse=False, **kwargs):
        if not reverse:
            y = self.m + torch.exp(self.logs) * x
            y = y * x_mask
            logdet = torch.sum(self.logs * x_mask, [1, 2])
            return y, logdet
        else:
            x = (x - self.m) * torch.exp(-self.logs) * x_mask
            return x


class ResidualCouplingLayer(nn.Module):
    def __init__(
        self,
        channels,
        hidden_channels,
        kernel_size,
        dilation_rate,
        n_layers,
        p_dropout=0,
        gin_channels=0,
        mean_only=False,
    ):
        assert channels % 2 == 0, "channels should be divisible by 2"
        super().__init__()
        self.channels = channels
        self.hidden_channels = hidden_channels
        self.kernel_size = kernel_size
        self.dilation_rate = dilation_rate
        self.n_layers = n_layers
        self.half_channels = channels // 2
        self.mean_only = mean_only

        self.pre = nn.Conv1d(self.half_channels, hidden_channels, 1)
        self.enc = WN(
            hidden_channels, kernel_size, dilation_rate, n_layers, p_dropout=p_dropout, gin_channels=gin_channels
        )
        self.post = nn.Conv1d(hidden_channels, self.half_channels * (2 - mean_only), 1)
        self.post.weight.data.zero_()
        self.post.bias.data.zero_()

    def forward(self, x, x_mask, g=None, reverse=False):
        x0, x1 = torch.split(x, [self.half_channels] * 2, 1)
        h = self.pre(x0) * x_mask
        h = self.enc(h, x_mask, g=g)
        stats = self.post(h) * x_mask
        if not self.mean_only:
            m, logs = torch.split(stats, [self.half_channels] * 2, 1)
        else:
            m = stats
            logs = torch.zeros_like(m)

        if not reverse:
            x1 = m + x1 * torch.exp(logs) * x_mask
            x = torch.cat([x0, x1], 1)
            logdet = torch.sum(logs, [1, 2])
            return x, logdet
        else:
            x1 = (x1 - m) * torch.exp(-logs) * x_mask
            x = torch.cat([x0, x1], 1)
            return x


================================================
FILE: TTS/vc/modules/freevc/speaker_encoder/__init__.py
================================================


================================================
FILE: TTS/vc/modules/freevc/speaker_encoder/audio.py
================================================
import struct
from pathlib import Path
from typing import Optional, Union

# import webrtcvad
import librosa
import numpy as np
from scipy.ndimage.morphology import binary_dilation

from TTS.vc.modules.freevc.speaker_encoder.hparams import *

int16_max = (2**15) - 1


def preprocess_wav(fpath_or_wav: Union[str, Path, np.ndarray], source_sr: Optional[int] = None):
    """
    Applies the preprocessing operations used in training the Speaker Encoder to a waveform
    either on disk or in memory. The waveform will be resampled to match the data hyperparameters.

    :param fpath_or_wav: either a filepath to an audio file (many extensions are supported, not
    just .wav), either the waveform as a numpy array of floats.
    :param source_sr: if passing an audio waveform, the sampling rate of the waveform before
    preprocessing. After preprocessing, the waveform's sampling rate will match the data
    hyperparameters. If passing a filepath, the sampling rate will be automatically detected and
    this argument will be ignored.
    """
    # Load the wav from disk if needed
    if isinstance(fpath_or_wav, str) or isinstance(fpath_or_wav, Path):
        wav, source_sr = librosa.load(fpath_or_wav, sr=None)
    else:
        wav = fpath_or_wav

    # Resample the wav if needed
    if source_sr is not None and source_sr != sampling_rate:
        wav = librosa.resample(wav, source_sr, sampling_rate)

    # Apply the preprocessing: normalize volume and shorten long silences
    wav = normalize_volume(wav, audio_norm_target_dBFS, increase_only=True)
    wav = trim_long_silences(wav)

    return wav


def wav_to_mel_spectrogram(wav):
    """
    Derives a mel spectrogram ready to be used by the encoder from a preprocessed audio waveform.
    Note: this not a log-mel spectrogram.
    """
    frames = librosa.feature.melspectrogram(
        y=wav,
        sr=sampling_rate,
        n_fft=int(sampling_rate * mel_window_length / 1000),
        hop_length=int(sampling_rate * mel_window_step / 1000),
        n_mels=mel_n_channels,
    )
    return frames.astype(np.float32).T


def normalize_volume(wav, target_dBFS, increase_only=False, decrease_only=False):
    if increase_only and decrease_only:
        raise ValueError("Both increase only and decrease only are set")
    dBFS_change = target_dBFS - 10 * np.log10(np.mean(wav**2))
    if (dBFS_change < 0 and increase_only) or (dBFS_change > 0 and decrease_only):
        return wav
    return wav * (10 ** (dBFS_change / 20))


================================================
FILE: TTS/vc/modules/freevc/speaker_encoder/hparams.py
================================================
## Mel-filterbank
mel_window_length = 25  # In milliseconds
mel_window_step = 10  # In milliseconds
mel_n_channels = 40


## Audio
sampling_rate = 16000
# Number of spectrogram frames in a partial utterance
partials_n_frames = 160  # 1600 ms


## Voice Activation Detection
# Window size of the VAD. Must be either 10, 20 or 30 milliseconds.
# This sets the granularity of the VAD. Should not need to be changed.
vad_window_length = 30  # In milliseconds
# Number of frames to average together when performing the moving average smoothing.
# The larger this value, the larger the VAD variations must be to not get smoothed out.
vad_moving_average_width = 8
# Maximum number of consecutive silent frames a segment can have.
vad_max_silence_length = 6


## Audio volume normalization
audio_norm_target_dBFS = -30


## Model parameters
model_hidden_size = 256
model_embedding_size = 256
model_num_layers = 3


================================================
FILE: TTS/vc/modules/freevc/speaker_encoder/speaker_encoder.py
================================================
from pathlib import Path
from time import perf_counter as timer
from typing import List, Union

import numpy as np
import torch
from torch import nn

from TTS.utils.io import load_fsspec
from TTS.vc.modules.freevc.speaker_encoder import audio
from TTS.vc.modules.freevc.speaker_encoder.hparams import *


class SpeakerEncoder(nn.Module):
    def __init__(self, weights_fpath, device: Union[str, torch.device] = None, verbose=True):
        """
        :param device: either a torch device or the name of a torch device (e.g. "cpu", "cuda").
        If None, defaults to cuda if it is available on your machine, otherwise the model will
        run on cpu. Outputs are always returned on the cpu, as numpy arrays.
        """
        super().__init__()

        # Define the network
        self.lstm = nn.LSTM(mel_n_channels, model_hidden_size, model_num_layers, batch_first=True)
        self.linear = nn.Linear(model_hidden_size, model_embedding_size)
        self.relu = nn.ReLU()

        # Get the target device
        if device is None:
            device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        elif isinstance(device, str):
            device = torch.device(device)
        self.device = device

        # Load the pretrained model'speaker weights
        # weights_fpath = Path(__file__).resolve().parent.joinpath("pretrained.pt")
        # if not weights_fpath.exists():
        #     raise Exception("Couldn't find the voice encoder pretrained model at %s." %
        #                     weights_fpath)

        start = timer()
        checkpoint = load_fsspec(weights_fpath, map_location="cpu")

        self.load_state_dict(checkpoint["model_state"], strict=False)
        self.to(device)

        if verbose:
            print("Loaded the voice encoder model on %s in %.2f seconds." % (device.type, timer() - start))

    def forward(self, mels: torch.FloatTensor):
        """
        Computes the embeddings of a batch of utterance spectrograms.
        :param mels: a batch of mel spectrograms of same duration as a float32 tensor of shape
        (batch_size, n_frames, n_channels)
        :return: the embeddings as a float 32 tensor of shape (batch_size, embedding_size).
        Embeddings are positive and L2-normed, thus they lay in the range [0, 1].
        """
        # Pass the input through the LSTM layers and retrieve the final hidden state of the last
        # layer. Apply a cutoff to 0 for negative values and L2 normalize the embeddings.
        _, (hidden, _) = self.lstm(mels)
        embeds_raw = self.relu(self.linear(hidden[-1]))
        return embeds_raw / torch.norm(embeds_raw, dim=1, keepdim=True)

    @staticmethod
    def compute_partial_slices(n_samples: int, rate, min_coverage):
        """
        Computes where to split an utterance waveform and its corresponding mel spectrogram to
        obtain partial utterances of <partials_n_frames> each. Both the waveform and the
        mel spectrogram slices are returned, so as to make each partial utterance waveform
        correspond to its spectrogram.

        The returned ranges may be indexing further than the length of the waveform. It is
        recommended that you pad the waveform with zeros up to wav_slices[-1].stop.

        :param n_samples: the number of samples in the waveform
        :param rate: how many partial utterances should occur per second. Partial utterances must
        cover the span of the entire utterance, thus the rate should not be lower than the inverse
        of the duration of a partial utterance. By default, partial utterances are 1.6s long and
        the minimum rate is thus 0.625.
        :param min_coverage: when reaching the last partial utterance, it may or may not have
        enough frames. If at least <min_pad_coverage> of <partials_n_frames> are present,
        then the last partial utterance will be considered by zero-padding the audio. Otherwise,
        it will be discarded. If there aren't enough frames for one partial utterance,
        this parameter is ignored so that the function always returns at least one slice.
        :return: the waveform slices and mel spectrogram slices as lists of array slices. Index
        respectively the waveform and the mel spectrogram with these slices to obtain the partial
        utterances.
        """
        assert 0 < min_coverage <= 1

        # Compute how many frames separate two partial utterances
        samples_per_frame = int((sampling_rate * mel_window_step / 1000))
        n_frames = int(np.ceil((n_samples + 1) / samples_per_frame))
        frame_step = int(np.round((sampling_rate / rate) / samples_per_frame))
        assert 0 < frame_step, "The rate is too high"
        assert frame_step <= partials_n_frames, "The rate is too low, it should be %f at least" % (
            sampling_rate / (samples_per_frame * partials_n_frames)
        )

        # Compute the slices
        wav_slices, mel_slices = [], []
        steps = max(1, n_frames - partials_n_frames + frame_step + 1)
        for i in range(0, steps, frame_step):
            mel_range = np.array([i, i + partials_n_frames])
            wav_range = mel_range * samples_per_frame
            mel_slices.append(slice(*mel_range))
            wav_slices.append(slice(*wav_range))

        # Evaluate whether extra padding is warranted or not
        last_wav_range = wav_slices[-1]
        coverage = (n_samples - last_wav_range.start) / (last_wav_range.stop - last_wav_range.start)
        if coverage < min_coverage and len(mel_slices) > 1:
            mel_slices = mel_slices[:-1]
            wav_slices = wav_slices[:-1]

        return wav_slices, mel_slices

    def embed_utterance(self, wav: np.ndarray, return_partials=False, rate=1.3, min_coverage=0.75):
        """
        Computes an embedding for a single utterance. The utterance is divided in partial
        utterances and an embedding is computed for each. The complete utterance embedding is the
        L2-normed average embedding of the partial utterances.

        TODO: independent batched version of this function

        :param wav: a preprocessed utterance waveform as a numpy array of float32
        :param return_partials: if True, the partial embeddings will also be returned along with
        the wav slices corresponding to each partial utterance.
        :param rate: how many partial utterances should occur per second. Partial utterances must
        cover the span of the entire utterance, thus the rate should not be lower than the inverse
        of the duration of a partial utterance. By default, partial utterances are 1.6s long and
        the minimum rate is thus 0.625.
        :param min_coverage: when reaching the last partial utterance, it may or may not have
        enough frames. If at least <min_pad_coverage> of <partials_n_frames> are present,
        then the last partial utterance will be considered by zero-padding the audio. Otherwise,
        it will be discarded. If there aren't enough frames for one partial utterance,
        this parameter is ignored so that the function always returns at least one slice.
        :return: the embedding as a numpy array of float32 of shape (model_embedding_size,). If
        <return_partials> is True, the partial utterances as a numpy array of float32 of shape
        (n_partials, model_embedding_size) and the wav partials as a list of slices will also be
        returned.
        """
        # Compute where to split the utterance into partials and pad the waveform with zeros if
        # the partial utterances cover a larger range.
        wav_slices, mel_slices = self.compute_partial_slices(len(wav), rate, min_coverage)
        max_wave_length = wav_slices[-1].stop
        if max_wave_length >= len(wav):
            wav = np.pad(wav, (0, max_wave_length - len(wav)), "constant")

        # Split the utterance into partials and forward them through the model
        mel = audio.wav_to_mel_spectrogram(wav)
        mels = np.array([mel[s] for s in mel_slices])
        with torch.no_grad():
            mels = torch.from_numpy(mels).to(self.device)
            partial_embeds = self(mels).cpu().numpy()

        # Compute the utterance embedding from the partial embeddings
        raw_embed = np.mean(partial_embeds, axis=0)
        embed = raw_embed / np.linalg.norm(raw_embed, 2)

        if return_partials:
            return embed, partial_embeds, wav_slices
        return embed

    def embed_speaker(self, wavs: List[np.ndarray], **kwargs):
        """
        Compute the embedding of a collection of wavs (presumably from the same speaker) by
        averaging their embedding and L2-normalizing it.

        :param wavs: list of wavs a numpy arrays of float32.
        :param kwargs: extra arguments to embed_utterance()
        :return: the embedding as a numpy array of float32 of shape (model_embedding_size,).
        """
        raw_embed = np.mean([self.embed_utterance(wav, return_partials=False, **kwargs) for wav in wavs], axis=0)
        return raw_embed / np.linalg.norm(raw_embed, 2)


================================================
FILE: TTS/vc/modules/freevc/wavlm/__init__.py
================================================
import os
import urllib.request

import torch

from TTS.utils.generic_utils import get_user_data_dir
from TTS.vc.modules.freevc.wavlm.wavlm import WavLM, WavLMConfig

model_uri = "https://github.com/coqui-ai/TTS/releases/download/v0.13.0_models/WavLM-Large.pt"


def get_wavlm(device="cpu"):
    """Download the model and return the model object."""

    output_path = get_user_data_dir("tts")

    output_path = os.path.join(output_path, "wavlm")
    if not os.path.exists(output_path):
        os.makedirs(output_path)

    output_path = os.path.join(output_path, "WavLM-Large.pt")
    if not os.path.exists(output_path):
        print(f" > Downloading WavLM model to {output_path} ...")
        urllib.request.urlretrieve(model_uri, output_path)

    checkpoint = torch.load(output_path, map_location=torch.device(device))
    cfg = WavLMConfig(checkpoint["cfg"])
    wavlm = WavLM(cfg).to(device)
    wavlm.load_state_dict(checkpoint["model"])
    wavlm.eval()
    return wavlm


if __name__ == "__main__":
    wavlm = get_wavlm()


================================================
FILE: TTS/vc/modules/freevc/wavlm/config.json
================================================
{
    "_name_or_path": "./wavlm-large/",
    "activation_dropout": 0.0,
    "adapter_kernel_size": 3,
    "adapter_stride": 2,
    "add_adapter": false,
    "apply_spec_augment": true,
    "architectures": [
      "WavLMModel"
    ],
    "attention_dropout": 0.1,
    "bos_token_id": 1,
    "classifier_proj_size": 256,
    "codevector_dim": 768,
    "contrastive_logits_temperature": 0.1,
    "conv_bias": false,
    "conv_dim": [
      512,
      512,
      512,
      512,
      512,
      512,
      512
    ],
    "conv_kernel": [
      10,
      3,
      3,
      3,
      3,
      2,
      2
    ],
    "conv_stride": [
      5,
      2,
      2,
      2,
      2,
      2,
      2
    ],
    "ctc_loss_reduction": "sum",
    "ctc_zero_infinity": false,
    "diversity_loss_weight": 0.1,
    "do_stable_layer_norm": true,
    "eos_token_id": 2,
    "feat_extract_activation": "gelu",
    "feat_extract_dropout": 0.0,
    "feat_extract_norm": "layer",
    "feat_proj_dropout": 0.1,
    "feat_quantizer_dropout": 0.0,
    "final_dropout": 0.0,
    "gradient_checkpointing": false,
    "hidden_act": "gelu",
    "hidden_dropout": 0.1,
    "hidden_size": 1024,
    "initializer_range": 0.02,
    "intermediate_size": 4096,
    "layer_norm_eps": 1e-05,
    "layerdrop": 0.1,
    "mask_channel_length": 10,
    "mask_channel_min_space": 1,
    "mask_channel_other": 0.0,
    "mask_channel_prob": 0.0,
    "mask_channel_selection": "static",
    "mask_feature_length": 10,
    "mask_feature_min_masks": 0,
    "mask_feature_prob": 0.0,
    "mask_time_length": 10,
    "mask_time_min_masks": 2,
    "mask_time_min_space": 1,
    "mask_time_other": 0.0,
    "mask_time_prob": 0.075,
    "mask_time_selection": "static",
    "max_bucket_distance": 800,
    "model_type": "wavlm",
    "num_adapter_layers": 3,
    "num_attention_heads": 16,
    "num_buckets": 320,
    "num_codevector_groups": 2,
    "num_codevectors_per_group": 320,
    "num_conv_pos_embedding_groups": 16,
    "num_conv_pos_embeddings": 128,
    "num_ctc_classes": 80,
    "num_feat_extract_layers": 7,
    "num_hidden_layers": 24,
    "num_negatives": 100,
    "output_hidden_size": 1024,
    "pad_token_id": 0,
    "proj_codevector_dim": 768,
    "replace_prob": 0.5,
    "tokenizer_class": "Wav2Vec2CTCTokenizer",
    "torch_dtype": "float32",
    "transformers_version": "4.15.0.dev0",
    "use_weighted_layer_sum": false,
    "vocab_size": 32
  }

================================================
FILE: TTS/vc/modules/freevc/wavlm/modules.py
================================================
# --------------------------------------------------------
# WavLM: Large-Scale Self-Supervised  Pre-training  for Full Stack Speech Processing (https://arxiv.org/abs/2110.13900.pdf)
# Github source: https://github.com/microsoft/unilm/tree/master/wavlm
# Copyright (c) 2021 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Based on fairseq code bases
# https://github.com/pytorch/fairseq
# --------------------------------------------------------

import math
import warnings
from typing import Dict, Optional, Tuple

import torch
import torch.nn.functional as F
from torch import Tensor, nn
from torch.nn import Parameter


class TransposeLast(nn.Module):
    def __init__(self, deconstruct_idx=None):
        super().__init__()
        self.deconstruct_idx = deconstruct_idx

    def forward(self, x):
        if self.deconstruct_idx is not None:
            x = x[self.deconstruct_idx]
        return x.transpose(-2, -1)


class Fp32LayerNorm(nn.LayerNorm):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def forward(self, input):
        output = F.layer_norm(
            input.float(),
            self.normalized_shape,
            self.weight.float() if self.weight is not None else None,
            self.bias.float() if self.bias is not None else None,
            self.eps,
        )
        return output.type_as(input)


class Fp32GroupNorm(nn.GroupNorm):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def forward(self, input):
        output = F.group_norm(
            input.float(),
            self.num_groups,
            self.weight.float() if self.weight is not None else None,
            self.bias.float() if self.bias is not None else None,
            self.eps,
        )
        return output.type_as(input)


class GradMultiply(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        res = x.new(x)
        return res

    @staticmethod
    def backward(ctx, grad):
        return grad * ctx.scale, None


class SamePad(nn.Module):
    def __init__(self, kernel_size, causal=False):
        super().__init__()
        if causal:
            self.remove = kernel_size - 1
        else:
            self.remove = 1 if kernel_size % 2 == 0 else 0

    def forward(self, x):
        if self.remove > 0:
            x = x[:, :, : -self.remove]
        return x


class Swish(nn.Module):
    """Swish function"""

    def __init__(self):
        """Construct an MultiHeadedAttention object."""
        super(Swish, self).__init__()
        self.act = torch.nn.Sigmoid()

    def forward(self, x):
        return x * self.act(x)


class GLU_Linear(nn.Module):
    def __init__(self, input_dim, output_dim, glu_type="sigmoid", bias_in_glu=True):
        super(GLU_Linear, self).__init__()

        self.glu_type = glu_type
        self.output_dim = output_dim

        if glu_type == "sigmoid":
            self.glu_act = torch.nn.Sigmoid()
        elif glu_type == "swish":
            self.glu_act = Swish()
        elif glu_type == "relu":
            self.glu_act = torch.nn.ReLU()
        elif glu_type == "gelu":
            self.glu_act = torch.nn.GELU()

        if bias_in_glu:
            self.linear = nn.Linear(input_dim, output_dim * 2, True)
        else:
            self.linear = nn.Linear(input_dim, output_dim * 2, False)

    def forward(self, x):
        # to be consistent with GLU_Linear, we assume the input always has the #channel (#dim) in the last dimension of the tensor, so need to switch the dimension first for 1D-Conv case
        x = self.linear(x)

        if self.glu_type == "bilinear":
            x = x[:, :, 0 : self.output_dim] * x[:, :, self.output_dim : self.output_dim * 2]
        else:
            x = x[:, :, 0 : self.output_dim] * self.glu_act(x[:, :, self.output_dim : self.output_dim * 2])

        return x


def gelu_accurate(x):
    if not hasattr(gelu_accurate, "_a"):
        gelu_accurate._a = math.sqrt(2 / math.pi)
    return 0.5 * x * (1 + torch.tanh(gelu_accurate._a * (x + 0.044715 * torch.pow(x, 3))))


def gelu(x: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.gelu(x.float()).type_as(x)


def get_activation_fn(activation: str):
    """Returns the activation function corresponding to `activation`"""

    if activation == "relu":
        return F.relu
    elif activation == "gelu":
        return gelu
    elif activation == "gelu_fast":
        warnings.warn("--activation-fn=gelu_fast has been renamed to gelu_accurate")
        return gelu_accurate
    elif activation == "gelu_accurate":
        return gelu_accurate
    elif activation == "tanh":
        return torch.tanh
    elif activation == "linear":
        return lambda x: x
    elif activation == "glu":
        return lambda x: x
    else:
        raise RuntimeError("--activation-fn {} not supported".format(activation))


def init_bert_params(module):
    """
    Initialize the weights specific to the BERT Model.
    This overrides the default initializations depending on the specified arguments.
        1. If normal_init_linear_weights is set then weights of linear
           layer will be initialized using the normal distribution and
           bais will be set to the specified value.
        2. If normal_init_embed_weights is set then weights of embedding
           layer will be initialized using the normal distribution.
        3. If normal_init_proj_weights is set then weights of
           in_project_weight for MultiHeadAttention initialized using
           the normal distribution (to be validated).
    """

    def normal_(data):
        # with FSDP, module params will be on CUDA, so we cast them back to CPU
        # so that the RNG is consistent with and without FSDP
        data.copy_(data.cpu().normal_(mean=0.0, std=0.02).to(data.device))

    if isinstance(module, nn.Linear):
        normal_(module.weight.data)
        if module.bias is not None:
            module.bias.data.zero_()
    if isinstance(module, nn.Embedding):
        normal_(module.weight.data)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()
    if isinstance(module, MultiheadAttention):
        normal_(module.q_proj.weight.data)
        normal_(module.k_proj.weight.data)
        normal_(module.v_proj.weight.data)


def quant_noise(module, p, block_size):
    """
    Wraps modules and applies quantization noise to the weights for
    subsequent quantization with Iterative Product Quantization as
    described in "Training with Quantization Noise for Extreme Model Compression"

    Args:
        - module: nn.Module
        - p: amount of Quantization Noise
        - block_size: size of the blocks for subsequent quantization with iPQ

    Remarks:
        - Module weights must have the right sizes wrt the block size
        - Only Linear, Embedding and Conv2d modules are supported for the moment
        - For more detail on how to quantize by blocks with convolutional weights,
          see "And the Bit Goes Down: Revisiting the Quantization of Neural Networks"
        - We implement the simplest form of noise here as stated in the paper
          which consists in randomly dropping blocks
    """

    # if no quantization noise, don't register hook
    if p <= 0:
        return module

    # supported modules
    assert isinstance(module, (nn.Linear, nn.Embedding, nn.Conv2d))

    # test whether module.weight has the right sizes wrt block_size
    is_conv = module.weight.ndim == 4

    # 2D matrix
    if not is_conv:
        assert module.weight.size(1) % block_size == 0, "Input features must be a multiple of block sizes"

    # 4D matrix
    else:
        # 1x1 convolutions
        if module.kernel_size == (1, 1):
            assert module.in_channels % block_size == 0, "Input channels must be a multiple of block sizes"
        # regular convolutions
        else:
            k = module.kernel_size[0] * module.kernel_size[1]
            assert k % block_size == 0, "Kernel size must be a multiple of block size"

    def _forward_pre_hook(mod, input):
        # no noise for evaluation
        if mod.training:
            if not is_conv:
                # gather weight and sizes
                weight = mod.weight
                in_features = weight.size(1)
                out_features = weight.size(0)

                # split weight matrix into blocks and randomly drop selected blocks
                mask = torch.zeros(in_features // block_size * out_features, device=weight.device)
                mask.bernoulli_(p)
                mask = mask.repeat_interleave(block_size, -1).view(-1, in_features)

            else:
                # gather weight and sizes
                weight = mod.weight
                in_channels = mod.in_channels
                out_channels = mod.out_channels

                # split weight matrix into blocks and randomly drop selected blocks
                if mod.kernel_size == (1, 1):
                    mask = torch.zeros(
                        int(in_channels // block_size * out_channels),
                        device=weight.device,
                    )
                    mask.bernoulli_(p)
                    mask = mask.repeat_interleave(block_size, -1).view(-1, in_channels)
                else:
                    mask = torch.zeros(weight.size(0), weight.size(1), device=weight.device)
                    mask.bernoulli_(p)
                    mask = mask.unsqueeze(2).unsqueeze(3).repeat(1, 1, mod.kernel_size[0], mod.kernel_size[1])

            # scale weights and apply mask
            mask = mask.to(torch.bool)  # x.bool() is not currently supported in TorchScript
            s = 1 / (1 - p)
            mod.weight.data = s * weight.masked_fill(mask, 0)

    module.register_forward_pre_hook(_forward_pre_hook)
    return module


class MultiheadAttention(nn.Module):
    """Multi-headed attention.

    See "Attention Is All You Need" for more details.
    """

    def __init__(
        self,
        embed_dim,
        num_heads,
        kdim=None,
        vdim=None,
        dropout=0.0,
        bias=True,
        add_bias_kv=False,
        add_zero_attn=False,
        self_attention=False,
        encoder_decoder_attention=False,
        q_noise=0.0,
        qn_block_size=8,
        has_relative_attention_bias=False,
        num_buckets=32,
        max_distance=128,
        gru_rel_pos=False,
        rescale_init=False,
    ):
        super().__init__()
        self.embed_dim = embed_dim
        self.kdim = kdim if kdim is not None else embed_dim
        self.vdim = vdim if vdim is not None else embed_dim
        self.qkv_same_dim = self.kdim == embed_dim and self.vdim == embed_dim

        self.num_heads = num_heads
        self.dropout_module = nn.Dropout(dropout)

        self.has_relative_attention_bias = has_relative_attention_bias
        self.num_buckets = num_buckets
        self.max_distance = max_distance
        if self.has_relative_attention_bias:
            self.relative_attention_bias = nn.Embedding(num_buckets, num_heads)

        self.head_dim = embed_dim // num_heads
        self.q_head_dim = self.head_dim
        self.k_head_dim = self.head_dim
        assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
        self.scaling = self.head_dim**-0.5

        self.self_attention = self_attention
        self.encoder_decoder_attention = encoder_decoder_attention

        assert not self.self_attention or self.qkv_same_dim, (
            "Self-attention requires query, key and " "value to be of the same size"
        )

        k_bias = True
        if rescale_init:
            k_bias = False

        k_embed_dim = embed_dim
        q_embed_dim = embed_dim

        self.k_proj = quant_noise(nn.Linear(self.kdim, k_embed_dim, bias=k_bias), q_noise, qn_block_size)
        self.v_proj = quant_noise(nn.Linear(self.vdim, embed_dim, bias=bias), q_noise, qn_block_size)
        self.q_proj = quant_noise(nn.Linear(embed_dim, q_embed_dim, bias=bias), q_noise, qn_block_size)

        self.out_proj = quant_noise(nn.Linear(embed_dim, embed_dim, bias=bias), q_noise, qn_block_size)

        if add_bias_kv:
            self.bias_k = Parameter(torch.Tensor(1, 1, embed_dim))
            self.bias_v = Parameter(torch.Tensor(1, 1, embed_dim))
        else:
            self.bias_k = self.bias_v = None

        self.add_zero_attn = add_zero_attn

        self.gru_rel_pos = gru_rel_pos
        if self.gru_rel_pos:
            self.grep_linear = nn.Linear(self.q_head_dim, 8)
            self.grep_a = nn.Parameter(torch.ones(1, num_heads, 1, 1))

        self.reset_parameters()

    def reset_parameters(self):
        if self.qkv_same_dim:
            # Empirically observed the convergence to be much better with
            # the scaled initialization
            nn.init.xavier_uniform_(self.k_proj.weight, gain=1 / math.sqrt(2))
            nn.init.xavier_uniform_(self.v_proj.weight, gain=1 / math.sqrt(2))
            nn.init.xavier_uniform_(self.q_proj.weight, gain=1 / math.sqrt(2))
        else:
            nn.init.xavier_uniform_(self.k_proj.weight)
            nn.init.xavier_uniform_(self.v_proj.weight)
            nn.init.xavier_uniform_(self.q_proj.weight)

        nn.init.xavier_uniform_(self.out_proj.weight)
        if self.out_proj.bias is not None:
            nn.init.constant_(self.out_proj.bias, 0.0)
        if self.bias_k is not None:
            nn.init.xavier_normal_(self.bias_k)
        if self.bias_v is not None:
            nn.init.xavier_normal_(self.bias_v)
        if self.has_relative_attention_bias:
            nn.init.xavier_normal_(self.relative_attention_bias.weight)

    def _relative_positions_bucket(self, relative_positions, bidirectional=True):
        num_buckets = self.num_buckets
        max_distance = self.max_distance
        relative_buckets = 0

        if bidirectional:
            num_buckets = num_buckets // 2
            relative_buckets += (relative_positions > 0).to(torch.long) * num_buckets
            relative_positions = torch.abs(relative_positions)
        else:
            relative_positions = -torch.min(relative_positions, torch.zeros_like(relative_positions))

        max_exact = num_buckets // 2
        is_small = relative_positions < max_exact

        relative_postion_if_large = max_exact + (
            torch.log(relative_positions.float() / max_exact)
            / math.log(max_distance / max_exact)
            * (num_buckets - max_exact)
        ).to(torch.long)
        relative_postion_if_large = torch.min(
            relative_postion_if_large, torch.full_like(relative_postion_if_large, num_buckets - 1)
        )

        relative_buckets += torch.where(is_small, relative_positions, relative_postion_if_large)
        return relative_buckets

    def compute_bias(self, query_length, key_length):
        context_position = torch.arange(query_length, dtype=torch.long)[:, None]
        memory_position = torch.arange(key_length, dtype=torch.long)[None, :]
        relative_position = memory_position - context_position
        relative_position_bucket = self._relative_positions_bucket(relative_position, bidirectional=True)
        relative_position_bucket = relative_position_bucket.to(self.relative_attention_bias.weight.device)
        values = self.relative_attention_bias(relative_position_bucket)
        values = values.permute([2, 0, 1])
        return values

    def forward(
        self,
        query,
        key: Optional[Tensor],
        value: Optional[Tensor],
        key_padding_mask: Optional[Tensor] = None,
        incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]] = None,
        need_weights: bool = True,
        static_kv: bool = False,
        attn_mask: Optional[Tensor] = None,
        before_softmax: bool = False,
        need_head_weights: bool = False,
        position_bias: Optional[Tensor] = None,
    ) -> Tuple[Tensor, Optional[Tensor], Optional[Tensor]]:
        """Input shape: Time x Batch x Channel

        Args:
            key_padding_mask (ByteTensor, optional): mask to exclude
                keys that are pads, of shape `(batch, src_len)`, where
                padding elements are indicated by 1s.
            need_weights (bool, optional): return the attention weights,
                averaged over heads (default: False).
            attn_mask (ByteTensor, optional): typically used to
                implement causal attention, where the mask prevents the
                attention from looking forward in time (default: None).
            before_softmax (bool, optional): return the raw attention
                weights and values before the attention softmax.
            need_head_weights (bool, optional): return the attention
                weights for each head. Implies *need_weights*. Default:
                return the average attention weights over all heads.
        """
        if need_head_weights:
            need_weights = True

        is_tpu = query.device.type == "xla"

        tgt_len, bsz, embed_dim = query.size()
        src_len = tgt_len
        assert embed_dim == self.embed_dim
        assert list(query.size()) == [tgt_len, bsz, embed_dim]
        if key is not None:
            src_len, key_bsz, _ = key.size()
            if not torch.jit.is_scripting():
                assert key_bsz == bsz
                assert value is not None
                assert src_len, bsz == value.shape[:2]

        if self.has_relative_attention_bias and position_bias is None:
            position_bias = self.compute_bias(tgt_len, src_len)
            position_bias = position_bias.unsqueeze(0).repeat(bsz, 1, 1, 1).view(bsz * self.num_heads, tgt_len, src_len)

        if (
            not is_tpu  # don't use PyTorch version on TPUs
            and incremental_state is None
            and not static_kv
            # A workaround for quantization to work. Otherwise JIT compilation
            # treats bias in linear module as method.
            and not torch.jit.is_scripting()
            and self.q_head_dim == self.head_dim
        ):
            assert key is not None and value is not None
            assert attn_mask is None

            attn_mask_rel_pos = None
            if position_bias is not None:
                attn_mask_rel_pos = position_bias
                if self.gru_rel_pos:
                    query_layer = query.transpose(0, 1)
                    new_x_shape = query_layer.size()[:-1] + (self.num_heads, -1)
                    query_layer = query_layer.view(*new_x_shape)
                    query_layer = query_layer.permute(0, 2, 1, 3)
                    _B, _H, _L, __ = query_layer.size()

                    gate_a, gate_b = torch.sigmoid(
                        self.grep_linear(query_layer).view(_B, _H, _L, 2, 4).sum(-1, keepdim=False)
                    ).chunk(2, dim=-1)
                    gate_a_1 = gate_a * (gate_b * self.grep_a - 1.0) + 2.0
                    attn_mask_rel_pos = gate_a_1.view(bsz * self.num_heads, -1, 1) * position_bias

                attn_mask_rel_pos = attn_mask_rel_pos.view((-1, tgt_len, tgt_len))
            k_proj_bias = self.k_proj.bias
            if k_proj_bias is None:
                k_proj_bias = torch.zeros_like(self.q_proj.bias)

            x, attn = F.multi_head_attention_forward(
                query,
                key,
                value,
                self.embed_dim,
                self.num_heads,
                torch.empty([0]),
                torch.cat((self.q_proj.bias, self.k_proj.bias, self.v_proj.bias)),
                self.bias_k,
                self.bias_v,
                self.add_zero_attn,
                self.dropout_module.p,
                self.out_proj.weight,
                self.out_proj.bias,
                self.training,
                # self.training or self.dropout_module.apply_during_inference,
                key_padding_mask,
                need_weights,
                attn_mask_rel_pos,
                use_separate_proj_weight=True,
                q_proj_weight=self.q_proj.weight,
                k_proj_weight=self.k_proj.weight,
                v_proj_weight=self.v_proj.weight,
            )
            return x, attn, position_bias

        if incremental_state is not None:
            saved_state = self._get_input_buffer(incremental_state)
            if saved_state is not None and "prev_key" in saved_state:
                # previous time steps are cached - no need to recompute
                # key and value if they are static
                if static_kv:
                    assert self.encoder_decoder_attention and not self.self_attention
                    key = value = None
        else:
            saved_state = None

        if self.self_attention:
            q = self.q_proj(query)
            k = self.k_proj(query)
            v = self.v_proj(query)
        elif self.encoder_decoder_attention:
            # encoder-decoder attention
            q = self.q_proj(query)
            if key is None:
                assert value is None
                k = v = None
            else:
                k = self.k_proj(key)
                v = self.v_proj(key)

        else:
            assert key is not None and value is not None
            q = self.q_proj(query)
            k = self.k_proj(key)
            v = self.v_proj(value)
        q *= self.scaling

        if self.bias_k is not None:
            assert self.bias_v is not None
            k = torch.cat([k, self.bias_k.repeat(1, bsz, 1)])
            v = torch.cat([v, self.bias_v.repeat(1, bsz, 1)])
            if attn_mask is not None:
                attn_mask = torch.cat([attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1)
            if key_padding_mask is not None:
                key_padding_mask = torch.cat(
                    [
                        key_padding_mask,
                        key_padding_mask.new_zeros(key_padding_mask.size(0), 1),
                    ],
                    dim=1,
                )

        q = q.contiguous().view(tgt_len, bsz * self.num_heads, self.q_head_dim).transpose(0, 1)
        if k is not None:
            k = k.contiguous().view(-1, bsz * self.num_heads, self.k_head_dim).transpose(0, 1)
        if v is not None:
            v = v.contiguous().view(-1, bsz * self.num_heads, self.head_dim).transpose(0, 1)

        if saved_state is not None:
            # saved states are stored with shape (bsz, num_heads, seq_len, head_dim)
            if "prev_key" in saved_state:
                _prev_key = saved_state["prev_key"]
                assert _prev_key is not None
                prev_key = _prev_key.view(bsz * self.num_heads, -1, self.head_dim)
                if static_kv:
                    k = prev_key
                else:
                    assert k is not None
                    k = torch.cat([prev_key, k], dim=1)
                src_len = k.size(1)
            if "prev_value" in saved_state:
                _prev_value = saved_state["prev_value"]
                assert _prev_value is not None
                prev_value = _prev_value.view(bsz * self.num_heads, -1, self.head_dim)
                if static_kv:
                    v = prev_value
                else:
                    assert v is not None
                    v = torch.cat([prev_value, v], dim=1)
            prev_key_padding_mask: Optional[Tensor] = None
            if "prev_key_padding_mask" in saved_state:
                prev_key_padding_mask = saved_state["prev_key_padding_mask"]
            assert k is not None and v is not None
            key_padding_mask = MultiheadAttention._append_prev_key_padding_mask(
                key_padding_mask=key_padding_mask,
                prev_key_padding_mask=prev_key_padding_mask,
                batch_size=bsz,
                src_len=k.size(1),
                static_kv=static_kv,
            )

            saved_state["prev_key"] = k.view(bsz, self.num_heads, -1, self.head_dim)
            saved_state["prev_value"] = v.view(bsz, self.num_heads, -1, self.head_dim)
            saved_state["prev_key_padding_mask"] = key_padding_mask
            # In this branch incremental_state is never None
            assert incremental_state is not None
            incremental_state = self._set_input_buffer(incremental_state, saved_state)
        assert k is not None
        assert k.size(1) == src_len

        # This is part of a workaround to get around fork/join parallelism
        # not supporting Optional types.
        if key_padding_mask is not None and key_padding_mask.dim() == 0:
            key_padding_mask = None

        if key_padding_mask is not None:
            assert key_padding_mask.size(0) == bsz
            assert key_padding_mask.size(1) == src_len

        if self.add_zero_attn:
            assert v is not None
            src_len += 1
            k = torch.cat([k, k.new_zeros((k.size(0), 1) + k.size()[2:])], dim=1)
            v = torch.cat([v, v.new_zeros((v.size(0), 1) + v.size()[2:])], dim=1)
            if attn_mask is not None:
                attn_mask = torch.cat([attn_mask, attn_mask.new_zeros(attn_mask.size(0), 1)], dim=1)
            if key_padding_mask is not None:
                key_padding_mask = torch.cat(
                    [
                        key_padding_mask,
                        torch.zeros(key_padding_mask.size(0), 1).type_as(key_padding_mask),
                    ],
                    dim=1,
                )

        attn_weights = torch.bmm(q, k.transpose(1, 2))
        attn_weights = self.apply_sparse_mask(attn_weights, tgt_len, src_len, bsz)

        assert list(attn_weights.size()) == [bsz * self.num_heads, tgt_len, src_len]

        if attn_mask is not None:
            attn_mask = attn_mask.unsqueeze(0)
            attn_weights += attn_mask

        if key_padding_mask is not None:
            # don't attend to padding symbols
            attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len)
            if not is_tpu:
                attn_weights = attn_weights.masked_fill(
                    key_padding_mask.unsqueeze(1).unsqueeze(2).to(torch.bool),
                    float("-inf"),
                )
            else:
                attn_weights = attn_weights.transpose(0, 2)
                attn_weights = attn_weights.masked_fill(key_padding_mask, float("-inf"))
                attn_weights = attn_weights.transpose(0, 2)
            attn_weights = attn_weights.view(bsz * self.num_heads, tgt_len, src_len)

        if before_softmax:
            return attn_weights, v, position_bias

        if position_bias is not None:
            if self.gru_rel_pos == 1:
                query_layer = q.view(bsz, self.num_heads, tgt_len, self.q_head_dim)
                _B, _H, _L, __ = query_layer.size()
                gate_a, gate_b = torch.sigmoid(
                    self.grep_linear(query_layer).view(_B, _H, _L, 2, 4).sum(-1, keepdim=False)
                ).chunk(2, dim=-1)
                gate_a_1 = gate_a * (gate_b * self.grep_a - 1.0) + 2.0
                position_bias = gate_a_1.view(bsz * self.num_heads, -1, 1) * position_bias

            position_bias = position_bias.view(attn_weights.size())

            attn_weights = attn_weights + position_bias

        attn_weights_float = F.softmax(attn_weights, dim=-1)
        attn_weights = attn_weights_float.type_as(attn_weights)
        attn_probs = self.dropout_module(attn_weights)

        assert v is not None
        attn = torch.bmm(attn_probs, v)
        assert list(attn.size()) == [bsz * self.num_heads, tgt_len, self.head_dim]
        attn = attn.transpose(0, 1).contiguous().view(tgt_len, bsz, embed_dim)
        attn = self.out_proj(attn)
        attn_weights: Optional[Tensor] = None
        if need_weights:
            attn_weights = attn_weights_float.view(bsz, self.num_heads, tgt_len, src_len).transpose(1, 0)
            if not need_head_weights:
                # average attention weights over heads
                attn_weights = attn_weights.mean(dim=0)

        return attn, attn_weights, position_bias

    @staticmethod
    def _append_prev_key_padding_mask(
        key_padding_mask: Optional[Tensor],
        prev_key_padding_mask: Optional[Tensor],
        batch_size: int,
        src_len: int,
        static_kv: bool,
    ) -> Optional[Tensor]:
        # saved key padding masks have shape (bsz, seq_len)
        if prev_key_padding_mask is not None and static_kv:
            new_key_padding_mask = prev_key_padding_mask
        elif prev_key_padding_mask is not None and key_padding_mask is not None:
            new_key_padding_mask = torch.cat([prev_key_padding_mask.float(), key_padding_mask.float()], dim=1)
        # During incremental decoding, as the padding token enters and
        # leaves the frame, there will be a time when prev or current
        # is None
        elif prev_key_padding_mask is not None:
            if src_len > prev_key_padding_mask.size(1):
                filler = torch.zeros(
                    (batch_size, src_len - prev_key_padding_mask.size(1)),
                    device=prev_key_padding_mask.device,
                )
                new_key_padding_mask = torch.cat([prev_key_padding_mask.float(), filler.float()], dim=1)
            else:
                new_key_padding_mask = prev_key_padding_mask.float()
        elif key_padding_mask is not None:
            if src_len > key_padding_mask.size(1):
                filler = torch.zeros(
                    (batch_size, src_len - key_padding_mask.size(1)),
                    device=key_padding_mask.device,
                )
                new_key_padding_mask = torch.cat([filler.float(), key_padding_mask.float()], dim=1)
            else:
                new_key_padding_mask = key_padding_mask.float()
        else:
            new_key_padding_mask = prev_key_padding_mask
        return new_key_padding_mask

    def _get_input_buffer(
        self, incremental_state: Optional[Dict[str, Dict[str, Optional[Tensor]]]]
    ) -> Dict[str, Optional[Tensor]]:
        result = self.get_incremental_state(incremental_state, "attn_state")
        if result is not None:
            return result
        else:
            empty_result: Dict[str, Optional[Tensor]] = {}
            return empty_result

    def _set_input_buffer(
        self,
        incremental_state: Dict[str, Dict[str, Optional[Tensor]]],
        buffer: Dict[str, Optional[Tensor]],
    ):
        return self.set_incremental_state(incremental_state, "attn_state", buffer)

    def apply_sparse_mask(self, attn_weights, tgt_len: int, src_len: int, bsz: int):
        return attn_weights


================================================
FILE: TTS/vc/modules/freevc/wavlm/wavlm.py
================================================
# --------------------------------------------------------
# WavLM: Large-Scale Self-Supervised  Pre-training  for Full Stack Speech Processing (https://arxiv.org/abs/2110.13900.pdf)
# Github source: https://github.com/microsoft/unilm/tree/master/wavlm
# Copyright (c) 2021 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Based on fairseq code bases
# https://github.com/pytorch/fairseq
# --------------------------------------------------------

import logging
import math
from typing import List, Optional, Tuple

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import LayerNorm

from TTS.vc.modules.freevc.wavlm.modules import (
    Fp32GroupNorm,
    Fp32LayerNorm,
    GLU_Linear,
    GradMultiply,
    MultiheadAttention,
    SamePad,
    TransposeLast,
    get_activation_fn,
    init_bert_params,
)

logger = logging.getLogger(__name__)


def compute_mask_indices(
    shape: Tuple[int, int],
    padding_mask: Optional[torch.Tensor],
    mask_prob: float,
    mask_length: int,
    mask_type: str = "static",
    mask_other: float = 0.0,
    min_masks: int = 0,
    no_overlap: bool = False,
    min_space: int = 0,
) -> np.ndarray:
    """
    Computes random mask spans for a given shape

    Args:
        shape: the the shape for which to compute masks.
            should be of size 2 where first element is batch size and 2nd is timesteps
        padding_mask: optional padding mask of the same size as shape, which will prevent masking padded elements
        mask_prob: probability for each token to be chosen as start of the span to be masked. this will be multiplied by
            number of timesteps divided by length of mask span to mask approximately this percentage of all elements.
            however due to overlaps, the actual number will be smaller (unless no_overlap is True)
        mask_type: how to compute mask lengths
            static = fixed size
            uniform = sample from uniform distribution [mask_other, mask_length*2]
            normal = sample from normal distribution with mean mask_length and stdev mask_other. mask is min 1 element
            poisson = sample from possion distribution with lambda = mask length
        min_masks: minimum number of masked spans
        no_overlap: if false, will switch to an alternative recursive algorithm that prevents spans from overlapping
        min_space: only used if no_overlap is True, this is how many elements to keep unmasked between spans
    """

    bsz, all_sz = shape
    mask = np.full((bsz, all_sz), False)

    all_num_mask = int(
        # add a random number for probabilistic rounding
        mask_prob * all_sz / float(mask_length)
        + np.random.rand()
    )

    all_num_mask = max(min_masks, all_num_mask)

    mask_idcs = []
    for i in range(bsz):
        if padding_mask is not None:
            sz = all_sz - padding_mask[i].long().sum().item()
            num_mask = int(
                # add a random number for probabilistic rounding
                mask_prob * sz / float(mask_length)
                + np.random.rand()
            )
            num_mask = max(min_masks, num_mask)
        else:
            sz = all_sz
            num_mask = all_num_mask

        if mask_type == "static":
            lengths = np.full(num_mask, mask_length)
        elif mask_type == "uniform":
            lengths = np.random.randint(mask_other, mask_length * 2 + 1, size=num_mask)
        elif mask_type == "normal":
            lengths = np.random.normal(mask_length, mask_other, size=num_mask)
            lengths = [max(1, int(round(x))) for x in lengths]
        elif mask_type == "poisson":
            lengths = np.random.poisson(mask_length, size=num_mask)
            lengths = [int(round(x)) for x in lengths]
        else:
            raise Exception("unknown mask selection " + mask_type)

        if sum(lengths) == 0:
            lengths[0] = min(mask_length, sz - 1)

        if no_overlap:
            mask_idc = []

            def arrange(s, e, length, keep_length):
                span_start = np.random.randint(s, e - length)
                mask_idc.extend(span_start + i for i in range(length))

                new_parts = []
                if span_start - s - min_space >= keep_length:
                    new_parts.append((s, span_start - min_space + 1))
                if e - span_start - keep_length - min_space > keep_length:
                    new_parts.append((span_start + length + min_space, e))
                return new_parts

            parts = [(0, sz)]
            min_length = min(lengths)
            for length in sorted(lengths, reverse=True):
                lens = np.fromiter(
                    (e - s if e - s >= length + min_space else 0 for s, e in parts),
                    np.int,
                )
                l_sum = np.sum(lens)
                if l_sum == 0:
                    break
                probs = lens / np.sum(lens)
                c = np.random.choice(len(parts), p=probs)
                s, e = parts.pop(c)
                parts.extend(arrange(s, e, length, min_length))
            mask_idc = np.asarray(mask_idc)
        else:
            min_len = min(lengths)
            if sz - min_len <= num_mask:
                min_len = sz - num_mask - 1

            mask_idc = np.random.choice(sz - min_len, num_mask, replace=False)

            mask_idc = np.asarray([mask_idc[j] + offset for j in range(len(mask_idc)) for offset in range(lengths[j])])

        mask_idcs.append(np.unique(mask_idc[mask_idc < sz]))

    min_len = min([len(m) for m in mask_idcs])
    for i, mask_idc in enumerate(mask_idcs):
        if len(mask_idc) > min_len:
            mask_idc = np.random.choice(mask_idc, min_len, replace=False)
        mask[i, mask_idc] = True

    return mask


class WavLMConfig:
    def __init__(self, cfg=None):
        self.extractor_mode: str = "default"  # mode for feature extractor. default has a single group norm with d groups in the first conv block, whereas layer_norm has layer norms in every block (meant to use with normalize=True)
        self.encoder_layers: int = 12  # num encoder layers in the transformer

        self.encoder_embed_dim: int = 768  # encoder embedding dimension
        self.encoder_ffn_embed_dim: int = 3072  # encoder embedding dimension for FFN
        self.encoder_attention_heads: int = 12  # num encoder attention heads
        self.activation_fn: str = "gelu"  # activation function to use

        self.layer_norm_first: bool = False  # apply layernorm first in the transformer
        self.conv_feature_layers: str = "[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2"  # string describing convolutional feature extraction layers in form of a python list that contains [(dim, kernel_size, stride), ...]
        self.conv_bias: bool = False  # include bias in conv encoder
        self.feature_grad_mult: float = 1.0  # multiply feature extractor var grads by this

        self.normalize: bool = False  # normalize input to have 0 mean and unit variance during training

        # dropouts
        self.dropout: float = 0.1  # dropout probability for the transformer
        self.attention_dropout: float = 0.1  # dropout probability for attention weights
        self.activation_dropout: float = 0.0  # dropout probability after activation in FFN
        self.encoder_layerdrop: float = 0.0  # probability of dropping a tarnsformer layer
        self.dropout_input: float = 0.0  # dropout to apply to the input (after feat extr)
        self.dropout_features: float = 0.0  # dropout to apply to the features (after feat extr)

        # masking
        self.mask_length: int = 10  # mask length
        self.mask_prob: float = 0.65  # probability of replacing a token with mask
        self.mask_selection: str = "static"  # how to choose mask length
        self.mask_other: float = (
            0  # secondary mask argument (used for more complex distributions), see help in compute_mask_indicesh
        )
        self.no_mask_overlap: bool = False  # whether to allow masks to overlap
        self.mask_min_space: int = 1  # min space between spans (if no overlap is enabled)

        # channel masking
        self.mask_channel_length: int = 10  # length of the mask for features (channels)
        self.mask_channel_prob: float = 0.0  # probability of replacing a feature with 0
        self.mask_channel_selection: str = "static"  # how to choose mask length for channel masking
        self.mask_channel_other: float = (
            0  # secondary mask argument (used for more complex distributions), see help in compute_mask_indices
        )
        self.no_mask_channel_overlap: bool = False  # whether to allow channel masks to overlap
        self.mask_channel_min_space: int = 1  # min space between spans (if no overlap is enabled)

        # positional embeddings
        self.conv_pos: int = 128  # number of filters for convolutional positional embeddings
        self.conv_pos_groups: int = 16  # number of groups for convolutional positional embedding

        # relative position embedding
        self.relative_position_embedding: bool = False  # apply relative position embedding
        self.num_buckets: int = 320  # number of buckets for relative position embedding
        self.max_distance: int = 1280  # maximum distance for relative position embedding
        self.gru_rel_pos: bool = False  # apply gated relative position embedding

        if cfg is not None:
            self.update(cfg)

    def update(self, cfg: dict):
        self.__dict__.update(cfg)


class WavLM(nn.Module):
    def __init__(
        self,
        cfg: WavLMConfig,
    ) -> None:
        super().__init__()
        logger.info(f"WavLM Config: {cfg.__dict__}")

        self.cfg = cfg
        feature_enc_layers = eval(cfg.conv_feature_layers)
        self.embed = feature_enc_layers[-1][0]

        self.feature_extractor = ConvFeatureExtractionModel(
            conv_layers=feature_enc_layers,
            dropout=0.0,
            mode=cfg.extractor_mode,
            conv_bias=cfg.conv_bias,
        )

        self.post_extract_proj = (
            nn.Linear(self.embed, cfg.encoder_embed_dim) if self.embed != cfg.encoder_embed_dim else None
        )

        self.mask_prob = cfg.mask_prob
        self.mask_selection = cfg.mask_selection
        self.mask_other = cfg.mask_other
        self.mask_length = cfg.mask_length
        self.no_mask_overlap = cfg.no_mask_overlap
        self.mask_min_space = cfg.mask_min_space

        self.mask_channel_prob = cfg.mask_channel_prob
        self.mask_channel_selection = cfg.mask_channel_selection
        self.mask_channel_other = cfg.mask_channel_other
        self.mask_channel_length = cfg.mask_channel_length
        self.no_mask_channel_overlap = cfg.no_mask_channel_overlap
        self.mask_channel_min_space = cfg.mask_channel_min_space

        self.dropout_input = nn.Dropout(cfg.dropout_input)
        self.dropout_features = nn.Dropout(cfg.dropout_features)

        self.feature_grad_mult = cfg.feature_grad_mult

        self.mask_emb = nn.Parameter(torch.FloatTensor(cfg.encoder_embed_dim).uniform_())

        self.encoder = TransformerEncoder(cfg)
        self.layer_norm = LayerNorm(self.embed)

    def apply_mask(self, x, padding_mask):
        B, T, C = x.shape
        if self.mask_prob > 0:
            mask_indices = compute_mask_indices(
                (B, T),
                padding_mask,
                self.mask_prob,
                self.mask_length,
                self.mask_selection,
                self.mask_other,
                min_masks=2,
                no_overlap=self.no_mask_overlap,
                min_space=self.mask_min_space,
            )
            mask_indices = torch.from_numpy(mask_indices).to(x.device)
            x[mask_indices] = self.mask_emb
        else:
            mask_indices = None

        if self.mask_channel_prob > 0:
            mask_channel_indices = compute_mask_indices(
                (B, C),
                None,
                self.mask_channel_prob,
                self.mask_channel_length,
                self.mask_channel_selection,
                self.mask_channel_other,
                no_overlap=self.no_mask_channel_overlap,
                min_space=self.mask_channel_min_space,
            )
            mask_channel_indices = torch.from_numpy(mask_channel_indices).to(x.device).unsqueeze(1).expand(-1, T, -1)
            x[mask_channel_indices] = 0

        return x, mask_indices

    def forward_padding_mask(
        self,
        features: torch.Tensor,
        padding_mask: torch.Tensor,
    ) -> torch.Tensor:
        extra = padding_mask.size(1) % features.size(1)
        if extra > 0:
            padding_mask = padding_mask[:, :-extra]
        padding_mask = padding_mask.view(padding_mask.size(0), features.size(1), -1)
        # padding_mask = padding_mask.all(-1)
        padding_mask = padding_mask.any(-1)
        return padding_mask

    def extract_features(
        self,
        source: torch.Tensor,
        padding_mask: Optional[torch.Tensor] = None,
        mask: bool = False,
        ret_conv: bool = False,
        output_layer: Optional[int] = None,
        ret_layer_results: bool = False,
    ):
        if self.feature_grad_mult > 0:
            features = self.feature_extractor(source)
            if self.feature_grad_mult != 1.0:
                features = GradMultiply.apply(features, self.feature_grad_mult)
        else:
            with torch.no_grad():
                features = self.feature_extractor(source)

        features = features.transpose(1, 2)
        features = self.layer_norm(features)

        if padding_mask is not None:
            padding_mask = self.forward_padding_mask(features, padding_mask)

        if self.post_extract_proj is not None:
            features = self.post_extract_proj(features)

        features = self.dropout_input(features)

        if mask:
            x, mask_indices = self.apply_mask(features, padding_mask)
        else:
            x = features

        # feature: (B, T, D), float
        # target: (B, T), long
        # x: (B, T, D), float
        # padding_mask: (B, T), bool
        # mask_indices: (B, T), bool
        x, layer_results = self.encoder(
            x, padding_mask=padding_mask, layer=None if output_layer is None else output_layer - 1
        )

        res = {"x": x, "padding_mask": padding_mask, "features": features, "layer_results": layer_results}

        feature = res["features"] if ret_conv else res["x"]
        if ret_layer_results:
            feature = (feature, res["layer_results"])
        return feature, res["padding_mask"]


class ConvFeatureExtractionModel(nn.Module):
    def __init__(
        self,
        conv_layers: List[Tuple[int, int, int]],
        dropout: float = 0.0,
        mode: str = "default",
        conv_bias: bool = False,
        conv_type: str = "default",
    ):
        super().__init__()

        assert mode in {"default", "layer_norm"}

        def block(
            n_in,
            n_out,
            k,
            stride,
            is_layer_norm=False,
            is_group_norm=False,
            conv_bias=False,
        ):
            def make_conv():
                conv = nn.Conv1d(n_in, n_out, k, stride=stride, bias=conv_bias)
                nn.init.kaiming_normal_(conv.weight)
                return conv

            assert (is_layer_norm and is_group_norm) == False, "layer norm and group norm are exclusive"

            if is_layer_norm:
                return nn.Sequential(
                    make_conv(),
                    nn.Dropout(p=dropout),
                    nn.Sequential(
                        TransposeLast(),
                        Fp32LayerNorm(dim, elementwise_affine=True),
                        TransposeLast(),
                    ),
                    nn.GELU(),
                )
            elif is_group_norm:
                return nn.Sequential(
                    make_conv(),
                    nn.Dropout(p=dropout),
                    Fp32GroupNorm(dim, dim, affine=True),
                    nn.GELU(),
                )
            else:
                return nn.Sequential(make_conv(), nn.Dropout(p=dropout), nn.GELU())

        self.conv_type = conv_type
        if self.conv_type == "default":
            in_d = 1
            self.conv_layers = nn.ModuleList()
            for i, cl in enumerate(conv_layers):
                assert len(cl) == 3, "invalid conv definition: " + str(cl)
                (dim, k, stride) = cl

                self.conv_layers.append(
                    block(
                        in_d,
                        dim,
                        k,
                        stride,
                        is_layer_norm=mode == "layer_norm",
                        is_group_norm=mode == "default" and i == 0,
                        conv_bias=conv_bias,
                    )
                )
                in_d = dim
        elif self.conv_type == "conv2d":
            in_d = 1
            self.conv_layers = nn.ModuleList()
            for i, cl in enumerate(conv_layers):
                assert len(cl) == 3
                (dim, k, stride) = cl

                self.conv_layers.append(torch.nn.Conv2d(in_d, dim, k, stride))
                self.conv_layers.append(torch.nn.ReLU())
                in_d = dim
        elif self.conv_type == "custom":
            in_d = 1
            idim = 80
            self.conv_layers = nn.ModuleList()
            for i, cl in enumerate(conv_layers):
                assert len(cl) == 3
                (dim, k, stride) = cl
                self.conv_layers.append(torch.nn.Conv2d(in_d, dim, k, stride, padding=1))
                self.conv_layers.append(torch.nn.LayerNorm([dim, idim]))
                self.conv_layers.append(torch.nn.ReLU())
                in_d = dim
                if (i + 1) % 2 == 0:
                    self.conv_layers.append(torch.nn.MaxPool2d(2, stride=2, ceil_mode=True))
                    idim = int(math.ceil(idim / 2))
        else:
            pass

    def forward(self, x, mask=None):
        # BxT -> BxCxT
        x = x.unsqueeze(1)
        if self.conv_type == "custom":
            for conv in self.conv_layers:
                if isinstance(conv, nn.LayerNorm):
                    x = x.transpose(1, 2)
                    x = conv(x).transpose(1, 2)
                else:
                    x = conv(x)
            x = x.transpose(2, 3).contiguous()
            x = x.view(x.size(0), -1, x.size(-1))
        else:
            for conv in self.conv_layers:
                x = conv(x)
            if self.conv_type == "conv2d":
                b, c, t, f = x.size()
                x = x.transpose(2, 3).contiguous().view(b, c * f, t)
        return x


class TransformerEncoder(nn.Module):
    def __init__(self, args):
        super().__init__()

        self.dropout = args.dropout
        self.embedding_dim = args.encoder_embed_dim

        self.pos_conv = nn.Conv1d(
            self.embedding_dim,
            self.embedding_dim,
            kernel_size=args.conv_pos,
            padding=args.conv_pos // 2,
            groups=args.conv_pos_groups,
        )
        dropout = 0
        std = math.sqrt((4 * (1.0 - dropout)) / (args.conv_pos * self.embedding_dim))
        nn.init.normal_(self.pos_conv.weight, mean=0, std=std)
        nn.init.constant_(self.pos_conv.bias, 0)

        self.pos_conv = nn.utils.weight_norm(self.pos_conv, name="weight", dim=2)
        self.pos_conv = nn.Sequential(self.pos_conv, SamePad(args.conv_pos), nn.GELU())

        if hasattr(args, "relative_position_embedding"):
            self.relative_position_embedding = args.relative_position_embedding
            self.num_buckets = args.num_buckets
            self.max_distance = args.max_distance
        else:
            self.relative_position_embedding = False
            self.num_buckets = 0
            self.max_distance = 0

        self.layers = nn.ModuleList(
            [
                TransformerSentenceEncoderLayer(
                    embedding_dim=self.embedding_dim,
                    ffn_embedding_dim=args.encoder_ffn_embed_dim,
                    num_attention_heads=args.encoder_attention_heads,
                    dropout=self.dropout,
                    attention_dropout=args.attention_dropout,
                    activation_dropout=args.activation_dropout,
                    activation_fn=args.activation_fn,
                    layer_norm_first=args.layer_norm_first,
                    has_relative_attention_bias=(self.relative_position_embedding and i == 0),
                    num_buckets=self.num_buckets,
                    max_distance=self.max_distance,
                    gru_rel_pos=args.gru_rel_pos,
                )
                for i in range(args.encoder_layers)
            ]
        )

        self.layer_norm_first = args.layer_norm_first
        self.layer_norm = LayerNorm(self.embedding_dim)
        self.layerdrop = args.encoder_layerdrop

        self.apply(init_bert_params)

    def forward(self, x, padding_mask=None, streaming_mask=None, layer=None):
        x, layer_results = self.extract_features(x, padding_mask, streaming_mask, layer)

        if self.layer_norm_first and layer is None:
            x = self.layer_norm(x)

        return x, layer_results

    def extract_features(self, x, padding_mask=None, streaming_mask=None, tgt_layer=None):
        if padding_mask is not None:
            x[padding_mask] = 0

        x_conv = self.pos_conv(x.transpose(1, 2))
        x_conv = x_conv.transpose(1, 2)
        x += x_conv

        if not self.layer_norm_first:
            x = self.layer_norm(x)

        x = F.dropout(x, p=self.dropout, training=self.training)

        # B x T x C -> T x B x C
        x = x.transpose(0, 1)

        layer_results = []
        z = None
        if tgt_layer is not None:
            layer_results.append((x, z))
        r = None
        pos_bias = None
        for i, layer in enumerate(self.layers):
            dropout_probability = np.random.random()
            if not self.training or (dropout_probability > self.layerdrop):
                x, z, pos_bias = layer(
                    x,
                    self_attn_padding_mask=padding_mask,
                    need_weights=False,
                    self_attn_mask=streaming_mask,
                    pos_bias=pos_bias,
                )
            if tgt_layer is not None:
                layer_results.append((x, z))
            if i == tgt_layer:
                r = x
                break

        if r is not None:
            x = r

        # T x B x C -> B x T x C
        x = x.transpose(0, 1)

        return x, layer_results


class TransformerSentenceEncoderLayer(nn.Module):
    """
    Implements a Transformer Encoder Layer used in BERT/XLM style pre-trained
    models.
    """

    def __init__(
        self,
        embedding_dim: float = 768,
        ffn_embedding_dim: float = 3072,
        num_attention_heads: float = 8,
        dropout: float = 0.1,
        attention_dropout: float = 0.1,
        activation_dropout: float = 0.1,
        activation_fn: str = "relu",
        layer_norm_first: bool = False,
        has_relative_attention_bias: bool = False,
        num_buckets: int = 0,
        max_distance: int = 0,
        rescale_init: bool = False,
        gru_rel_pos: bool = False,
    ) -> None:
        super().__init__()
        # Initialize parameters
        self.embedding_dim = embedding_dim
        self.dropout = dropout
        self.activation_dropout = activation_dropout

        # Initialize blocks
        self.activation_name = activation_fn
        self.activation_fn = get_activation_fn(activation_fn)
        self.self_attn = MultiheadAttention(
            self.embedding_dim,
            num_attention_heads,
            dropout=attention_dropout,
            self_attention=True,
            has_relative_attention_bias=has_relative_attention_bias,
            num_buckets=num_buckets,
            max_distance=max_distance,
            rescale_init=rescale_init,
            gru_rel_pos=gru_rel_pos,
        )

        self.dropout1 = nn.Dropout(dropout)
        self.dropout2 = nn.Dropout(self.activation_dropout)
        self.dropout3 = nn.Dropout(dropout)

        self.layer_norm_first = layer_norm_first

        # layer norm associated with the self attention layer
        self.self_attn_layer_norm = LayerNorm(self.embedding_dim)

        if self.activation_name == "glu":
            self.fc1 = GLU_Linear(self.embedding_dim, ffn_embedding_dim, "swish")
        else:
            self.fc1 = nn.Linear(self.embedding_dim, ffn_embedding_dim)
        self.fc2 = nn.Linear(ffn_embedding_dim, self.embedding_dim)

        # layer norm associated with the position wise feed-forward NN
        self.final_layer_norm = LayerNorm(self.embedding_dim)

    def forward(
        self,
        x: torch.Tensor,
        self_attn_mask: torch.Tensor = None,
        self_attn_padding_mask: torch.Tensor = None,
        need_weights: bool = False,
        pos_bias=None,
    ):
        """
        LayerNorm is applied either before or after the self-attention/ffn
        modules similar to the original Transformer imlementation.
        """
        residual = x

        if self.layer_norm_first:
            x = self.self_attn_layer_norm(x)
            x, attn, pos_bias = self.self_attn(
                query=x,
                key=x,
                value=x,
                key_padding_mask=self_attn_padding_mask,
                need_weights=False,
                attn_mask=self_attn_mask,
                position_bias=pos_bias,
            )
            x = self.dropout1(x)
            x = residual + x

            residual = x
            x = self.final_layer_norm(x)
            if self.activation_name == "glu":
                x = self.fc1(x)
            else:
                x = self.activation_fn(self.fc1(x))
            x = self.dropout2(x)
            x = self.fc2(x)
            x = self.dropout3(x)
            x = residual + x
        else:
            x, attn, pos_bias = self.self_attn(
                query=x,
                key=x,
                value=x,
                key_padding_mask=self_attn_padding_mask,
                need_weights=need_weights,
                attn_mask=self_attn_mask,
                position_bias=pos_bias,
            )

            x = self.dropout1(x)
            x = residual + x

            x = self.self_attn_layer_norm(x)

            residual = x
            if self.activation_name == "glu":
                x = self.fc1(x)
            else:
                x = self.activation_fn(self.fc1(x))
            x = self.dropout2(x)
            x = self.fc2(x)
            x = self.dropout3(x)
            x = residual + x
            x = self.final_layer_norm(x)

        return x, attn, pos_bias


================================================
FILE: TTS/vocoder/README.md
================================================
# Mozilla TTS Vocoders (Experimental)

Here there are vocoder model implementations which can be combined with the other TTS models.

Currently, following models are implemented:

- Melgan
- MultiBand-Melgan
- ParallelWaveGAN
- GAN-TTS (Discriminator Only)

It is also very easy to adapt different vocoder models as we provide a flexible and modular (but not too modular) framework.

## Training a model

You can see here an example (Soon)[Colab Notebook]() training MelGAN with LJSpeech dataset.

In order to train a new model, you need to gather all wav files into a folder and give this folder to `data_path` in '''config.json'''

You need to define other relevant parameters in your ```config.json``` and then start traning with the following command.

```CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --config_path path/to/config.json```

Example config files can be found under `tts/vocoder/configs/` folder.

You can continue a previous training run by the following command.

```CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --continue_path path/to/your/model/folder```

You can fine-tune a pre-trained model by the following command.

```CUDA_VISIBLE_DEVICES='0' python tts/bin/train_vocoder.py --restore_path path/to/your/model.pth```

Restoring a model starts a new training in a different folder. It only restores model weights with the given checkpoint file. However, continuing a training starts from the same directory where the previous training run left off.

You can also follow your training runs on Tensorboard as you do with our TTS models.

## Acknowledgement
Thanks to @kan-bayashi for his [repository](https://github.com/kan-bayashi/ParallelWaveGAN) being the start point of our work.


================================================
FILE: TTS/vocoder/__init__.py
================================================


================================================
FILE: TTS/vocoder/configs/__init__.py
================================================
import importlib
import os
from inspect import isclass

# import all files under configs/
configs_dir = os.path.dirname(__file__)
for file in os.listdir(configs_dir):
    path = os.path.join(configs_dir, file)
    if not file.startswith("_") and not file.startswith(".") and (file.endswith(".py") or os.path.isdir(path)):
        config_name = file[: file.find(".py")] if file.endswith(".py") else file
        module = importlib.import_module("TTS.vocoder.configs." + config_name)
        for attribute_name in dir(module):
            attribute = getattr(module, attribute_name)

            if isclass(attribute):
                # Add the class to this package's variables
                globals()[attribute_name] = attribute


================================================
FILE: TTS/vocoder/configs/fullband_melgan_config.py
================================================
from dataclasses import dataclass, field

from .shared_configs import BaseGANVocoderConfig


@dataclass
class FullbandMelganConfig(BaseGANVocoderConfig):
    """Defines parameters for FullBand MelGAN vocoder.

    Example:

        >>> from TTS.vocoder.configs import FullbandMelganConfig
        >>> config = FullbandMelganConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `fullband_melgan`.
        discriminator_model (str): One of the discriminators from `TTS.vocoder.models.*_discriminator`. Defaults to
            'melgan_multiscale_discriminator`.
        discriminator_model_params (dict): The discriminator model parameters. Defaults to
            '{"base_channels": 16, "max_channels": 1024, "downsample_factors": [4, 4, 4, 4]}`
        generator_model (str): One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `melgan_generator`.
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 16.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 8192.
        pad_short (int):
            Additional padding applied to the audio samples shorter than `seq_len`. Defaults to 0.
        use_noise_augment (bool):
            enable / disable random noise added to the input waveform. The noise is added after computing the
            features. Defaults to True.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        use_stft_loss (bool):
            enable / disable use of STFT loss originally used by ParallelWaveGAN model. Defaults to True.
        use_subband_stft (bool):
            enable / disable use of subband loss computation originally used by MultiBandMelgan model. Defaults to True.
        use_mse_gan_loss (bool):
            enable / disable using Mean Squeare Error GAN loss. Defaults to True.
        use_hinge_gan_loss (bool):
            enable / disable using Hinge GAN loss. You should choose either Hinge or MSE loss for training GAN models.
            Defaults to False.
        use_feat_match_loss (bool):
            enable / disable using Feature Matching loss originally used by MelGAN model. Defaults to True.
        use_l1_spec_loss (bool):
            enable / disable using L1 spectrogram loss originally used by HifiGAN model. Defaults to False.
        stft_loss_params (dict): STFT loss parameters. Default to
        `{"n_ffts": [1024, 2048, 512], "hop_lengths": [120, 240, 50], "win_lengths": [600, 1200, 240]}`
        stft_loss_weight (float): STFT loss weight that multiplies the computed loss before summing up the total
            model loss. Defaults to 0.5.
        subband_stft_loss_weight (float):
            Subband STFT loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        mse_G_loss_weight (float):
            MSE generator loss weight that multiplies the computed loss before summing up the total loss. faults to 2.5.
        hinge_G_loss_weight (float):
            Hinge generator loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        feat_match_loss_weight (float):
            Feature matching loss weight that multiplies the computed loss before summing up the total loss. faults to 108.
        l1_spec_loss_weight (float):
            L1 spectrogram loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
    """

    model: str = "fullband_melgan"

    # Model specific params
    discriminator_model: str = "melgan_multiscale_discriminator"
    discriminator_model_params: dict = field(
        default_factory=lambda: {"base_channels": 16, "max_channels": 512, "downsample_factors": [4, 4, 4]}
    )
    generator_model: str = "melgan_generator"
    generator_model_params: dict = field(
        default_factory=lambda: {"upsample_factors": [8, 8, 2, 2], "num_res_blocks": 4}
    )

    # Training - overrides
    batch_size: int = 16
    seq_len: int = 8192
    pad_short: int = 2000
    use_noise_augment: bool = True
    use_cache: bool = True

    # LOSS PARAMETERS - overrides
    use_stft_loss: bool = True
    use_subband_stft_loss: bool = False
    use_mse_gan_loss: bool = True
    use_hinge_gan_loss: bool = False
    use_feat_match_loss: bool = True  # requires MelGAN Discriminators (MelGAN and HifiGAN)
    use_l1_spec_loss: bool = False

    stft_loss_params: dict = field(
        default_factory=lambda: {
            "n_ffts": [1024, 2048, 512],
            "hop_lengths": [120, 240, 50],
            "win_lengths": [600, 1200, 240],
        }
    )

    # loss weights - overrides
    stft_loss_weight: float = 0.5
    subband_stft_loss_weight: float = 0
    mse_G_loss_weight: float = 2.5
    hinge_G_loss_weight: float = 0
    feat_match_loss_weight: float = 108
    l1_spec_loss_weight: float = 0.0


================================================
FILE: TTS/vocoder/configs/hifigan_config.py
================================================
from dataclasses import dataclass, field

from TTS.vocoder.configs.shared_configs import BaseGANVocoderConfig


@dataclass
class HifiganConfig(BaseGANVocoderConfig):
    """Defines parameters for FullBand MelGAN vocoder.

    Example:

        >>> from TTS.vocoder.configs import HifiganConfig
        >>> config = HifiganConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `hifigan`.
        discriminator_model (str): One of the discriminators from `TTS.vocoder.models.*_discriminator`. Defaults to
            'hifigan_discriminator`.
        generator_model (str): One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `hifigan_generator`.
        generator_model_params (dict): Parameters of the generator model. Defaults to
            `
            {
                "upsample_factors": [8, 8, 2, 2],
                "upsample_kernel_sizes": [16, 16, 4, 4],
                "upsample_initial_channel": 512,
                "resblock_kernel_sizes": [3, 7, 11],
                "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
                "resblock_type": "1",
            }
            `
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 16.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 8192.
        pad_short (int):
            Additional padding applied to the audio samples shorter than `seq_len`. Defaults to 0.
        use_noise_augment (bool):
            enable / disable random noise added to the input waveform. The noise is added after computing the
            features. Defaults to True.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        use_stft_loss (bool):
            enable / disable use of STFT loss originally used by ParallelWaveGAN model. Defaults to True.
        use_subband_stft (bool):
            enable / disable use of subband loss computation originally used by MultiBandMelgan model. Defaults to True.
        use_mse_gan_loss (bool):
            enable / disable using Mean Squeare Error GAN loss. Defaults to True.
        use_hinge_gan_loss (bool):
            enable / disable using Hinge GAN loss. You should choose either Hinge or MSE loss for training GAN models.
            Defaults to False.
        use_feat_match_loss (bool):
            enable / disable using Feature Matching loss originally used by MelGAN model. Defaults to True.
        use_l1_spec_loss (bool):
            enable / disable using L1 spectrogram loss originally used by HifiGAN model. Defaults to False.
        stft_loss_params (dict):
            STFT loss parameters. Default to
            `{
                "n_ffts": [1024, 2048, 512],
                "hop_lengths": [120, 240, 50],
                "win_lengths": [600, 1200, 240]
            }`
        l1_spec_loss_params (dict):
            L1 spectrogram loss parameters. Default to
            `{
                "use_mel": True,
                "sample_rate": 22050,
                "n_fft": 1024,
                "hop_length": 256,
                "win_length": 1024,
                "n_mels": 80,
                "mel_fmin": 0.0,
                "mel_fmax": None,
            }`
        stft_loss_weight (float): STFT loss weight that multiplies the computed loss before summing up the total
            model loss. Defaults to 0.5.
        subband_stft_loss_weight (float):
            Subband STFT loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        mse_G_loss_weight (float):
            MSE generator loss weight that multiplies the computed loss before summing up the total loss. faults to 2.5.
        hinge_G_loss_weight (float):
            Hinge generator loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        feat_match_loss_weight (float):
            Feature matching loss weight that multiplies the computed loss before summing up the total loss. faults to 108.
        l1_spec_loss_weight (float):
            L1 spectrogram loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
    """

    model: str = "hifigan"
    # model specific params
    discriminator_model: str = "hifigan_discriminator"
    generator_model: str = "hifigan_generator"
    generator_model_params: dict = field(
        default_factory=lambda: {
            "upsample_factors": [8, 8, 2, 2],
            "upsample_kernel_sizes": [16, 16, 4, 4],
            "upsample_initial_channel": 512,
            "resblock_kernel_sizes": [3, 7, 11],
            "resblock_dilation_sizes": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
            "resblock_type": "1",
        }
    )

    # LOSS PARAMETERS - overrides
    use_stft_loss: bool = False
    use_subband_stft_loss: bool = False
    use_mse_gan_loss: bool = True
    use_hinge_gan_loss: bool = False
    use_feat_match_loss: bool = True  # requires MelGAN Discriminators (MelGAN and HifiGAN)
    use_l1_spec_loss: bool = True

    # loss weights - overrides
    stft_loss_weight: float = 0
    subband_stft_loss_weight: float = 0
    mse_G_loss_weight: float = 1
    hinge_G_loss_weight: float = 0
    feat_match_loss_weight: float = 108
    l1_spec_loss_weight: float = 45
    l1_spec_loss_params: dict = field(
        default_factory=lambda: {
            "use_mel": True,
            "sample_rate": 22050,
            "n_fft": 1024,
            "hop_length": 256,
            "win_length": 1024,
            "n_mels": 80,
            "mel_fmin": 0.0,
            "mel_fmax": None,
        }
    )

    # optimizer parameters
    lr: float = 1e-4
    wd: float = 1e-6


================================================
FILE: TTS/vocoder/configs/melgan_config.py
================================================
from dataclasses import dataclass, field

from TTS.vocoder.configs.shared_configs import BaseGANVocoderConfig


@dataclass
class MelganConfig(BaseGANVocoderConfig):
    """Defines parameters for MelGAN vocoder.

    Example:

        >>> from TTS.vocoder.configs import MelganConfig
        >>> config = MelganConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `melgan`.
        discriminator_model (str): One of the discriminators from `TTS.vocoder.models.*_discriminator`. Defaults to
            'melgan_multiscale_discriminator`.
        discriminator_model_params (dict): The discriminator model parameters. Defaults to
            '{"base_channels": 16, "max_channels": 1024, "downsample_factors": [4, 4, 4, 4]}`
        generator_model (str): One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `melgan_generator`.
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 16.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 8192.
        pad_short (int):
            Additional padding applied to the audio samples shorter than `seq_len`. Defaults to 0.
        use_noise_augment (bool):
            enable / disable random noise added to the input waveform. The noise is added after computing the
            features. Defaults to True.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        use_stft_loss (bool):
            enable / disable use of STFT loss originally used by ParallelWaveGAN model. Defaults to True.
        use_subband_stft (bool):
            enable / disable use of subband loss computation originally used by MultiBandMelgan model. Defaults to True.
        use_mse_gan_loss (bool):
            enable / disable using Mean Squeare Error GAN loss. Defaults to True.
        use_hinge_gan_loss (bool):
            enable / disable using Hinge GAN loss. You should choose either Hinge or MSE loss for training GAN models.
            Defaults to False.
        use_feat_match_loss (bool):
            enable / disable using Feature Matching loss originally used by MelGAN model. Defaults to True.
        use_l1_spec_loss (bool):
            enable / disable using L1 spectrogram loss originally used by HifiGAN model. Defaults to False.
        stft_loss_params (dict): STFT loss parameters. Default to
        `{"n_ffts": [1024, 2048, 512], "hop_lengths": [120, 240, 50], "win_lengths": [600, 1200, 240]}`
        stft_loss_weight (float): STFT loss weight that multiplies the computed loss before summing up the total
            model loss. Defaults to 0.5.
        subband_stft_loss_weight (float):
            Subband STFT loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        mse_G_loss_weight (float):
            MSE generator loss weight that multiplies the computed loss before summing up the total loss. faults to 2.5.
        hinge_G_loss_weight (float):
            Hinge generator loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        feat_match_loss_weight (float):
            Feature matching loss weight that multiplies the computed loss before summing up the total loss. faults to 108.
        l1_spec_loss_weight (float):
            L1 spectrogram loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
    """

    model: str = "melgan"

    # Model specific params
    discriminator_model: str = "melgan_multiscale_discriminator"
    discriminator_model_params: dict = field(
        default_factory=lambda: {"base_channels": 16, "max_channels": 1024, "downsample_factors": [4, 4, 4, 4]}
    )
    generator_model: str = "melgan_generator"
    generator_model_params: dict = field(
        default_factory=lambda: {"upsample_factors": [8, 8, 2, 2], "num_res_blocks": 3}
    )

    # Training - overrides
    batch_size: int = 16
    seq_len: int = 8192
    pad_short: int = 2000
    use_noise_augment: bool = True
    use_cache: bool = True

    # LOSS PARAMETERS - overrides
    use_stft_loss: bool = True
    use_subband_stft_loss: bool = False
    use_mse_gan_loss: bool = True
    use_hinge_gan_loss: bool = False
    use_feat_match_loss: bool = True  # requires MelGAN Discriminators (MelGAN and HifiGAN)
    use_l1_spec_loss: bool = False

    stft_loss_params: dict = field(
        default_factory=lambda: {
            "n_ffts": [1024, 2048, 512],
            "hop_lengths": [120, 240, 50],
            "win_lengths": [600, 1200, 240],
        }
    )

    # loss weights - overrides
    stft_loss_weight: float = 0.5
    subband_stft_loss_weight: float = 0
    mse_G_loss_weight: float = 2.5
    hinge_G_loss_weight: float = 0
    feat_match_loss_weight: float = 108
    l1_spec_loss_weight: float = 0


================================================
FILE: TTS/vocoder/configs/multiband_melgan_config.py
================================================
from dataclasses import dataclass, field

from TTS.vocoder.configs.shared_configs import BaseGANVocoderConfig


@dataclass
class MultibandMelganConfig(BaseGANVocoderConfig):
    """Defines parameters for MultiBandMelGAN vocoder.

    Example:

        >>> from TTS.vocoder.configs import MultibandMelganConfig
        >>> config = MultibandMelganConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `multiband_melgan`.
        discriminator_model (str): One of the discriminators from `TTS.vocoder.models.*_discriminator`. Defaults to
            'melgan_multiscale_discriminator`.
        discriminator_model_params (dict): The discriminator model parameters. Defaults to
            '{
                "base_channels": 16,
                "max_channels": 512,
                "downsample_factors": [4, 4, 4]
            }`
        generator_model (str): One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `melgan_generator`.
        generator_model_param (dict):
            The generator model parameters. Defaults to `{"upsample_factors": [8, 4, 2], "num_res_blocks": 4}`.
        use_pqmf (bool):
            enable / disable PQMF modulation for multi-band training. Defaults to True.
        lr_gen (float):
            Initial learning rate for the generator model. Defaults to 0.0001.
        lr_disc (float):
            Initial learning rate for the discriminator model. Defaults to 0.0001.
        optimizer (torch.optim.Optimizer):
            Optimizer used for the training. Defaults to `AdamW`.
        optimizer_params (dict):
            Optimizer kwargs. Defaults to `{"betas": [0.8, 0.99], "weight_decay": 0.0}`
        lr_scheduler_gen (torch.optim.Scheduler):
            Learning rate scheduler for the generator. Defaults to `MultiStepLR`.
        lr_scheduler_gen_params (dict):
            Parameters for the generator learning rate scheduler. Defaults to
            `{"gamma": 0.5, "milestones": [100000, 200000, 300000, 400000, 500000, 600000]}`.
        lr_scheduler_disc (torch.optim.Scheduler):
            Learning rate scheduler for the discriminator. Defaults to `MultiStepLR`.
        lr_scheduler_dict_params (dict):
            Parameters for the discriminator learning rate scheduler. Defaults to
            `{"gamma": 0.5, "milestones": [100000, 200000, 300000, 400000, 500000, 600000]}`.
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 16.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 8192.
        pad_short (int):
            Additional padding applied to the audio samples shorter than `seq_len`. Defaults to 0.
        use_noise_augment (bool):
            enable / disable random noise added to the input waveform. The noise is added after computing the
            features. Defaults to True.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        steps_to_start_discriminator (int):
            Number of steps required to start training the discriminator. Defaults to 0.
        use_stft_loss (bool):`
            enable / disable use of STFT loss originally used by ParallelWaveGAN model. Defaults to True.
        use_subband_stft (bool):
            enable / disable use of subband loss computation originally used by MultiBandMelgan model. Defaults to True.
        use_mse_gan_loss (bool):
            enable / disable using Mean Squeare Error GAN loss. Defaults to True.
        use_hinge_gan_loss (bool):
            enable / disable using Hinge GAN loss. You should choose either Hinge or MSE loss for training GAN models.
            Defaults to False.
        use_feat_match_loss (bool):
            enable / disable using Feature Matching loss originally used by MelGAN model. Defaults to True.
        use_l1_spec_loss (bool):
            enable / disable using L1 spectrogram loss originally used by HifiGAN model. Defaults to False.
        stft_loss_params (dict): STFT loss parameters. Default to
            `{"n_ffts": [1024, 2048, 512], "hop_lengths": [120, 240, 50], "win_lengths": [600, 1200, 240]}`
        stft_loss_weight (float): STFT loss weight that multiplies the computed loss before summing up the total
            model loss. Defaults to 0.5.
        subband_stft_loss_weight (float):
            Subband STFT loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        mse_G_loss_weight (float):
            MSE generator loss weight that multiplies the computed loss before summing up the total loss. faults to 2.5.
        hinge_G_loss_weight (float):
            Hinge generator loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        feat_match_loss_weight (float):
            Feature matching loss weight that multiplies the computed loss before summing up the total loss. faults to 108.
        l1_spec_loss_weight (float):
            L1 spectrogram loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
    """

    model: str = "multiband_melgan"

    # Model specific params
    discriminator_model: str = "melgan_multiscale_discriminator"
    discriminator_model_params: dict = field(
        default_factory=lambda: {"base_channels": 16, "max_channels": 512, "downsample_factors": [4, 4, 4]}
    )
    generator_model: str = "multiband_melgan_generator"
    generator_model_params: dict = field(default_factory=lambda: {"upsample_factors": [8, 4, 2], "num_res_blocks": 4})
    use_pqmf: bool = True

    # optimizer - overrides
    lr_gen: float = 0.0001  # Initial learning rate.
    lr_disc: float = 0.0001  # Initial learning rate.
    optimizer: str = "AdamW"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.8, 0.99], "weight_decay": 0.0})
    lr_scheduler_gen: str = "MultiStepLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_gen_params: dict = field(
        default_factory=lambda: {"gamma": 0.5, "milestones": [100000, 200000, 300000, 400000, 500000, 600000]}
    )
    lr_scheduler_disc: str = "MultiStepLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_disc_params: dict = field(
        default_factory=lambda: {"gamma": 0.5, "milestones": [100000, 200000, 300000, 400000, 500000, 600000]}
    )

    # Training - overrides
    batch_size: int = 64
    seq_len: int = 16384
    pad_short: int = 2000
    use_noise_augment: bool = False
    use_cache: bool = True
    steps_to_start_discriminator: bool = 200000

    # LOSS PARAMETERS - overrides
    use_stft_loss: bool = True
    use_subband_stft_loss: bool = True
    use_mse_gan_loss: bool = True
    use_hinge_gan_loss: bool = False
    use_feat_match_loss: bool = False  # requires MelGAN Discriminators (MelGAN and HifiGAN)
    use_l1_spec_loss: bool = False

    subband_stft_loss_params: dict = field(
        default_factory=lambda: {"n_ffts": [384, 683, 171], "hop_lengths": [30, 60, 10], "win_lengths": [150, 300, 60]}
    )

    # loss weights - overrides
    stft_loss_weight: float = 0.5
    subband_stft_loss_weight: float = 0
    mse_G_loss_weight: float = 2.5
    hinge_G_loss_weight: float = 0
    feat_match_loss_weight: float = 108
    l1_spec_loss_weight: float = 0


================================================
FILE: TTS/vocoder/configs/parallel_wavegan_config.py
================================================
from dataclasses import dataclass, field

from .shared_configs import BaseGANVocoderConfig


@dataclass
class ParallelWaveganConfig(BaseGANVocoderConfig):
    """Defines parameters for ParallelWavegan vocoder.

    Args:
        model (str):
            Model name used for selecting the right configuration at initialization. Defaults to `gan`.
        discriminator_model (str): One of the discriminators from `TTS.vocoder.models.*_discriminator`. Defaults to
            'parallel_wavegan_discriminator`.
        discriminator_model_params (dict): The discriminator model kwargs. Defaults to
            '{"num_layers": 10}`
        generator_model (str): One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `parallel_wavegan_generator`.
        generator_model_param (dict):
            The generator model kwargs. Defaults to `{"upsample_factors": [4, 4, 4, 4], "stacks": 3, "num_res_blocks": 30}`.
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 16.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 8192.
        pad_short (int):
            Additional padding applied to the audio samples shorter than `seq_len`. Defaults to 0.
        use_noise_augment (bool):
            enable / disable random noise added to the input waveform. The noise is added after computing the
            features. Defaults to True.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        steps_to_start_discriminator (int):
            Number of steps required to start training the discriminator. Defaults to 0.
        use_stft_loss (bool):`
            enable / disable use of STFT loss originally used by ParallelWaveGAN model. Defaults to True.
        use_subband_stft (bool):
            enable / disable use of subband loss computation originally used by MultiBandMelgan model. Defaults to True.
        use_mse_gan_loss (bool):
            enable / disable using Mean Squeare Error GAN loss. Defaults to True.
        use_hinge_gan_loss (bool):
            enable / disable using Hinge GAN loss. You should choose either Hinge or MSE loss for training GAN models.
            Defaults to False.
        use_feat_match_loss (bool):
            enable / disable using Feature Matching loss originally used by MelGAN model. Defaults to True.
        use_l1_spec_loss (bool):
            enable / disable using L1 spectrogram loss originally used by HifiGAN model. Defaults to False.
        stft_loss_params (dict): STFT loss parameters. Default to
            `{"n_ffts": [1024, 2048, 512], "hop_lengths": [120, 240, 50], "win_lengths": [600, 1200, 240]}`
        stft_loss_weight (float): STFT loss weight that multiplies the computed loss before summing up the total
            model loss. Defaults to 0.5.
        subband_stft_loss_weight (float):
            Subband STFT loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        mse_G_loss_weight (float):
            MSE generator loss weight that multiplies the computed loss before summing up the total loss. faults to 2.5.
        hinge_G_loss_weight (float):
            Hinge generator loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        feat_match_loss_weight (float):
            Feature matching loss weight that multiplies the computed loss before summing up the total loss. faults to 0.
        l1_spec_loss_weight (float):
            L1 spectrogram loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        lr_gen (float):
            Generator model initial learning rate. Defaults to 0.0002.
        lr_disc (float):
            Discriminator model initial learning rate. Defaults to 0.0002.
        optimizer (torch.optim.Optimizer):
            Optimizer used for the training. Defaults to `AdamW`.
        optimizer_params (dict):
            Optimizer kwargs. Defaults to `{"betas": [0.8, 0.99], "weight_decay": 0.0}`
        lr_scheduler_gen (torch.optim.Scheduler):
            Learning rate scheduler for the generator. Defaults to `ExponentialLR`.
        lr_scheduler_gen_params (dict):
            Parameters for the generator learning rate scheduler. Defaults to `{"gamma": 0.5, "step_size": 200000, "last_epoch": -1}`.
        lr_scheduler_disc (torch.optim.Scheduler):
            Learning rate scheduler for the discriminator. Defaults to `ExponentialLR`.
        lr_scheduler_dict_params (dict):
            Parameters for the discriminator learning rate scheduler. Defaults to `{"gamma": 0.5, "step_size": 200000, "last_epoch": -1}`.
    """

    model: str = "parallel_wavegan"

    # Model specific params
    discriminator_model: str = "parallel_wavegan_discriminator"
    discriminator_model_params: dict = field(default_factory=lambda: {"num_layers": 10})
    generator_model: str = "parallel_wavegan_generator"
    generator_model_params: dict = field(
        default_factory=lambda: {"upsample_factors": [4, 4, 4, 4], "stacks": 3, "num_res_blocks": 30}
    )

    # Training - overrides
    batch_size: int = 6
    seq_len: int = 25600
    pad_short: int = 2000
    use_noise_augment: bool = False
    use_cache: bool = True
    steps_to_start_discriminator: int = 200000

    # LOSS PARAMETERS - overrides
    use_stft_loss: bool = True
    use_subband_stft_loss: bool = False
    use_mse_gan_loss: bool = True
    use_hinge_gan_loss: bool = False
    use_feat_match_loss: bool = False  # requires MelGAN Discriminators (MelGAN and HifiGAN)
    use_l1_spec_loss: bool = False

    stft_loss_params: dict = field(
        default_factory=lambda: {
            "n_ffts": [1024, 2048, 512],
            "hop_lengths": [120, 240, 50],
            "win_lengths": [600, 1200, 240],
        }
    )

    # loss weights - overrides
    stft_loss_weight: float = 0.5
    subband_stft_loss_weight: float = 0
    mse_G_loss_weight: float = 2.5
    hinge_G_loss_weight: float = 0
    feat_match_loss_weight: float = 0
    l1_spec_loss_weight: float = 0

    # optimizer overrides
    lr_gen: float = 0.0002  # Initial learning rate.
    lr_disc: float = 0.0002  # Initial learning rate.
    optimizer: str = "AdamW"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.8, 0.99], "weight_decay": 0.0})
    lr_scheduler_gen: str = "StepLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_gen_params: dict = field(default_factory=lambda: {"gamma": 0.5, "step_size": 200000, "last_epoch": -1})
    lr_scheduler_disc: str = "StepLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_disc_params: dict = field(
        default_factory=lambda: {"gamma": 0.5, "step_size": 200000, "last_epoch": -1}
    )
    scheduler_after_epoch: bool = False


================================================
FILE: TTS/vocoder/configs/shared_configs.py
================================================
from dataclasses import dataclass, field

from TTS.config import BaseAudioConfig, BaseTrainingConfig


@dataclass
class BaseVocoderConfig(BaseTrainingConfig):
    """Shared parameters among all the vocoder models.
    Args:
        audio (BaseAudioConfig):
            Audio processor config instance. Defaultsto `BaseAudioConfig()`.
        use_noise_augment (bool):
            Augment the input audio with random noise. Defaults to False/
        eval_split_size (int):
            Number of instances used for evaluation. Defaults to 10.
        data_path (str):
            Root path of the training data. All the audio files found recursively from this root path are used for
            training. Defaults to `""`.
        feature_path (str):
            Root path to the precomputed feature files. Defaults to None.
        seq_len (int):
            Length of the waveform segments used for training. Defaults to 1000.
        pad_short (int):
            Extra padding for the waveforms shorter than `seq_len`. Defaults to 0.
        conv_path (int):
            Extra padding for the feature frames against convolution of the edge frames. Defaults to MISSING.
            Defaults to 0.
        use_cache (bool):
            enable / disable in memory caching of the computed features. If the RAM is not enough, if may cause OOM.
            Defaults to False.
        epochs (int):
            Number of training epochs to. Defaults to 10000.
        wd (float):
            Weight decay.
         optimizer (torch.optim.Optimizer):
            Optimizer used for the training. Defaults to `AdamW`.
        optimizer_params (dict):
            Optimizer kwargs. Defaults to `{"betas": [0.8, 0.99], "weight_decay": 0.0}`
    """

    audio: BaseAudioConfig = field(default_factory=BaseAudioConfig)
    # dataloading
    use_noise_augment: bool = False  # enable/disable random noise augmentation in spectrograms.
    eval_split_size: int = 10  # number of samples used for evaluation.
    # dataset
    data_path: str = ""  # root data path. It finds all wav files recursively from there.
    feature_path: str = None  # if you use precomputed features
    seq_len: int = 1000  # signal length used in training.
    pad_short: int = 0  # additional padding for short wavs
    conv_pad: int = 0  # additional padding against convolutions applied to spectrograms
    use_cache: bool = False  # use in memory cache to keep the computed features. This might cause OOM.
    # OPTIMIZER
    epochs: int = 10000  # total number of epochs to train.
    wd: float = 0.0  # Weight decay weight.
    optimizer: str = "AdamW"
    optimizer_params: dict = field(default_factory=lambda: {"betas": [0.8, 0.99], "weight_decay": 0.0})


@dataclass
class BaseGANVocoderConfig(BaseVocoderConfig):
    """Base config class used among all the GAN based vocoders.
    Args:
        use_stft_loss (bool):
            enable / disable the use of STFT loss. Defaults to True.
        use_subband_stft_loss (bool):
            enable / disable the use of Subband STFT loss. Defaults to True.
        use_mse_gan_loss (bool):
            enable / disable the use of Mean Squared Error based GAN loss. Defaults to True.
        use_hinge_gan_loss (bool):
            enable / disable the use of Hinge GAN loss. Defaults to True.
        use_feat_match_loss (bool):
            enable / disable feature matching loss. Defaults to True.
        use_l1_spec_loss (bool):
            enable / disable L1 spectrogram loss. Defaults to True.
        stft_loss_weight (float):
            Loss weight that multiplies the computed loss value. Defaults to 0.
        subband_stft_loss_weight (float):
            Loss weight that multiplies the computed loss value. Defaults to 0.
        mse_G_loss_weight (float):
            Loss weight that multiplies the computed loss value. Defaults to 1.
        hinge_G_loss_weight (float):
            Loss weight that multiplies the computed loss value. Defaults to 0.
        feat_match_loss_weight (float):
            Loss weight that multiplies the computed loss value. Defaults to 100.
        l1_spec_loss_weight (float):
            Loss weight that multiplies the computed loss value. Defaults to 45.
        stft_loss_params (dict):
            Parameters for the STFT loss. Defaults to `{"n_ffts": [1024, 2048, 512], "hop_lengths": [120, 240, 50], "win_lengths": [600, 1200, 240]}`.
        l1_spec_loss_params (dict):
            Parameters for the L1 spectrogram loss. Defaults to
            `{
                "use_mel": True,
                "sample_rate": 22050,
                "n_fft": 1024,
                "hop_length": 256,
                "win_length": 1024,
                "n_mels": 80,
                "mel_fmin": 0.0,
                "mel_fmax": None,
            }`
        target_loss (str):
            Target loss name that defines the quality of the model. Defaults to `G_avg_loss`.
        grad_clip (list):
            A list of gradient clipping theresholds for each optimizer. Any value less than 0 disables clipping.
            Defaults to [5, 5].
        lr_gen (float):
            Generator model initial learning rate. Defaults to 0.0002.
        lr_disc (float):
            Discriminator model initial learning rate. Defaults to 0.0002.
        lr_scheduler_gen (torch.optim.Scheduler):
            Learning rate scheduler for the generator. Defaults to `ExponentialLR`.
        lr_scheduler_gen_params (dict):
            Parameters for the generator learning rate scheduler. Defaults to `{"gamma": 0.999, "last_epoch": -1}`.
        lr_scheduler_disc (torch.optim.Scheduler):
            Learning rate scheduler for the discriminator. Defaults to `ExponentialLR`.
        lr_scheduler_disc_params (dict):
            Parameters for the discriminator learning rate scheduler. Defaults to `{"gamma": 0.999, "last_epoch": -1}`.
        scheduler_after_epoch (bool):
            Whether to update the learning rate schedulers after each epoch. Defaults to True.
        use_pqmf (bool):
            enable / disable PQMF for subband approximation at training. Defaults to False.
        steps_to_start_discriminator (int):
            Number of steps required to start training the discriminator. Defaults to 0.
        diff_samples_for_G_and_D (bool):
            enable / disable use of different training samples for the generator and the discriminator iterations.
            Enabling it results in slower iterations but faster convergance in some cases. Defaults to False.
    """

    model: str = "gan"

    # LOSS PARAMETERS
    use_stft_loss: bool = True
    use_subband_stft_loss: bool = True
    use_mse_gan_loss: bool = True
    use_hinge_gan_loss: bool = True
    use_feat_match_loss: bool = True  # requires MelGAN Discriminators (MelGAN and HifiGAN)
    use_l1_spec_loss: bool = True

    # loss weights
    stft_loss_weight: float = 0
    subband_stft_loss_weight: float = 0
    mse_G_loss_weight: float = 1
    hinge_G_loss_weight: float = 0
    feat_match_loss_weight: float = 100
    l1_spec_loss_weight: float = 45

    stft_loss_params: dict = field(
        default_factory=lambda: {
            "n_ffts": [1024, 2048, 512],
            "hop_lengths": [120, 240, 50],
            "win_lengths": [600, 1200, 240],
        }
    )

    l1_spec_loss_params: dict = field(
        default_factory=lambda: {
            "use_mel": True,
            "sample_rate": 22050,
            "n_fft": 1024,
            "hop_length": 256,
            "win_length": 1024,
            "n_mels": 80,
            "mel_fmin": 0.0,
            "mel_fmax": None,
        }
    )

    target_loss: str = "loss_0"  # loss value to pick the best model to save after each epoch

    # optimizer
    grad_clip: float = field(default_factory=lambda: [5, 5])
    lr_gen: float = 0.0002  # Initial learning rate.
    lr_disc: float = 0.0002  # Initial learning rate.
    lr_scheduler_gen: str = "ExponentialLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_gen_params: dict = field(default_factory=lambda: {"gamma": 0.999, "last_epoch": -1})
    lr_scheduler_disc: str = "ExponentialLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_disc_params: dict = field(default_factory=lambda: {"gamma": 0.999, "last_epoch": -1})
    scheduler_after_epoch: bool = True

    use_pqmf: bool = False  # enable/disable using pqmf for multi-band training. (Multi-band MelGAN)
    steps_to_start_discriminator = 0  # start training the discriminator after this number of steps.
    diff_samples_for_G_and_D: bool = False  # use different samples for G and D training steps.


================================================
FILE: TTS/vocoder/configs/univnet_config.py
================================================
from dataclasses import dataclass, field
from typing import Dict

from TTS.vocoder.configs.shared_configs import BaseGANVocoderConfig


@dataclass
class UnivnetConfig(BaseGANVocoderConfig):
    """Defines parameters for UnivNet vocoder.

    Example:

        >>> from TTS.vocoder.configs import UnivNetConfig
        >>> config = UnivNetConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `UnivNet`.
        discriminator_model (str): One of the discriminators from `TTS.vocoder.models.*_discriminator`. Defaults to
            'UnivNet_discriminator`.
        generator_model (str): One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `UnivNet_generator`.
        generator_model_params (dict): Parameters of the generator model. Defaults to
            `
            {
                "use_mel": True,
                "sample_rate": 22050,
                "n_fft": 1024,
                "hop_length": 256,
                "win_length": 1024,
                "n_mels": 80,
                "mel_fmin": 0.0,
                "mel_fmax": None,
            }
            `
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 32.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 8192.
        pad_short (int):
            Additional padding applied to the audio samples shorter than `seq_len`. Defaults to 0.
        use_noise_augment (bool):
            enable / disable random noise added to the input waveform. The noise is added after computing the
            features. Defaults to True.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        use_stft_loss (bool):
            enable / disable use of STFT loss originally used by ParallelWaveGAN model. Defaults to True.
        use_subband_stft (bool):
            enable / disable use of subband loss computation originally used by MultiBandMelgan model. Defaults to True.
        use_mse_gan_loss (bool):
            enable / disable using Mean Squeare Error GAN loss. Defaults to True.
        use_hinge_gan_loss (bool):
            enable / disable using Hinge GAN loss. You should choose either Hinge or MSE loss for training GAN models.
            Defaults to False.
        use_feat_match_loss (bool):
            enable / disable using Feature Matching loss originally used by MelGAN model. Defaults to True.
        use_l1_spec_loss (bool):
            enable / disable using L1 spectrogram loss originally used by univnet model. Defaults to False.
        stft_loss_params (dict):
            STFT loss parameters. Default to
            `{
                "n_ffts": [1024, 2048, 512],
                "hop_lengths": [120, 240, 50],
                "win_lengths": [600, 1200, 240]
            }`
        l1_spec_loss_params (dict):
            L1 spectrogram loss parameters. Default to
            `{
                "use_mel": True,
                "sample_rate": 22050,
                "n_fft": 1024,
                "hop_length": 256,
                "win_length": 1024,
                "n_mels": 80,
                "mel_fmin": 0.0,
                "mel_fmax": None,
            }`
        stft_loss_weight (float): STFT loss weight that multiplies the computed loss before summing up the total
            model loss. Defaults to 0.5.
        subband_stft_loss_weight (float):
            Subband STFT loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        mse_G_loss_weight (float):
            MSE generator loss weight that multiplies the computed loss before summing up the total loss. faults to 2.5.
        hinge_G_loss_weight (float):
            Hinge generator loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
        feat_match_loss_weight (float):
            Feature matching loss weight that multiplies the computed loss before summing up the total loss. faults to 108.
        l1_spec_loss_weight (float):
            L1 spectrogram loss weight that multiplies the computed loss before summing up the total loss. Defaults to 0.
    """

    model: str = "univnet"
    batch_size: int = 32
    # model specific params
    discriminator_model: str = "univnet_discriminator"
    generator_model: str = "univnet_generator"
    generator_model_params: Dict = field(
        default_factory=lambda: {
            "in_channels": 64,
            "out_channels": 1,
            "hidden_channels": 32,
            "cond_channels": 80,
            "upsample_factors": [8, 8, 4],
            "lvc_layers_each_block": 4,
            "lvc_kernel_size": 3,
            "kpnet_hidden_channels": 64,
            "kpnet_conv_size": 3,
            "dropout": 0.0,
        }
    )

    # LOSS PARAMETERS - overrides
    use_stft_loss: bool = True
    use_subband_stft_loss: bool = False
    use_mse_gan_loss: bool = True
    use_hinge_gan_loss: bool = False
    use_feat_match_loss: bool = False  # requires MelGAN Discriminators (MelGAN and univnet)
    use_l1_spec_loss: bool = False

    # loss weights - overrides
    stft_loss_weight: float = 2.5
    stft_loss_params: Dict = field(
        default_factory=lambda: {
            "n_ffts": [1024, 2048, 512],
            "hop_lengths": [120, 240, 50],
            "win_lengths": [600, 1200, 240],
        }
    )
    subband_stft_loss_weight: float = 0
    mse_G_loss_weight: float = 1
    hinge_G_loss_weight: float = 0
    feat_match_loss_weight: float = 0
    l1_spec_loss_weight: float = 0
    l1_spec_loss_params: Dict = field(
        default_factory=lambda: {
            "use_mel": True,
            "sample_rate": 22050,
            "n_fft": 1024,
            "hop_length": 256,
            "win_length": 1024,
            "n_mels": 80,
            "mel_fmin": 0.0,
            "mel_fmax": None,
        }
    )

    # optimizer parameters
    lr_gen: float = 1e-4  # Initial learning rate.
    lr_disc: float = 1e-4  # Initial learning rate.
    lr_scheduler_gen: str = None  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    # lr_scheduler_gen_params: dict = field(default_factory=lambda: {"gamma": 0.999, "last_epoch": -1})
    lr_scheduler_disc: str = None  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    # lr_scheduler_disc_params: dict = field(default_factory=lambda: {"gamma": 0.999, "last_epoch": -1})
    optimizer_params: Dict = field(default_factory=lambda: {"betas": [0.5, 0.9], "weight_decay": 0.0})
    steps_to_start_discriminator: int = 200000

    def __post_init__(self):
        super().__post_init__()
        self.generator_model_params["cond_channels"] = self.audio.num_mels


================================================
FILE: TTS/vocoder/configs/wavegrad_config.py
================================================
from dataclasses import dataclass, field

from TTS.vocoder.configs.shared_configs import BaseVocoderConfig
from TTS.vocoder.models.wavegrad import WavegradArgs


@dataclass
class WavegradConfig(BaseVocoderConfig):
    """Defines parameters for WaveGrad vocoder.
    Example:

        >>> from TTS.vocoder.configs import WavegradConfig
        >>> config = WavegradConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `wavegrad`.
        generator_model (str): One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `wavegrad`.
        model_params (WavegradArgs): Model parameters. Check `WavegradArgs` for default values.
        target_loss (str):
            Target loss name that defines the quality of the model. Defaults to `avg_wavegrad_loss`.
        epochs (int):
            Number of epochs to traing the model. Defaults to 10000.
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 96.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 6144.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        mixed_precision (bool):
            enable / disable mixed precision training. Default is True.
        eval_split_size (int):
            Number of samples used for evalutaion. Defaults to 50.
        train_noise_schedule (dict):
            Training noise schedule. Defaults to
            `{"min_val": 1e-6, "max_val": 1e-2, "num_steps": 1000}`
        test_noise_schedule (dict):
            Inference noise schedule. For a better performance, you may need to use `bin/tune_wavegrad.py` to find a
            better schedule. Defaults to
            `
            {
                "min_val": 1e-6,
                "max_val": 1e-2,
                "num_steps": 50,
            }
            `
        grad_clip (float):
            Gradient clipping threshold. If <= 0.0, no clipping is applied. Defaults to 1.0
        lr (float):
            Initila leraning rate. Defaults to 1e-4.
        lr_scheduler (str):
            One of the learning rate schedulers from `torch.optim.scheduler.*`. Defaults to `MultiStepLR`.
        lr_scheduler_params (dict):
            kwargs for the scheduler. Defaults to `{"gamma": 0.5, "milestones": [100000, 200000, 300000, 400000, 500000, 600000]}`
    """

    model: str = "wavegrad"
    # Model specific params
    generator_model: str = "wavegrad"
    model_params: WavegradArgs = field(default_factory=WavegradArgs)
    target_loss: str = "loss"  # loss value to pick the best model to save after each epoch

    # Training - overrides
    epochs: int = 10000
    batch_size: int = 96
    seq_len: int = 6144
    use_cache: bool = True
    mixed_precision: bool = True
    eval_split_size: int = 50

    # NOISE SCHEDULE PARAMS
    train_noise_schedule: dict = field(default_factory=lambda: {"min_val": 1e-6, "max_val": 1e-2, "num_steps": 1000})

    test_noise_schedule: dict = field(
        default_factory=lambda: {  # inference noise schedule. Try TTS/bin/tune_wavegrad.py to find the optimal values.
            "min_val": 1e-6,
            "max_val": 1e-2,
            "num_steps": 50,
        }
    )

    # optimizer overrides
    grad_clip: float = 1.0
    lr: float = 1e-4  # Initial learning rate.
    lr_scheduler: str = "MultiStepLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_params: dict = field(
        default_factory=lambda: {"gamma": 0.5, "milestones": [100000, 200000, 300000, 400000, 500000, 600000]}
    )


================================================
FILE: TTS/vocoder/configs/wavernn_config.py
================================================
from dataclasses import dataclass, field

from TTS.vocoder.configs.shared_configs import BaseVocoderConfig
from TTS.vocoder.models.wavernn import WavernnArgs


@dataclass
class WavernnConfig(BaseVocoderConfig):
    """Defines parameters for Wavernn vocoder.
    Example:

        >>> from TTS.vocoder.configs import WavernnConfig
        >>> config = WavernnConfig()

    Args:
        model (str):
            Model name used for selecting the right model at initialization. Defaults to `wavernn`.
        mode (str):
            Output mode of the WaveRNN vocoder. `mold` for Mixture of Logistic Distribution, `gauss` for a single
            Gaussian Distribution and `bits` for quantized bits as the model's output.
        mulaw (bool):
            enable / disable the use of Mulaw quantization for training. Only applicable if `mode == 'bits'`. Defaults
            to `True`.
        generator_model (str):
            One of the generators from TTS.vocoder.models.*`. Every other non-GAN vocoder model is
            considered as a generator too. Defaults to `WaveRNN`.
        wavernn_model_params (dict):
            kwargs for the WaveRNN model. Defaults to
            `{
                "rnn_dims": 512,
                "fc_dims": 512,
                "compute_dims": 128,
                "res_out_dims": 128,
                "num_res_blocks": 10,
                "use_aux_net": True,
                "use_upsample_net": True,
                "upsample_factors": [4, 8, 8]
            }`
        batched (bool):
            enable / disable the batched inference. It speeds up the inference by splitting the input into segments and
            processing the segments in a batch. Then it merges the outputs with a certain overlap and smoothing. If
            you set it False, without CUDA, it is too slow to be practical. Defaults to True.
        target_samples (int):
            Size of the segments in batched mode. Defaults to 11000.
        overlap_sampels (int):
            Size of the overlap between consecutive segments. Defaults to 550.
        batch_size (int):
            Batch size used at training. Larger values use more memory. Defaults to 256.
        seq_len (int):
            Audio segment length used at training. Larger values use more memory. Defaults to 1280.

        use_noise_augment (bool):
            enable / disable random noise added to the input waveform. The noise is added after computing the
            features. Defaults to True.
        use_cache (bool):
            enable / disable in memory caching of the computed features. It can cause OOM error if the system RAM is
            not large enough. Defaults to True.
        mixed_precision (bool):
            enable / disable mixed precision training. Default is True.
        eval_split_size (int):
            Number of samples used for evalutaion. Defaults to 50.
        num_epochs_before_test (int):
            Number of epochs waited to run the next evalution. Since inference takes some time, it is better to
            wait some number of epochs not ot waste training time. Defaults to 10.
        grad_clip (float):
            Gradient clipping threshold. If <= 0.0, no clipping is applied. Defaults to 4.0
        lr (float):
            Initila leraning rate. Defaults to 1e-4.
        lr_scheduler (str):
            One of the learning rate schedulers from `torch.optim.scheduler.*`. Defaults to `MultiStepLR`.
        lr_scheduler_params (dict):
            kwargs for the scheduler. Defaults to `{"gamma": 0.5, "milestones": [200000, 400000, 600000]}`
    """

    model: str = "wavernn"

    # Model specific params
    model_args: WavernnArgs = field(default_factory=WavernnArgs)
    target_loss: str = "loss"

    # Inference
    batched: bool = True
    target_samples: int = 11000
    overlap_samples: int = 550

    # Training - overrides
    epochs: int = 10000
    batch_size: int = 256
    seq_len: int = 1280
    use_noise_augment: bool = False
    use_cache: bool = True
    mixed_precision: bool = True
    eval_split_size: int = 50
    num_epochs_before_test: int = (
        10  # number of epochs to wait until the next test run (synthesizing a full audio clip).
    )

    # optimizer overrides
    grad_clip: float = 4.0
    lr: float = 1e-4  # Initial learning rate.
    lr_scheduler: str = "MultiStepLR"  # one of the schedulers from https:#pytorch.org/docs/stable/optim.html
    lr_scheduler_params: dict = field(default_factory=lambda: {"gamma": 0.5, "milestones": [200000, 400000, 600000]})


================================================
FILE: TTS/vocoder/datasets/__init__.py
================================================
from typing import List

from coqpit import Coqpit
from torch.utils.data import Dataset

from TTS.utils.audio import AudioProcessor
from TTS.vocoder.datasets.gan_dataset import GANDataset
from TTS.vocoder.datasets.preprocess import load_wav_data, load_wav_feat_data
from TTS.vocoder.datasets.wavegrad_dataset import WaveGradDataset
from TTS.vocoder.datasets.wavernn_dataset import WaveRNNDataset


def setup_dataset(config: Coqpit, ap: AudioProcessor, is_eval: bool, data_items: List, verbose: bool) -> Dataset:
    if config.model.lower() in "gan":
        dataset = GANDataset(
            ap=ap,
            items=data_items,
            seq_len=config.seq_len,
            hop_len=ap.hop_length,
            pad_short=config.pad_short,
            conv_pad=config.conv_pad,
            return_pairs=config.diff_samples_for_G_and_D if "diff_samples_for_G_and_D" in config else False,
            is_training=not is_eval,
            return_segments=not is_eval,
            use_noise_augment=config.use_noise_augment,
            use_cache=config.use_cache,
            verbose=verbose,
        )
        dataset.shuffle_mapping()
    elif config.model.lower() == "wavegrad":
        dataset = WaveGradDataset(
            ap=ap,
            items=data_items,
            seq_len=config.seq_len,
            hop_len=ap.hop_length,
            pad_short=config.pad_short,
            conv_pad=config.conv_pad,
            is_training=not is_eval,
            return_segments=True,
            use_noise_augment=False,
            use_cache=config.use_cache,
            verbose=verbose,
        )
    elif config.model.lower() == "wavernn":
        dataset = WaveRNNDataset(
            ap=ap,
            items=data_items,
            seq_len=config.seq_len,
            hop_len=ap.hop_length,
            pad=config.model_params.pad,
            mode=config.model_params.mode,
            mulaw=config.model_params.mulaw,
            is_training=not is_eval,
            verbose=verbose,
        )
    else:
        raise ValueError(f" [!] Dataset for model {config.model.lower()} cannot be found.")
    return dataset


================================================
FILE: TTS/vocoder/datasets/gan_dataset.py
================================================
import glob
import os
import random
from multiprocessing import Manager

import numpy as np
import torch
from torch.utils.data import Dataset


class GANDataset(Dataset):
    """
    GAN Dataset searchs for all the wav files under root path
    and converts them to acoustic features on the fly and returns
    random segments of (audio, feature) couples.
    """

    def __init__(
        self,
        ap,
        items,
        seq_len,
        hop_len,
        pad_short,
        conv_pad=2,
        return_pairs=False,
        is_training=True,
        return_segments=True,
        use_noise_augment=False,
        use_cache=False,
        verbose=False,
    ):
        super().__init__()
        self.ap = ap
        self.item_list = items
        self.compute_feat = not isinstance(items[0], (tuple, list))
        self.seq_len = seq_len
        self.hop_len = hop_len
        self.pad_short = pad_short
        self.conv_pad = conv_pad
        self.return_pairs = return_pairs
        self.is_training = is_training
        self.return_segments = return_segments
        self.use_cache = use_cache
        self.use_noise_augment = use_noise_augment
        self.verbose = verbose

        assert seq_len % hop_len == 0, " [!] seq_len has to be a multiple of hop_len."
        self.feat_frame_len = seq_len // hop_len + (2 * conv_pad)

        # map G and D instances
        self.G_to_D_mappings = list(range(len(self.item_list)))
        self.shuffle_mapping()

        # cache acoustic features
        if use_cache:
            self.create_feature_cache()

    def create_feature_cache(self):
        self.manager = Manager()
        self.cache = self.manager.list()
        self.cache += [None for _ in range(len(self.item_list))]

    @staticmethod
    def find_wav_files(path):
        return glob.glob(os.path.join(path, "**", "*.wav"), recursive=True)

    def __len__(self):
        return len(self.item_list)

    def __getitem__(self, idx):
        """Return different items for Generator and Discriminator and
        cache acoustic features"""

        # set the seed differently for each worker
        if torch.utils.data.get_worker_info():
            random.seed(torch.utils.data.get_worker_info().seed)

        if self.return_segments:
            item1 = self.load_item(idx)
            if self.return_pairs:
                idx2 = self.G_to_D_mappings[idx]
                item2 = self.load_item(idx2)
                return item1, item2
            return item1
        item1 = self.load_item(idx)
        return item1

    def _pad_short_samples(self, audio, mel=None):
        """Pad samples shorter than the output sequence length"""
        if len(audio) < self.seq_len:
            audio = np.pad(audio, (0, self.seq_len - len(audio)), mode="constant", constant_values=0.0)

        if mel is not None and mel.shape[1] < self.feat_frame_len:
            pad_value = self.ap.melspectrogram(np.zeros([self.ap.win_length]))[:, 0]
            mel = np.pad(
                mel,
                ([0, 0], [0, self.feat_frame_len - mel.shape[1]]),
                mode="constant",
                constant_values=pad_value.mean(),
            )
        return audio, mel

    def shuffle_mapping(self):
        random.shuffle(self.G_to_D_mappings)

    def load_item(self, idx):
        """load (audio, feat) couple"""
        if self.compute_feat:
            # compute features from wav
            wavpath = self.item_list[idx]
            # print(wavpath)

            if self.use_cache and self.cache[idx] is not None:
                audio, mel = self.cache[idx]
            else:
                audio = self.ap.load_wav(wavpath)
                mel = self.ap.melspectrogram(audio)
                audio, mel = self._pad_short_samples(audio, mel)
        else:
            # load precomputed features
            wavpath, feat_path = self.item_list[idx]

            if self.use_cache and self.cache[idx] is not None:
                audio, mel = self.cache[idx]
            else:
                audio = self.ap.load_wav(wavpath)
                mel = np.load(feat_path)
                audio, mel = self._pad_short_samples(audio, mel)

        # correct the audio length wrt padding applied in stft
        audio = np.pad(audio, (0, self.hop_len), mode="edge")
        audio = audio[: mel.shape[-1] * self.hop_len]
        assert (
            mel.shape[-1] * self.hop_len == audio.shape[-1]
        ), f" [!] {mel.shape[-1] * self.hop_len} vs {audio.shape[-1]}"

        audio = torch.from_numpy(audio).float().unsqueeze(0)
        mel = torch.from_numpy(mel).float().squeeze(0)

        if self.return_segments:
            max_mel_start = mel.shape[1] - self.feat_frame_len
            mel_start = random.randint(0, max_mel_start)
            mel_end = mel_start + self.feat_frame_len
            mel = mel[:, mel_start:mel_end]

            audio_start = mel_start * self.hop_len
            audio = audio[:, audio_start : audio_start + self.seq_len]

        if self.use_noise_augment and self.is_training and self.return_segments:
            audio = audio + (1 / 32768) * torch.randn_like(audio)
        return (mel, audio)


================================================
FILE: TTS/vocoder/datasets/preprocess.py
================================================
import glob
import os
from pathlib import Path

import numpy as np
from coqpit import Coqpit
from tqdm import tqdm

from TTS.utils.audio import AudioProcessor


def preprocess_wav_files(out_path: str, config: Coqpit, ap: AudioProcessor):
    """Process wav and compute mel and quantized wave signal.
    It is mainly used by WaveRNN dataloader.

    Args:
        out_path (str): Parent folder path to save the files.
        config (Coqpit): Model config.
        ap (AudioProcessor): Audio processor.
    """
    os.makedirs(os.path.join(out_path, "quant"), exist_ok=True)
    os.makedirs(os.path.join(out_path, "mel"), exist_ok=True)
    wav_files = find_wav_files(config.data_path)
    for path in tqdm(wav_files):
        wav_name = Path(path).stem
        quant_path = os.path.join(out_path, "quant", wav_name + ".npy")
        mel_path = os.path.join(out_path, "mel", wav_name + ".npy")
        y = ap.load_wav(path)
        mel = ap.melspectrogram(y)
        np.save(mel_path, mel)
        if isinstance(config.mode, int):
            quant = ap.mulaw_encode(y, qc=config.mode) if config.model_args.mulaw else ap.quantize(y, bits=config.mode)
            np.save(quant_path, quant)


def find_wav_files(data_path, file_ext="wav"):
    wav_paths = glob.glob(os.path.join(data_path, "**", f"*.{file_ext}"), recursive=True)
    return wav_paths


def find_feat_files(data_path):
    feat_paths = glob.glob(os.path.join(data_path, "**", "*.npy"), recursive=True)
    return feat_paths


def load_wav_data(data_path, eval_split_size, file_ext="wav"):
    wav_paths = find_wav_files(data_path, file_ext=file_ext)
    assert len(wav_paths) > 0, f" [!] {data_path} is empty."
    np.random.seed(0)
    np.random.shuffle(wav_paths)
    return wav_paths[:eval_split_size], wav_paths[eval_split_size:]


def load_wav_feat_data(data_path, feat_path, eval_split_size):
    wav_paths = find_wav_files(data_path)
    feat_paths = find_feat_files(feat_path)

    wav_paths.sort(key=lambda x: Path(x).stem)
    feat_paths.sort(key=lambda x: Path(x).stem)

    assert len(wav_paths) == len(feat_paths), f" [!] {len(wav_paths)} vs {feat_paths}"
    for wav, feat in zip(wav_paths, feat_paths):
        wav_name = Path(wav).stem
        feat_name = Path(feat).stem
        assert wav_name == feat_name

    items = list(zip(wav_paths, feat_paths))
    np.random.seed(0)
    np.random.shuffle(items)
    return items[:eval_split_size], items[eval_split_size:]


================================================
FILE: TTS/vocoder/datasets/wavegrad_dataset.py
================================================
import glob
import os
import random
from multiprocessing import Manager
from typing import List, Tuple

import numpy as np
import torch
from torch.utils.data import Dataset


class WaveGradDataset(Dataset):
    """
    WaveGrad Dataset searchs for all the wav files under root path
    and converts them to acoustic features on the fly and returns
    random segments of (audio, feature) couples.
    """

    def __init__(
        self,
        ap,
        items,
        seq_len,
        hop_len,
        pad_short,
        conv_pad=2,
        is_training=True,
        return_segments=True,
        use_noise_augment=False,
        use_cache=False,
        verbose=False,
    ):
        super().__init__()
        self.ap = ap
        self.item_list = items
        self.seq_len = seq_len if return_segments else None
        self.hop_len = hop_len
        self.pad_short = pad_short
        self.conv_pad = conv_pad
        self.is_training = is_training
        self.return_segments = return_segments
        self.use_cache = use_cache
        self.use_noise_augment = use_noise_augment
        self.verbose = verbose

        if return_segments:
            assert seq_len % hop_len == 0, " [!] seq_len has to be a multiple of hop_len."
        self.feat_frame_len = seq_len // hop_len + (2 * conv_pad)

        # cache acoustic features
        if use_cache:
            self.create_feature_cache()

    def create_feature_cache(self):
        self.manager = Manager()
        self.cache = self.manager.list()
        self.cache += [None for _ in range(len(self.item_list))]

    @staticmethod
    def find_wav_files(path):
        return glob.glob(os.path.join(path, "**", "*.wav"), recursive=True)

    def __len__(self):
        return len(self.item_list)

    def __getitem__(self, idx):
        item = self.load_item(idx)
        return item

    def load_test_samples(self, num_samples: int) -> List[Tuple]:
        """Return test samples.

        Args:
            num_samples (int): Number of samples to return.

        Returns:
            List[Tuple]: melspectorgram and audio.

        Shapes:
            - melspectrogram (Tensor): :math:`[C, T]`
            - audio (Tensor): :math:`[T_audio]`
        """
        samples = []
        return_segments = self.return_segments
        self.return_segments = False
        for idx in range(num_samples):
            mel, audio = self.load_item(idx)
            samples.append([mel, audio])
        self.return_segments = return_segments
        return samples

    def load_item(self, idx):
        """load (audio, feat) couple"""
        # compute features from wav
        wavpath = self.item_list[idx]

        if self.use_cache and self.cache[idx] is not None:
            audio = self.cache[idx]
        else:
            audio = self.ap.load_wav(wavpath)

            if self.return_segments:
                # correct audio length wrt segment length
                if audio.shape[-1] < self.seq_len + self.pad_short:
                    audio = np.pad(
                        audio, (0, self.seq_len + self.pad_short - len(audio)), mode="constant", constant_values=0.0
                    )
                assert (
                    audio.shape[-1] >= self.seq_len + self.pad_short
                ), f"{audio.shape[-1]} vs {self.seq_len + self.pad_short}"

            # correct the audio length wrt hop length
            p = (audio.shape[-1] // self.hop_len + 1) * self.hop_len - audio.shape[-1]
            audio = np.pad(audio, (0, p), mode="constant", constant_values=0.0)

            if self.use_cache:
                self.cache[idx] = audio

        if self.return_segments:
            max_start = len(audio) - self.seq_len
            start = random.randint(0, max_start)
            end = start + self.seq_len
            audio = audio[start:end]

        if self.use_noise_augment and self.is_training and self.return_segments:
            audio = audio + (1 / 32768) * torch.randn_like(audio)

        mel = self.ap.melspectrogram(audio)
        mel = mel[..., :-1]  # ignore the padding

        audio = torch.from_numpy(audio).float()
        mel = torch.from_numpy(mel).float().squeeze(0)
        return (mel, audio)

    @staticmethod
    def collate_full_clips(batch):
        """This is used in tune_wavegrad.py.
        It pads sequences to the max length."""
        max_mel_length = max([b[0].shape[1] for b in batch]) if len(batch) > 1 else batch[0][0].shape[1]
        max_audio_length = max([b[1].shape[0] for b in batch]) if len(batch) > 1 else batch[0][1].shape[0]

        mels = torch.zeros([len(batch), batch[0][0].shape[0], max_mel_length])
        audios = torch.zeros([len(batch), max_audio_length])

        for idx, b in enumerate(batch):
            mel = b[0]
            audio = b[1]
            mels[idx, :, : mel.shape[1]] = mel
            audios[idx, : audio.shape[0]] = audio

        return mels, audios


================================================
FILE: TTS/vocoder/datasets/wavernn_dataset.py
================================================
import numpy as np
import torch
from torch.utils.data import Dataset


class WaveRNNDataset(Dataset):
    """
    WaveRNN Dataset searchs for all the wav files under root path
    and converts them to acoustic features on the fly.
    """

    def __init__(
        self, ap, items, seq_len, hop_len, pad, mode, mulaw, is_training=True, verbose=False, return_segments=True
    ):
        super().__init__()
        self.ap = ap
        self.compute_feat = not isinstance(items[0], (tuple, list))
        self.item_list = items
        self.seq_len = seq_len
        self.hop_len = hop_len
        self.mel_len = seq_len // hop_len
        self.pad = pad
        self.mode = mode
        self.mulaw = mulaw
        self.is_training = is_training
        self.verbose = verbose
        self.return_segments = return_segments

        assert self.seq_len % self.hop_len == 0

    def __len__(self):
        return len(self.item_list)

    def __getitem__(self, index):
        item = self.load_item(index)
        return item

    def load_test_samples(self, num_samples):
        samples = []
        return_segments = self.return_segments
        self.return_segments = False
        for idx in range(num_samples):
            mel, audio, _ = self.load_item(idx)
            samples.append([mel, audio])
        self.return_segments = return_segments
        return samples

    def load_item(self, index):
        """
        load (audio, feat) couple if feature_path is set
        else compute it on the fly
        """
        if self.compute_feat:
            wavpath = self.item_list[index]
            audio = self.ap.load_wav(wavpath)
            if self.return_segments:
                min_audio_len = 2 * self.seq_len + (2 * self.pad * self.hop_len)
            else:
                min_audio_len = audio.shape[0] + (2 * self.pad * self.hop_len)
            if audio.shape[0] < min_audio_len:
                print(" [!] Instance is too short! : {}".format(wavpath))
                audio = np.pad(audio, [0, min_audio_len - audio.shape[0] + self.hop_len])
            mel = self.ap.melspectrogram(audio)

            if self.mode in ["gauss", "mold"]:
                x_input = audio
            elif isinstance(self.mode, int):
                x_input = (
                    self.ap.mulaw_encode(audio, qc=self.mode) if self.mulaw else self.ap.quantize(audio, bits=self.mode)
                )
            else:
                raise RuntimeError("Unknown dataset mode - ", self.mode)

        else:
            wavpath, feat_path = self.item_list[index]
            mel = np.load(feat_path.replace("/quant/", "/mel/"))

            if mel.shape[-1] < self.mel_len + 2 * self.pad:
                print(" [!] Instance is too short! : {}".format(wavpath))
                self.item_list[index] = self.item_list[index + 1]
                feat_path = self.item_list[index]
                mel = np.load(feat_path.replace("/quant/", "/mel/"))
            if self.mode in ["gauss", "mold"]:
                x_input = self.ap.load_wav(wavpath)
            elif isinstance(self.mode, int):
                x_input = np.load(feat_path.replace("/mel/", "/quant/"))
            else:
                raise RuntimeError("Unknown dataset mode - ", self.mode)

        return mel, x_input, wavpath

    def collate(self, batch):
        mel_win = self.seq_len // self.hop_len + 2 * self.pad
        max_offsets = [x[0].shape[-1] - (mel_win + 2 * self.pad) for x in batch]

        mel_offsets = [np.random.randint(0, offset) for offset in max_offsets]
        sig_offsets = [(offset + self.pad) * self.hop_len for offset in mel_offsets]

        mels = [x[0][:, mel_offsets[i] : mel_offsets[i] + mel_win] for i, x in enumerate(batch)]

        coarse = [x[1][sig_offsets[i] : sig_offsets[i] + self.seq_len + 1] for i, x in enumerate(batch)]

        mels = np.stack(mels).astype(np.float32)
        if self.mode in ["gauss", "mold"]:
            coarse = np.stack(coarse).astype(np.float32)
            coarse = torch.FloatTensor(coarse)
            x_input = coarse[:, : self.seq_len]
        elif isinstance(self.mode, int):
            coarse = np.stack(coarse).astype(np.int64)
            coarse = torch.LongTensor(coarse)
            x_input = 2 * coarse[:, : self.seq_len].float() / (2**self.mode - 1.0) - 1.0
        y_coarse = coarse[:, 1:]
        mels = torch.FloatTensor(mels)
        return x_input, mels, y_coarse


================================================
FILE: TTS/vocoder/layers/__init__.py
================================================


================================================
FILE: TTS/vocoder/layers/hifigan.py
================================================
from torch import nn


# pylint: disable=dangerous-default-value
class ResStack(nn.Module):
    def __init__(self, kernel, channel, padding, dilations=[1, 3, 5]):
        super().__init__()
        resstack = []
        for dilation in dilations:
            resstack += [
                nn.LeakyReLU(0.2),
                nn.ReflectionPad1d(dilation),
                nn.utils.weight_norm(nn.Conv1d(channel, channel, kernel_size=kernel, dilation=dilation)),
                nn.LeakyReLU(0.2),
                nn.ReflectionPad1d(padding),
                nn.utils.weight_norm(nn.Conv1d(channel, channel, kernel_size=1)),
            ]
        self.resstack = nn.Sequential(*resstack)

        self.shortcut = nn.utils.weight_norm(nn.Conv1d(channel, channel, kernel_size=1))

    def forward(self, x):
        x1 = self.shortcut(x)
        x2 = self.resstack(x)
        return x1 + x2

    def remove_weight_norm(self):
        nn.utils.remove_weight_norm(self.shortcut)
        nn.utils.remove_weight_norm(self.resstack[2])
        nn.utils.remove_weight_norm(self.resstack[5])
        nn.utils.remove_weight_norm(self.resstack[8])
        nn.utils.remove_weight_norm(self.resstack[11])
        nn.utils.remove_weight_norm(self.resstack[14])
        nn.utils.remove_weight_norm(self.resstack[17])


class MRF(nn.Module):
    def __init__(self, kernels, channel, dilations=[1, 3, 5]):  # # pylint: disable=dangerous-default-value
        super().__init__()
        self.resblock1 = ResStack(kernels[0], channel, 0, dilations)
        self.resblock2 = ResStack(kernels[1], channel, 6, dilations)
        self.resblock3 = ResStack(kernels[2], channel, 12, dilations)

    def forward(self, x):
        x1 = self.resblock1(x)
        x2 = self.resblock2(x)
        x3 = self.resblock3(x)
        return x1 + x2 + x3

    def remove_weight_norm(self):
        self.resblock1.remove_weight_norm()
        self.resblock2.remove_weight_norm()
        self.resblock3.remove_weight_norm()


================================================
FILE: TTS/vocoder/layers/losses.py
================================================
from typing import Dict, Union

import torch
from torch import nn
from torch.nn import functional as F

from TTS.utils.audio.torch_transforms import TorchSTFT
from TTS.vocoder.utils.distribution import discretized_mix_logistic_loss, gaussian_loss

#################################
# GENERATOR LOSSES
#################################


class STFTLoss(nn.Module):
    """STFT loss. Input generate and real waveforms are converted
    to spectrograms compared with L1 and Spectral convergence losses.
    It is from ParallelWaveGAN paper https://arxiv.org/pdf/1910.11480.pdf"""

    def __init__(self, n_fft, hop_length, win_length):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.win_length = win_length
        self.stft = TorchSTFT(n_fft, hop_length, win_length)

    def forward(self, y_hat, y):
        y_hat_M = self.stft(y_hat)
        y_M = self.stft(y)
        # magnitude loss
        loss_mag = F.l1_loss(torch.log(y_M), torch.log(y_hat_M))
        # spectral convergence loss
        loss_sc = torch.norm(y_M - y_hat_M, p="fro") / torch.norm(y_M, p="fro")
        return loss_mag, loss_sc


class MultiScaleSTFTLoss(torch.nn.Module):
    """Multi-scale STFT loss. Input generate and real waveforms are converted
    to spectrograms compared with L1 and Spectral convergence losses.
    It is from ParallelWaveGAN paper https://arxiv.org/pdf/1910.11480.pdf"""

    def __init__(self, n_ffts=(1024, 2048, 512), hop_lengths=(120, 240, 50), win_lengths=(600, 1200, 240)):
        super().__init__()
        self.loss_funcs = torch.nn.ModuleList()
        for n_fft, hop_length, win_length in zip(n_ffts, hop_lengths, win_lengths):
            self.loss_funcs.append(STFTLoss(n_fft, hop_length, win_length))

    def forward(self, y_hat, y):
        N = len(self.loss_funcs)
        loss_sc = 0
        loss_mag = 0
        for f in self.loss_funcs:
            lm, lsc = f(y_hat, y)
            loss_mag += lm
            loss_sc += lsc
        loss_sc /= N
        loss_mag /= N
        return loss_mag, loss_sc


class L1SpecLoss(nn.Module):
    """L1 Loss over Spectrograms as described in HiFiGAN paper https://arxiv.org/pdf/2010.05646.pdf"""

    def __init__(
        self, sample_rate, n_fft, hop_length, win_length, mel_fmin=None, mel_fmax=None, n_mels=None, use_mel=True
    ):
        super().__init__()
        self.use_mel = use_mel
        self.stft = TorchSTFT(
            n_fft,
            hop_length,
            win_length,
            sample_rate=sample_rate,
            mel_fmin=mel_fmin,
            mel_fmax=mel_fmax,
            n_mels=n_mels,
            use_mel=use_mel,
        )

    def forward(self, y_hat, y):
        y_hat_M = self.stft(y_hat)
        y_M = self.stft(y)
        # magnitude loss
        loss_mag = F.l1_loss(torch.log(y_M), torch.log(y_hat_M))
        return loss_mag


class MultiScaleSubbandSTFTLoss(MultiScaleSTFTLoss):
    """Multiscale STFT loss for multi band model outputs.
    From MultiBand-MelGAN paper https://arxiv.org/abs/2005.05106"""

    # pylint: disable=no-self-use
    def forward(self, y_hat, y):
        y_hat = y_hat.view(-1, 1, y_hat.shape[2])
        y = y.view(-1, 1, y.shape[2])
        return super().forward(y_hat.squeeze(1), y.squeeze(1))


class MSEGLoss(nn.Module):
    """Mean Squared Generator Loss"""

    # pylint: disable=no-self-use
    def forward(self, score_real):
        loss_fake = F.mse_loss(score_real, score_real.new_ones(score_real.shape))
        return loss_fake


class HingeGLoss(nn.Module):
    """Hinge Discriminator Loss"""

    # pylint: disable=no-self-use
    def forward(self, score_real):
        # TODO: this might be wrong
        loss_fake = torch.mean(F.relu(1.0 - score_real))
        return loss_fake


##################################
# DISCRIMINATOR LOSSES
##################################


class MSEDLoss(nn.Module):
    """Mean Squared Discriminator Loss"""

    def __init__(
        self,
    ):
        super().__init__()
        self.loss_func = nn.MSELoss()

    # pylint: disable=no-self-use
    def forward(self, score_fake, score_real):
        loss_real = self.loss_func(score_real, score_real.new_ones(score_real.shape))
        loss_fake = self.loss_func(score_fake, score_fake.new_zeros(score_fake.shape))
        loss_d = loss_real + loss_fake
        return loss_d, loss_real, loss_fake


class HingeDLoss(nn.Module):
    """Hinge Discriminator Loss"""

    # pylint: disable=no-self-use
    def forward(self, score_fake, score_real):
        loss_real = torch.mean(F.relu(1.0 - score_real))
        loss_fake = torch.mean(F.relu(1.0 + score_fake))
        loss_d = loss_real + loss_fake
        return loss_d, loss_real, loss_fake


class MelganFeatureLoss(nn.Module):
    def __init__(
        self,
    ):
        super().__init__()
        self.loss_func = nn.L1Loss()

    # pylint: disable=no-self-use
    def forward(self, fake_feats, real_feats):
        loss_feats = 0
        num_feats = 0
        for idx, _ in enumerate(fake_feats):
            for fake_feat, real_feat in zip(fake_feats[idx], real_feats[idx]):
                loss_feats += self.loss_func(fake_feat, real_feat)
                num_feats += 1
        loss_feats = loss_feats / num_feats
        return loss_feats


#####################################
# LOSS WRAPPERS
#####################################


def _apply_G_adv_loss(scores_fake, loss_func):
    """Compute G adversarial loss function
    and normalize values"""
    adv_loss = 0
    if isinstance(scores_fake, list):
        for score_fake in scores_fake:
            fake_loss = loss_func(score_fake)
            adv_loss += fake_loss
        adv_loss /= len(scores_fake)
    else:
        fake_loss = loss_func(scores_fake)
        adv_loss = fake_loss
    return adv_loss


def _apply_D_loss(scores_fake, scores_real, loss_func):
    """Compute D loss func and normalize loss values"""
    loss = 0
    real_loss = 0
    fake_loss = 0
    if isinstance(scores_fake, list):
        # multi-scale loss
        for score_fake, score_real in zip(scores_fake, scores_real):
            total_loss, real_loss, fake_loss = loss_func(score_fake=score_fake, score_real=score_real)
            loss += total_loss
            real_loss += real_loss
            fake_loss += fake_loss
        # normalize loss values with number of scales (discriminators)
        loss /= len(scores_fake)
        real_loss /= len(scores_real)
        fake_loss /= len(scores_fake)
    else:
        # single scale loss
        total_loss, real_loss, fake_loss = loss_func(scores_fake, scores_real)
        loss = total_loss
    return loss, real_loss, fake_loss


##################################
# MODEL LOSSES
##################################


class GeneratorLoss(nn.Module):
    """Generator Loss Wrapper. Based on model configuration it sets a right set of loss functions and computes
    losses. It allows to experiment with different combinations of loss functions with different models by just
    changing configurations.

    Args:
        C (AttrDict): model configuration.
    """

    def __init__(self, C):
        super().__init__()
        assert not (
            C.use_mse_gan_loss and C.use_hinge_gan_loss
        ), " [!] Cannot use HingeGANLoss and MSEGANLoss together."

        self.use_stft_loss = C.use_stft_loss if "use_stft_loss" in C else False
        self.use_subband_stft_loss = C.use_subband_stft_loss if "use_subband_stft_loss" in C else False
        self.use_mse_gan_loss = C.use_mse_gan_loss if "use_mse_gan_loss" in C else False
        self.use_hinge_gan_loss = C.use_hinge_gan_loss if "use_hinge_gan_loss" in C else False
        self.use_feat_match_loss = C.use_feat_match_loss if "use_feat_match_loss" in C else False
        self.use_l1_spec_loss = C.use_l1_spec_loss if "use_l1_spec_loss" in C else False

        self.stft_loss_weight = C.stft_loss_weight if "stft_loss_weight" in C else 0.0
        self.subband_stft_loss_weight = C.subband_stft_loss_weight if "subband_stft_loss_weight" in C else 0.0
        self.mse_gan_loss_weight = C.mse_G_loss_weight if "mse_G_loss_weight" in C else 0.0
        self.hinge_gan_loss_weight = C.hinge_G_loss_weight if "hinde_G_loss_weight" in C else 0.0
        self.feat_match_loss_weight = C.feat_match_loss_weight if "feat_match_loss_weight" in C else 0.0
        self.l1_spec_loss_weight = C.l1_spec_loss_weight if "l1_spec_loss_weight" in C else 0.0

        if C.use_stft_loss:
            self.stft_loss = MultiScaleSTFTLoss(**C.stft_loss_params)
        if C.use_subband_stft_loss:
            self.subband_stft_loss = MultiScaleSubbandSTFTLoss(**C.subband_stft_loss_params)
        if C.use_mse_gan_loss:
            self.mse_loss = MSEGLoss()
        if C.use_hinge_gan_loss:
            self.hinge_loss = HingeGLoss()
        if C.use_feat_match_loss:
            self.feat_match_loss = MelganFeatureLoss()
        if C.use_l1_spec_loss:
            assert C.audio["sample_rate"] == C.l1_spec_loss_params["sample_rate"]
            self.l1_spec_loss = L1SpecLoss(**C.l1_spec_loss_params)

    def forward(
        self, y_hat=None, y=None, scores_fake=None, feats_fake=None, feats_real=None, y_hat_sub=None, y_sub=None
    ):
        gen_loss = 0
        adv_loss = 0
        return_dict = {}

        # STFT Loss
        if self.use_stft_loss:
            stft_loss_mg, stft_loss_sc = self.stft_loss(y_hat[:, :, : y.size(2)].squeeze(1), y.squeeze(1))
            return_dict["G_stft_loss_mg"] = stft_loss_mg
            return_dict["G_stft_loss_sc"] = stft_loss_sc
            gen_loss = gen_loss + self.stft_loss_weight * (stft_loss_mg + stft_loss_sc)

        # L1 Spec loss
        if self.use_l1_spec_loss:
            l1_spec_loss = self.l1_spec_loss(y_hat, y)
            return_dict["G_l1_spec_loss"] = l1_spec_loss
            gen_loss = gen_loss + self.l1_spec_loss_weight * l1_spec_loss

        # subband STFT Loss
        if self.use_subband_stft_loss:
            subband_stft_loss_mg, subband_stft_loss_sc = self.subband_stft_loss(y_hat_sub, y_sub)
            return_dict["G_subband_stft_loss_mg"] = subband_stft_loss_mg
            return_dict["G_subband_stft_loss_sc"] = subband_stft_loss_sc
            gen_loss = gen_loss + self.subband_stft_loss_weight * (subband_stft_loss_mg + subband_stft_loss_sc)

        # multiscale MSE adversarial loss
        if self.use_mse_gan_loss and scores_fake is not None:
            mse_fake_loss = _apply_G_adv_loss(scores_fake, self.mse_loss)
            return_dict["G_mse_fake_loss"] = mse_fake_loss
            adv_loss = adv_loss + self.mse_gan_loss_weight * mse_fake_loss

        # multiscale Hinge adversarial loss
        if self.use_hinge_gan_loss and not scores_fake is not None:
            hinge_fake_loss = _apply_G_adv_loss(scores_fake, self.hinge_loss)
            return_dict["G_hinge_fake_loss"] = hinge_fake_loss
            adv_loss = adv_loss + self.hinge_gan_loss_weight * hinge_fake_loss

        # Feature Matching Loss
        if self.use_feat_match_loss and not feats_fake is None:
            feat_match_loss = self.feat_match_loss(feats_fake, feats_real)
            return_dict["G_feat_match_loss"] = feat_match_loss
            adv_loss = adv_loss + self.feat_match_loss_weight * feat_match_loss
        return_dict["loss"] = gen_loss + adv_loss
        return_dict["G_gen_loss"] = gen_loss
        return_dict["G_adv_loss"] = adv_loss
        return return_dict


class DiscriminatorLoss(nn.Module):
    """Like ```GeneratorLoss```"""

    def __init__(self, C):
        super().__init__()
        assert not (
            C.use_mse_gan_loss and C.use_hinge_gan_loss
        ), " [!] Cannot use HingeGANLoss and MSEGANLoss together."

        self.use_mse_gan_loss = C.use_mse_gan_loss
        self.use_hinge_gan_loss = C.use_hinge_gan_loss

        if C.use_mse_gan_loss:
            self.mse_loss = MSEDLoss()
        if C.use_hinge_gan_loss:
            self.hinge_loss = HingeDLoss()

    def forward(self, scores_fake, scores_real):
        loss = 0
        return_dict = {}

        if self.use_mse_gan_loss:
            mse_D_loss, mse_D_real_loss, mse_D_fake_loss = _apply_D_loss(
                scores_fake=scores_fake, scores_real=scores_real, loss_func=self.mse_loss
            )
            return_dict["D_mse_gan_loss"] = mse_D_loss
            return_dict["D_mse_gan_real_loss"] = mse_D_real_loss
            return_dict["D_mse_gan_fake_loss"] = mse_D_fake_loss
            loss += mse_D_loss

        if self.use_hinge_gan_loss:
            hinge_D_loss, hinge_D_real_loss, hinge_D_fake_loss = _apply_D_loss(
                scores_fake=scores_fake, scores_real=scores_real, loss_func=self.hinge_loss
            )
            return_dict["D_hinge_gan_loss"] = hinge_D_loss
            return_dict["D_hinge_gan_real_loss"] = hinge_D_real_loss
            return_dict["D_hinge_gan_fake_loss"] = hinge_D_fake_loss
            loss += hinge_D_loss

        return_dict["loss"] = loss
        return return_dict


class WaveRNNLoss(nn.Module):
    def __init__(self, wave_rnn_mode: Union[str, int]):
        super().__init__()
        if wave_rnn_mode == "mold":
            self.loss_func = discretized_mix_logistic_loss
        elif wave_rnn_mode == "gauss":
            self.loss_func = gaussian_loss
        elif isinstance(wave_rnn_mode, int):
            self.loss_func = torch.nn.CrossEntropyLoss()
        else:
            raise ValueError(" [!] Unknown mode for Wavernn.")

    def forward(self, y_hat, y) -> Dict:
        loss = self.loss_func(y_hat, y)
        return {"loss": loss}


================================================
FILE: TTS/vocoder/layers/lvc_block.py
================================================
import torch
import torch.nn.functional as F


class KernelPredictor(torch.nn.Module):
    """Kernel predictor for the location-variable convolutions"""

    def __init__(  # pylint: disable=dangerous-default-value
        self,
        cond_channels,
        conv_in_channels,
        conv_out_channels,
        conv_layers,
        conv_kernel_size=3,
        kpnet_hidden_channels=64,
        kpnet_conv_size=3,
        kpnet_dropout=0.0,
        kpnet_nonlinear_activation="LeakyReLU",
        kpnet_nonlinear_activation_params={"negative_slope": 0.1},
    ):
        """
        Args:
            cond_channels (int): number of channel for the conditioning sequence,
            conv_in_channels (int): number of channel for the input sequence,
            conv_out_channels (int): number of channel for the output sequence,
            conv_layers (int):
            kpnet_
        """
        super().__init__()

        self.conv_in_channels = conv_in_channels
        self.conv_out_channels = conv_out_channels
        self.conv_kernel_size = conv_kernel_size
        self.conv_layers = conv_layers

        l_w = conv_in_channels * conv_out_channels * conv_kernel_size * conv_layers
        l_b = conv_out_channels * conv_layers

        padding = (kpnet_conv_size - 1) // 2
        self.input_conv = torch.nn.Sequential(
            torch.nn.Conv1d(cond_channels, kpnet_hidden_channels, 5, padding=(5 - 1) // 2, bias=True),
            getattr(torch.nn, kpnet_nonlinear_activation)(**kpnet_nonlinear_activation_params),
        )

        self.residual_conv = torch.nn.Sequential(
            torch.nn.Dropout(kpnet_dropout),
            torch.nn.Conv1d(kpnet_hidden_channels, kpnet_hidden_channels, kpnet_conv_size, padding=padding, bias=True),
            getattr(torch.nn, kpnet_nonlinear_activation)(**kpnet_nonlinear_activation_params),
            torch.nn.Conv1d(kpnet_hidden_channels, kpnet_hidden_channels, kpnet_conv_size, padding=padding, bias=True),
            getattr(torch.nn, kpnet_nonlinear_activation)(**kpnet_nonlinear_activation_params),
            torch.nn.Dropout(kpnet_dropout),
            torch.nn.Conv1d(kpnet_hidden_channels, kpnet_hidden_channels, kpnet_conv_size, padding=padding, bias=True),
            getattr(torch.nn, kpnet_nonlinear_activation)(**kpnet_nonlinear_activation_params),
            torch.nn.Conv1d(kpnet_hidden_channels, kpnet_hidden_channels, kpnet_conv_size, padding=padding, bias=True),
            getattr(torch.nn, kpnet_nonlinear_activation)(**kpnet_nonlinear_activation_params),
            torch.nn.Dropout(kpnet_dropout),
            torch.nn.Conv1d(kpnet_hidden_channels, kpnet_hidden_channels, kpnet_conv_size, padding=padding, bias=True),
            getattr(torch.nn, kpnet_nonlinear_activation)(**kpnet_nonlinear_activation_params),
            torch.nn.Conv1d(kpnet_hidden_channels, kpnet_hidden_channels, kpnet_conv_size, padding=padding, bias=True),
            getattr(torch.nn, kpnet_nonlinear_activation)(**kpnet_nonlinear_activation_params),
        )

        self.kernel_conv = torch.nn.Conv1d(kpnet_hidden_channels, l_w, kpnet_conv_size, padding=padding, bias=True)
        self.bias_conv = torch.nn.Conv1d(kpnet_hidden_channels, l_b, kpnet_conv_size, padding=padding, bias=True)

    def forward(self, c):
        """
        Args:
            c (Tensor): the conditioning sequence (batch, cond_channels, cond_length)
        Returns:
        """
        batch, _, cond_length = c.shape

        c = self.input_conv(c)
        c = c + self.residual_conv(c)
        k = self.kernel_conv(c)
        b = self.bias_conv(c)

        kernels = k.contiguous().view(
            batch, self.conv_layers, self.conv_in_channels, self.conv_out_channels, self.conv_kernel_size, cond_length
        )
        bias = b.contiguous().view(batch, self.conv_layers, self.conv_out_channels, cond_length)
        return kernels, bias


class LVCBlock(torch.nn.Module):
    """the location-variable convolutions"""

    def __init__(
        self,
        in_channels,
        cond_channels,
        upsample_ratio,
        conv_layers=4,
        conv_kernel_size=3,
        cond_hop_length=256,
        kpnet_hidden_channels=64,
        kpnet_conv_size=3,
        kpnet_dropout=0.0,
    ):
        super().__init__()

        self.cond_hop_length = cond_hop_length
        self.conv_layers = conv_layers
        self.conv_kernel_size = conv_kernel_size
        self.convs = torch.nn.ModuleList()

        self.upsample = torch.nn.ConvTranspose1d(
            in_channels,
            in_channels,
            kernel_size=upsample_ratio * 2,
            stride=upsample_ratio,
            padding=upsample_ratio // 2 + upsample_ratio % 2,
            output_padding=upsample_ratio % 2,
        )

        self.kernel_predictor = KernelPredictor(
            cond_channels=cond_channels,
            conv_in_channels=in_channels,
            conv_out_channels=2 * in_channels,
            conv_layers=conv_layers,
            conv_kernel_size=conv_kernel_size,
            kpnet_hidden_channels=kpnet_hidden_channels,
            kpnet_conv_size=kpnet_conv_size,
            kpnet_dropout=kpnet_dropout,
        )

        for i in range(conv_layers):
            padding = (3**i) * int((conv_kernel_size - 1) / 2)
            conv = torch.nn.Conv1d(
                in_channels, in_channels, kernel_size=conv_kernel_size, padding=padding, dilation=3**i
            )

            self.convs.append(conv)

    def forward(self, x, c):
        """forward propagation of the location-variable convolutions.
        Args:
            x (Tensor): the input sequence (batch, in_channels, in_length)
            c (Tensor): the conditioning sequence (batch, cond_channels, cond_length)

        Returns:
            Tensor: the output sequence (batch, in_channels, in_length)
        """
        in_channels = x.shape[1]
        kernels, bias = self.kernel_predictor(c)

        x = F.leaky_relu(x, 0.2)
        x = self.upsample(x)

        for i in range(self.conv_layers):
            y = F.leaky_relu(x, 0.2)
            y = self.convs[i](y)
            y = F.leaky_relu(y, 0.2)

            k = kernels[:, i, :, :, :, :]
            b = bias[:, i, :, :]
            y = self.location_variable_convolution(y, k, b, 1, self.cond_hop_length)
            x = x + torch.sigmoid(y[:, :in_channels, :]) * torch.tanh(y[:, in_channels:, :])
        return x

    @staticmethod
    def location_variable_convolution(x, kernel, bias, dilation, hop_size):
        """perform location-variable convolution operation on the input sequence (x) using the local convolution kernl.
        Time: 414 μs ± 309 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each), test on NVIDIA V100.
        Args:
            x (Tensor): the input sequence (batch, in_channels, in_length).
            kernel (Tensor): the local convolution kernel (batch, in_channel, out_channels, kernel_size, kernel_length)
            bias (Tensor): the bias for the local convolution (batch, out_channels, kernel_length)
            dilation (int): the dilation of convolution.
            hop_size (int): the hop_size of the conditioning sequence.
        Returns:
            (Tensor): the output sequence after performing local convolution. (batch, out_channels, in_length).
        """
        batch, _, in_length = x.shape
        batch, _, out_channels, kernel_size, kernel_length = kernel.shape

        assert in_length == (
            kernel_length * hop_size
        ), f"length of (x, kernel) is not matched, {in_length} vs {kernel_length * hop_size}"

        padding = dilation * int((kernel_size - 1) / 2)
        x = F.pad(x, (padding, padding), "constant", 0)  # (batch, in_channels, in_length + 2*padding)
        x = x.unfold(2, hop_size + 2 * padding, hop_size)  # (batch, in_channels, kernel_length, hop_size + 2*padding)

        if hop_size < dilation:
            x = F.pad(x, (0, dilation), "constant", 0)
        x = x.unfold(
            3, dilation, dilation
        )  # (batch, in_channels, kernel_length, (hop_size + 2*padding)/dilation, dilation)
        x = x[:, :, :, :, :hop_size]
        x = x.transpose(3, 4)  # (batch, in_channels, kernel_length, dilation, (hop_size + 2*padding)/dilation)
        x = x.unfold(4, kernel_size, 1)  # (batch, in_channels, kernel_length, dilation, _, kernel_size)

        o = torch.einsum("bildsk,biokl->bolsd", x, kernel)
        o = o + bias.unsqueeze(-1).unsqueeze(-1)
        o = o.contiguous().view(batch, out_channels, -1)
        return o


================================================
FILE: TTS/vocoder/layers/melgan.py
================================================
from torch import nn
from torch.nn.utils import weight_norm


class ResidualStack(nn.Module):
    def __init__(self, channels, num_res_blocks, kernel_size):
        super().__init__()

        assert (kernel_size - 1) % 2 == 0, " [!] kernel_size has to be odd."
        base_padding = (kernel_size - 1) // 2

        self.blocks = nn.ModuleList()
        for idx in range(num_res_blocks):
            layer_kernel_size = kernel_size
            layer_dilation = layer_kernel_size**idx
            layer_padding = base_padding * layer_dilation
            self.blocks += [
                nn.Sequential(
                    nn.LeakyReLU(0.2),
                    nn.ReflectionPad1d(layer_padding),
                    weight_norm(
                        nn.Conv1d(channels, channels, kernel_size=kernel_size, dilation=layer_dilation, bias=True)
                    ),
                    nn.LeakyReLU(0.2),
                    weight_norm(nn.Conv1d(channels, channels, kernel_size=1, bias=True)),
                )
            ]

        self.shortcuts = nn.ModuleList(
            [weight_norm(nn.Conv1d(channels, channels, kernel_size=1, bias=True)) for i in range(num_res_blocks)]
        )

    def forward(self, x):
        for block, shortcut in zip(self.blocks, self.shortcuts):
            x = shortcut(x) + block(x)
        return x

    def remove_weight_norm(self):
        for block, shortcut in zip(self.blocks, self.shortcuts):
            nn.utils.remove_weight_norm(block[2])
            nn.utils.remove_weight_norm(block[4])
            nn.utils.remove_weight_norm(shortcut)


================================================
FILE: TTS/vocoder/layers/parallel_wavegan.py
================================================
import torch
from torch.nn import functional as F


class ResidualBlock(torch.nn.Module):
    """Residual block module in WaveNet."""

    def __init__(
        self,
        kernel_size=3,
        res_channels=64,
        gate_channels=128,
        skip_channels=64,
        aux_channels=80,
        dropout=0.0,
        dilation=1,
        bias=True,
        use_causal_conv=False,
    ):
        super().__init__()
        self.dropout = dropout
        # no future time stamps available
        if use_causal_conv:
            padding = (kernel_size - 1) * dilation
        else:
            assert (kernel_size - 1) % 2 == 0, "Not support even number kernel size."
            padding = (kernel_size - 1) // 2 * dilation
        self.use_causal_conv = use_causal_conv

        # dilation conv
        self.conv = torch.nn.Conv1d(
            res_channels, gate_channels, kernel_size, padding=padding, dilation=dilation, bias=bias
        )

        # local conditioning
        if aux_channels > 0:
            self.conv1x1_aux = torch.nn.Conv1d(aux_channels, gate_channels, 1, bias=False)
        else:
            self.conv1x1_aux = None

        # conv output is split into two groups
        gate_out_channels = gate_channels // 2
        self.conv1x1_out = torch.nn.Conv1d(gate_out_channels, res_channels, 1, bias=bias)
        self.conv1x1_skip = torch.nn.Conv1d(gate_out_channels, skip_channels, 1, bias=bias)

    def forward(self, x, c):
        """
        x: B x D_res x T
        c: B x D_aux x T
        """
        residual = x
        x = F.dropout(x, p=self.dropout, training=self.training)
        x = self.conv(x)

        # remove future time steps if use_causal_conv conv
        x = x[:, :, : residual.size(-1)] if self.use_causal_conv else x

        # split into two part for gated activation
        splitdim = 1
        xa, xb = x.split(x.size(splitdim) // 2, dim=splitdim)

        # local conditioning
        if c is not None:
            assert self.conv1x1_aux is not None
            c = self.conv1x1_aux(c)
            ca, cb = c.split(c.size(splitdim) // 2, dim=splitdim)
            xa, xb = xa + ca, xb + cb

        x = torch.tanh(xa) * torch.sigmoid(xb)

        # for skip connection
        s = self.conv1x1_skip(x)

        # for residual connection
        x = (self.conv1x1_out(x) + residual) * (0.5**2)

        return x, s


================================================
FILE: TTS/vocoder/layers/pqmf.py
================================================
import numpy as np
import torch
import torch.nn.functional as F
from scipy import signal as sig


# adapted from
# https://github.com/kan-bayashi/ParallelWaveGAN/tree/master/parallel_wavegan
class PQMF(torch.nn.Module):
    def __init__(self, N=4, taps=62, cutoff=0.15, beta=9.0):
        super().__init__()

        self.N = N
        self.taps = taps
        self.cutoff = cutoff
        self.beta = beta

        QMF = sig.firwin(taps + 1, cutoff, window=("kaiser", beta))
        H = np.zeros((N, len(QMF)))
        G = np.zeros((N, len(QMF)))
        for k in range(N):
            constant_factor = (
                (2 * k + 1) * (np.pi / (2 * N)) * (np.arange(taps + 1) - ((taps - 1) / 2))
            )  # TODO: (taps - 1) -> taps
            phase = (-1) ** k * np.pi / 4
            H[k] = 2 * QMF * np.cos(constant_factor + phase)

            G[k] = 2 * QMF * np.cos(constant_factor - phase)

        H = torch.from_numpy(H[:, None, :]).float()
        G = torch.from_numpy(G[None, :, :]).float()

        self.register_buffer("H", H)
        self.register_buffer("G", G)

        updown_filter = torch.zeros((N, N, N)).float()
        for k in range(N):
            updown_filter[k, k, 0] = 1.0
        self.register_buffer("updown_filter", updown_filter)
        self.N = N

        self.pad_fn = torch.nn.ConstantPad1d(taps // 2, 0.0)

    def forward(self, x):
        return self.analysis(x)

    def analysis(self, x):
        return F.conv1d(x, self.H, padding=self.taps // 2, stride=self.N)

    def synthesis(self, x):
        x = F.conv_transpose1d(x, self.updown_filter * self.N, stride=self.N)
        x = F.conv1d(x, self.G, padding=self.taps // 2)
        return x


================================================
FILE: TTS/vocoder/layers/upsample.py
================================================
import torch
from torch.nn import functional as F


class Stretch2d(torch.nn.Module):
    def __init__(self, x_scale, y_scale, mode="nearest"):
        super().__init__()
        self.x_scale = x_scale
        self.y_scale = y_scale
        self.mode = mode

    def forward(self, x):
        """
        x (Tensor): Input tensor (B, C, F, T).
        Tensor: Interpolated tensor (B, C, F * y_scale, T * x_scale),
        """
        return F.interpolate(x, scale_factor=(self.y_scale, self.x_scale), mode=self.mode)


class UpsampleNetwork(torch.nn.Module):
    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        upsample_factors,
        nonlinear_activation=None,
        nonlinear_activation_params={},
        interpolate_mode="nearest",
        freq_axis_kernel_size=1,
        use_causal_conv=False,
    ):
        super().__init__()
        self.use_causal_conv = use_causal_conv
        self.up_layers = torch.nn.ModuleList()
        for scale in upsample_factors:
            # interpolation layer
            stretch = Stretch2d(scale, 1, interpolate_mode)
            self.up_layers += [stretch]

            # conv layer
            assert (freq_axis_kernel_size - 1) % 2 == 0, "Not support even number freq axis kernel size."
            freq_axis_padding = (freq_axis_kernel_size - 1) // 2
            kernel_size = (freq_axis_kernel_size, scale * 2 + 1)
            if use_causal_conv:
                padding = (freq_axis_padding, scale * 2)
            else:
                padding = (freq_axis_padding, scale)
            conv = torch.nn.Conv2d(1, 1, kernel_size=kernel_size, padding=padding, bias=False)
            self.up_layers += [conv]

            # nonlinear
            if nonlinear_activation is not None:
                nonlinear = getattr(torch.nn, nonlinear_activation)(**nonlinear_activation_params)
                self.up_layers += [nonlinear]

    def forward(self, c):
        """
        c :  (B, C, T_in).
        Tensor: (B, C, T_upsample)
        """
        c = c.unsqueeze(1)  # (B, 1, C, T)
        for f in self.up_layers:
            c = f(c)
        return c.squeeze(1)  # (B, C, T')


class ConvUpsample(torch.nn.Module):
    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        upsample_factors,
        nonlinear_activation=None,
        nonlinear_activation_params={},
        interpolate_mode="nearest",
        freq_axis_kernel_size=1,
        aux_channels=80,
        aux_context_window=0,
        use_causal_conv=False,
    ):
        super().__init__()
        self.aux_context_window = aux_context_window
        self.use_causal_conv = use_causal_conv and aux_context_window > 0
        # To capture wide-context information in conditional features
        kernel_size = aux_context_window + 1 if use_causal_conv else 2 * aux_context_window + 1
        # NOTE(kan-bayashi): Here do not use padding because the input is already padded
        self.conv_in = torch.nn.Conv1d(aux_channels, aux_channels, kernel_size=kernel_size, bias=False)
        self.upsample = UpsampleNetwork(
            upsample_factors=upsample_factors,
            nonlinear_activation=nonlinear_activation,
            nonlinear_activation_params=nonlinear_activation_params,
            interpolate_mode=interpolate_mode,
            freq_axis_kernel_size=freq_axis_kernel_size,
            use_causal_conv=use_causal_conv,
        )

    def forward(self, c):
        """
        c : (B, C, T_in).
        Tensor: (B, C, T_upsampled),
        """
        c_ = self.conv_in(c)
        c = c_[:, :, : -self.aux_context_window] if self.use_causal_conv else c_
        return self.upsample(c)


================================================
FILE: TTS/vocoder/layers/wavegrad.py
================================================
import torch
import torch.nn.functional as F
from torch import nn
from torch.nn.utils import weight_norm


class Conv1d(nn.Conv1d):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        nn.init.orthogonal_(self.weight)
        nn.init.zeros_(self.bias)


class PositionalEncoding(nn.Module):
    """Positional encoding with noise level conditioning"""

    def __init__(self, n_channels, max_len=10000):
        super().__init__()
        self.n_channels = n_channels
        self.max_len = max_len
        self.C = 5000
        self.pe = torch.zeros(0, 0)

    def forward(self, x, noise_level):
        if x.shape[2] > self.pe.shape[1]:
            self.init_pe_matrix(x.shape[1], x.shape[2], x)
        return x + noise_level[..., None, None] + self.pe[:, : x.size(2)].repeat(x.shape[0], 1, 1) / self.C

    def init_pe_matrix(self, n_channels, max_len, x):
        pe = torch.zeros(max_len, n_channels)
        position = torch.arange(0, max_len, dtype=torch.float).unsqueeze(1)
        div_term = torch.pow(10000, torch.arange(0, n_channels, 2).float() / n_channels)

        pe[:, 0::2] = torch.sin(position / div_term)
        pe[:, 1::2] = torch.cos(position / div_term)
        self.pe = pe.transpose(0, 1).to(x)


class FiLM(nn.Module):
    def __init__(self, input_size, output_size):
        super().__init__()
        self.encoding = PositionalEncoding(input_size)
        self.input_conv = nn.Conv1d(input_size, input_size, 3, padding=1)
        self.output_conv = nn.Conv1d(input_size, output_size * 2, 3, padding=1)

        nn.init.xavier_uniform_(self.input_conv.weight)
        nn.init.xavier_uniform_(self.output_conv.weight)
        nn.init.zeros_(self.input_conv.bias)
        nn.init.zeros_(self.output_conv.bias)

    def forward(self, x, noise_scale):
        o = self.input_conv(x)
        o = F.leaky_relu(o, 0.2)
        o = self.encoding(o, noise_scale)
        shift, scale = torch.chunk(self.output_conv(o), 2, dim=1)
        return shift, scale

    def remove_weight_norm(self):
        nn.utils.remove_weight_norm(self.input_conv)
        nn.utils.remove_weight_norm(self.output_conv)

    def apply_weight_norm(self):
        self.input_conv = weight_norm(self.input_conv)
        self.output_conv = weight_norm(self.output_conv)


@torch.jit.script
def shif_and_scale(x, scale, shift):
    o = shift + scale * x
    return o


class UBlock(nn.Module):
    def __init__(self, input_size, hidden_size, factor, dilation):
        super().__init__()
        assert isinstance(dilation, (list, tuple))
        assert len(dilation) == 4

        self.factor = factor
        self.res_block = Conv1d(input_size, hidden_size, 1)
        self.main_block = nn.ModuleList(
            [
                Conv1d(input_size, hidden_size, 3, dilation=dilation[0], padding=dilation[0]),
                Conv1d(hidden_size, hidden_size, 3, dilation=dilation[1], padding=dilation[1]),
            ]
        )
        self.out_block = nn.ModuleList(
            [
                Conv1d(hidden_size, hidden_size, 3, dilation=dilation[2], padding=dilation[2]),
                Conv1d(hidden_size, hidden_size, 3, dilation=dilation[3], padding=dilation[3]),
            ]
        )

    def forward(self, x, shift, scale):
        x_inter = F.interpolate(x, size=x.shape[-1] * self.factor)
        res = self.res_block(x_inter)
        o = F.leaky_relu(x_inter, 0.2)
        o = F.interpolate(o, size=x.shape[-1] * self.factor)
        o = self.main_block[0](o)
        o = shif_and_scale(o, scale, shift)
        o = F.leaky_relu(o, 0.2)
        o = self.main_block[1](o)
        res2 = res + o
        o = shif_and_scale(res2, scale, shift)
        o = F.leaky_relu(o, 0.2)
        o = self.out_block[0](o)
        o = shif_and_scale(o, scale, shift)
        o = F.leaky_relu(o, 0.2)
        o = self.out_block[1](o)
        o = o + res2
        return o

    def remove_weight_norm(self):
        nn.utils.remove_weight_norm(self.res_block)
        for _, layer in enumerate(self.main_block):
            if len(layer.state_dict()) != 0:
                nn.utils.remove_weight_norm(layer)
        for _, layer in enumerate(self.out_block):
            if len(layer.state_dict()) != 0:
                nn.utils.remove_weight_norm(layer)

    def apply_weight_norm(self):
        self.res_block = weight_norm(self.res_block)
        for idx, layer in enumerate(self.main_block):
            if len(layer.state_dict()) != 0:
                self.main_block[idx] = weight_norm(layer)
        for idx, layer in enumerate(self.out_block):
            if len(layer.state_dict()) != 0:
                self.out_block[idx] = weight_norm(layer)


class DBlock(nn.Module):
    def __init__(self, input_size, hidden_size, factor):
        super().__init__()
        self.factor = factor
        self.res_block = Conv1d(input_size, hidden_size, 1)
        self.main_block = nn.ModuleList(
            [
                Conv1d(input_size, hidden_size, 3, dilation=1, padding=1),
                Conv1d(hidden_size, hidden_size, 3, dilation=2, padding=2),
                Conv1d(hidden_size, hidden_size, 3, dilation=4, padding=4),
            ]
        )

    def forward(self, x):
        size = x.shape[-1] // self.factor
        res = self.res_block(x)
        res = F.interpolate(res, size=size)
        o = F.interpolate(x, size=size)
        for layer in self.main_block:
            o = F.leaky_relu(o, 0.2)
            o = layer(o)
        return o + res

    def remove_weight_norm(self):
        nn.utils.remove_weight_norm(self.res_block)
        for _, layer in enumerate(self.main_block):
            if len(layer.state_dict()) != 0:
                nn.utils.remove_weight_norm(layer)

    def apply_weight_norm(self):
        self.res_block = weight_norm(self.res_block)
        for idx, layer in enumerate(self.main_block):
            if len(layer.state_dict()) != 0:
                self.main_block[idx] = weight_norm(layer)


================================================
FILE: TTS/vocoder/models/__init__.py
================================================
import importlib
import re

from coqpit import Coqpit


def to_camel(text):
    text = text.capitalize()
    return re.sub(r"(?!^)_([a-zA-Z])", lambda m: m.group(1).upper(), text)


def setup_model(config: Coqpit):
    """Load models directly from configuration."""
    if "discriminator_model" in config and "generator_model" in config:
        MyModel = importlib.import_module("TTS.vocoder.models.gan")
        MyModel = getattr(MyModel, "GAN")
    else:
        MyModel = importlib.import_module("TTS.vocoder.models." + config.model.lower())
        if config.model.lower() == "wavernn":
            MyModel = getattr(MyModel, "Wavernn")
        elif config.model.lower() == "gan":
            MyModel = getattr(MyModel, "GAN")
        elif config.model.lower() == "wavegrad":
            MyModel = getattr(MyModel, "Wavegrad")
        else:
            try:
                MyModel = getattr(MyModel, to_camel(config.model))
            except ModuleNotFoundError as e:
                raise ValueError(f"Model {config.model} not exist!") from e
    print(" > Vocoder Model: {}".format(config.model))
    return MyModel.init_from_config(config)


def setup_generator(c):
    """TODO: use config object as arguments"""
    print(" > Generator Model: {}".format(c.generator_model))
    MyModel = importlib.import_module("TTS.vocoder.models." + c.generator_model.lower())
    MyModel = getattr(MyModel, to_camel(c.generator_model))
    # this is to preserve the Wavernn class name (instead of Wavernn)
    if c.generator_model.lower() in "hifigan_generator":
        model = MyModel(in_channels=c.audio["num_mels"], out_channels=1, **c.generator_model_params)
    elif c.generator_model.lower() in "melgan_generator":
        model = MyModel(
            in_channels=c.audio["num_mels"],
            out_channels=1,
            proj_kernel=7,
            base_channels=512,
            upsample_factors=c.generator_model_params["upsample_factors"],
            res_kernel=3,
            num_res_blocks=c.generator_model_params["num_res_blocks"],
        )
    elif c.generator_model in "melgan_fb_generator":
        raise ValueError("melgan_fb_generator is now fullband_melgan_generator")
    elif c.generator_model.lower() in "multiband_melgan_generator":
        model = MyModel(
            in_channels=c.audio["num_mels"],
            out_channels=4,
            proj_kernel=7,
            base_channels=384,
            upsample_factors=c.generator_model_params["upsample_factors"],
            res_kernel=3,
            num_res_blocks=c.generator_model_params["num_res_blocks"],
        )
    elif c.generator_model.lower() in "fullband_melgan_generator":
        model = MyModel(
            in_channels=c.audio["num_mels"],
            out_channels=1,
            proj_kernel=7,
            base_channels=512,
            upsample_factors=c.generator_model_params["upsample_factors"],
            res_kernel=3,
            num_res_blocks=c.generator_model_params["num_res_blocks"],
        )
    elif c.generator_model.lower() in "parallel_wavegan_generator":
        model = MyModel(
            in_channels=1,
            out_channels=1,
            kernel_size=3,
            num_res_blocks=c.generator_model_params["num_res_blocks"],
            stacks=c.generator_model_params["stacks"],
            res_channels=64,
            gate_channels=128,
            skip_channels=64,
            aux_channels=c.audio["num_mels"],
            dropout=0.0,
            bias=True,
            use_weight_norm=True,
            upsample_factors=c.generator_model_params["upsample_factors"],
        )
    elif c.generator_model.lower() in "univnet_generator":
        model = MyModel(**c.generator_model_params)
    else:
        raise NotImplementedError(f"Model {c.generator_model} not implemented!")
    return model


def setup_discriminator(c):
    """TODO: use config objekt as arguments"""
    print(" > Discriminator Model: {}".format(c.discriminator_model))
    if "parallel_wavegan" in c.discriminator_model:
        MyModel = importlib.import_module("TTS.vocoder.models.parallel_wavegan_discriminator")
    else:
        MyModel = importlib.import_module("TTS.vocoder.models." + c.discriminator_model.lower())
    MyModel = getattr(MyModel, to_camel(c.discriminator_model.lower()))
    if c.discriminator_model in "hifigan_discriminator":
        model = MyModel()
    if c.discriminator_model in "random_window_discriminator":
        model = MyModel(
            cond_channels=c.audio["num_mels"],
            hop_length=c.audio["hop_length"],
            uncond_disc_donwsample_factors=c.discriminator_model_params["uncond_disc_donwsample_factors"],
            cond_disc_downsample_factors=c.discriminator_model_params["cond_disc_downsample_factors"],
            cond_disc_out_channels=c.discriminator_model_params["cond_disc_out_channels"],
            window_sizes=c.discriminator_model_params["window_sizes"],
        )
    if c.discriminator_model in "melgan_multiscale_discriminator":
        model = MyModel(
            in_channels=1,
            out_channels=1,
            kernel_sizes=(5, 3),
            base_channels=c.discriminator_model_params["base_channels"],
            max_channels=c.discriminator_model_params["max_channels"],
            downsample_factors=c.discriminator_model_params["downsample_factors"],
        )
    if c.discriminator_model == "residual_parallel_wavegan_discriminator":
        model = MyModel(
            in_channels=1,
            out_channels=1,
            kernel_size=3,
            num_layers=c.discriminator_model_params["num_layers"],
            stacks=c.discriminator_model_params["stacks"],
            res_channels=64,
            gate_channels=128,
            skip_channels=64,
            dropout=0.0,
            bias=True,
            nonlinear_activation="LeakyReLU",
            nonlinear_activation_params={"negative_slope": 0.2},
        )
    if c.discriminator_model == "parallel_wavegan_discriminator":
        model = MyModel(
            in_channels=1,
            out_channels=1,
            kernel_size=3,
            num_layers=c.discriminator_model_params["num_layers"],
            conv_channels=64,
            dilation_factor=1,
            nonlinear_activation="LeakyReLU",
            nonlinear_activation_params={"negative_slope": 0.2},
            bias=True,
        )
    if c.discriminator_model == "univnet_discriminator":
        model = MyModel()
    return model


================================================
FILE: TTS/vocoder/models/base_vocoder.py
================================================
from coqpit import Coqpit

from TTS.model import BaseTrainerModel

# pylint: skip-file


class BaseVocoder(BaseTrainerModel):
    """Base `vocoder` class. Every new `vocoder` model must inherit this.

    It defines `vocoder` specific functions on top of `Model`.

    Notes on input/output tensor shapes:
        Any input or output tensor of the model must be shaped as

        - 3D tensors `batch x time x channels`
        - 2D tensors `batch x channels`
        - 1D tensors `batch x 1`
    """

    MODEL_TYPE = "vocoder"

    def __init__(self, config):
        super().__init__()
        self._set_model_args(config)

    def _set_model_args(self, config: Coqpit):
        """Setup model args based on the config type.

        If the config is for training with a name like "*Config", then the model args are embeded in the
        config.model_args

        If the config is for the model with a name like "*Args", then we assign the directly.
        """
        # don't use isintance not to import recursively
        if "Config" in config.__class__.__name__:
            if "characters" in config:
                _, self.config, num_chars = self.get_characters(config)
                self.config.num_chars = num_chars
                if hasattr(self.config, "model_args"):
                    config.model_args.num_chars = num_chars
                    if "model_args" in config:
                        self.args = self.config.model_args
                    # This is for backward compatibility
                    if "model_params" in config:
                        self.args = self.config.model_params
            else:
                self.config = config
                if "model_args" in config:
                    self.args = self.config.model_args
                # This is for backward compatibility
                if "model_params" in config:
                    self.args = self.config.model_params
        else:
            raise ValueError("config must be either a *Config or *Args")


================================================
FILE: TTS/vocoder/models/fullband_melgan_generator.py
================================================
import torch

from TTS.vocoder.models.melgan_generator import MelganGenerator


class FullbandMelganGenerator(MelganGenerator):
    def __init__(
        self,
        in_channels=80,
        out_channels=1,
        proj_kernel=7,
        base_channels=512,
        upsample_factors=(2, 8, 2, 2),
        res_kernel=3,
        num_res_blocks=4,
    ):
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            proj_kernel=proj_kernel,
            base_channels=base_channels,
            upsample_factors=upsample_factors,
            res_kernel=res_kernel,
            num_res_blocks=num_res_blocks,
        )

    @torch.no_grad()
    def inference(self, cond_features):
        cond_features = cond_features.to(self.layers[1].weight.device)
        cond_features = torch.nn.functional.pad(
            cond_features, (self.inference_padding, self.inference_padding), "replicate"
        )
        return self.layers(cond_features)


================================================
FILE: TTS/vocoder/models/gan.py
================================================
from inspect import signature
from typing import Dict, List, Tuple

import numpy as np
import torch
from coqpit import Coqpit
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from trainer.trainer_utils import get_optimizer, get_scheduler

from TTS.utils.audio import AudioProcessor
from TTS.utils.io import load_fsspec
from TTS.vocoder.datasets.gan_dataset import GANDataset
from TTS.vocoder.layers.losses import DiscriminatorLoss, GeneratorLoss
from TTS.vocoder.models import setup_discriminator, setup_generator
from TTS.vocoder.models.base_vocoder import BaseVocoder
from TTS.vocoder.utils.generic_utils import plot_results


class GAN(BaseVocoder):
    def __init__(self, config: Coqpit, ap: AudioProcessor = None):
        """Wrap a generator and a discriminator network. It provides a compatible interface for the trainer.
        It also helps mixing and matching different generator and disciminator networks easily.

        To implement a new GAN models, you just need to define the generator and the discriminator networks, the rest
        is handled by the `GAN` class.

        Args:
            config (Coqpit): Model configuration.
            ap (AudioProcessor): 🐸TTS AudioProcessor instance. Defaults to None.

        Examples:
            Initializing the GAN model with HifiGAN generator and discriminator.
            >>> from TTS.vocoder.configs import HifiganConfig
            >>> config = HifiganConfig()
            >>> model = GAN(config)
        """
        super().__init__(config)
        self.config = config
        self.model_g = setup_generator(config)
        self.model_d = setup_discriminator(config)
        self.train_disc = False  # if False, train only the generator.
        self.y_hat_g = None  # the last generator prediction to be passed onto the discriminator
        self.ap = ap

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Run the generator's forward pass.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: output of the GAN generator network.
        """
        return self.model_g.forward(x)

    def inference(self, x: torch.Tensor) -> torch.Tensor:
        """Run the generator's inference pass.

        Args:
            x (torch.Tensor): Input tensor.
        Returns:
            torch.Tensor: output of the GAN generator network.
        """
        return self.model_g.inference(x)

    def train_step(self, batch: Dict, criterion: Dict, optimizer_idx: int) -> Tuple[Dict, Dict]:
        """Compute model outputs and the loss values. `optimizer_idx` selects the generator or the discriminator for
        network on the current pass.

        Args:
            batch (Dict): Batch of samples returned by the dataloader.
            criterion (Dict): Criterion used to compute the losses.
            optimizer_idx (int): ID of the optimizer in use on the current pass.

        Raises:
            ValueError: `optimizer_idx` is an unexpected value.

        Returns:
            Tuple[Dict, Dict]: model outputs and the computed loss values.
        """
        outputs = {}
        loss_dict = {}

        x = batch["input"]
        y = batch["waveform"]

        if optimizer_idx not in [0, 1]:
            raise ValueError(" [!] Unexpected `optimizer_idx`.")

        if optimizer_idx == 0:
            # DISCRIMINATOR optimization

            # generator pass
            y_hat = self.model_g(x)[:, :, : y.size(2)]

            # cache for generator loss
            # pylint: disable=W0201
            self.y_hat_g = y_hat
            self.y_hat_sub = None
            self.y_sub_g = None

            # PQMF formatting
            if y_hat.shape[1] > 1:
                self.y_hat_sub = y_hat
                y_hat = self.model_g.pqmf_synthesis(y_hat)
                self.y_hat_g = y_hat  # save for generator loss
                self.y_sub_g = self.model_g.pqmf_analysis(y)

            scores_fake, feats_fake, feats_real = None, None, None

            if self.train_disc:
                # use different samples for G and D trainings
                if self.config.diff_samples_for_G_and_D:
                    x_d = batch["input_disc"]
                    y_d = batch["waveform_disc"]
                    # use a different sample than generator
                    with torch.no_grad():
                        y_hat = self.model_g(x_d)

                    # PQMF formatting
                    if y_hat.shape[1] > 1:
                        y_hat = self.model_g.pqmf_synthesis(y_hat)
                else:
                    # use the same samples as generator
                    x_d = x.clone()
                    y_d = y.clone()
                    y_hat = self.y_hat_g

                # run D with or without cond. features
                if len(signature(self.model_d.forward).parameters) == 2:
                    D_out_fake = self.model_d(y_hat.detach().clone(), x_d)
                    D_out_real = self.model_d(y_d, x_d)
                else:
                    D_out_fake = self.model_d(y_hat.detach())
                    D_out_real = self.model_d(y_d)

                # format D outputs
                if isinstance(D_out_fake, tuple):
                    # self.model_d returns scores and features
                    scores_fake, feats_fake = D_out_fake
                    if D_out_real is None:
                        scores_real, feats_real = None, None
                    else:
                        scores_real, feats_real = D_out_real
                else:
                    # model D returns only scores
                    scores_fake = D_out_fake
                    scores_real = D_out_real

                # compute losses
                loss_dict = criterion[optimizer_idx](scores_fake, scores_real)
                outputs = {"model_outputs": y_hat}

        if optimizer_idx == 1:
            # GENERATOR loss
            scores_fake, feats_fake, feats_real = None, None, None
            if self.train_disc:
                if len(signature(self.model_d.forward).parameters) == 2:
                    D_out_fake = self.model_d(self.y_hat_g, x)
                else:
                    D_out_fake = self.model_d(self.y_hat_g)
                D_out_real = None

                if self.config.use_feat_match_loss:
                    with torch.no_grad():
                        D_out_real = self.model_d(y)

                # format D outputs
                if isinstance(D_out_fake, tuple):
                    scores_fake, feats_fake = D_out_fake
                    if D_out_real is None:
                        feats_real = None
                    else:
                        _, feats_real = D_out_real
                else:
                    scores_fake = D_out_fake
                    feats_fake, feats_real = None, None

            # compute losses
            loss_dict = criterion[optimizer_idx](
                self.y_hat_g, y, scores_fake, feats_fake, feats_real, self.y_hat_sub, self.y_sub_g
            )
            outputs = {"model_outputs": self.y_hat_g}
        return outputs, loss_dict

    def _log(self, name: str, ap: AudioProcessor, batch: Dict, outputs: Dict) -> Tuple[Dict, Dict]:
        """Logging shared by the training and evaluation.

        Args:
            name (str): Name of the run. `train` or `eval`,
            ap (AudioProcessor): Audio processor used in training.
            batch (Dict): Batch used in the last train/eval step.
            outputs (Dict): Model outputs from the last train/eval step.

        Returns:
            Tuple[Dict, Dict]: log figures and audio samples.
        """
        y_hat = outputs[0]["model_outputs"] if self.train_disc else outputs[1]["model_outputs"]
        y = batch["waveform"]
        figures = plot_results(y_hat, y, ap, name)
        sample_voice = y_hat[0].squeeze(0).detach().cpu().numpy()
        audios = {f"{name}/audio": sample_voice}
        return figures, audios

    def train_log(
        self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int  # pylint: disable=unused-argument
    ) -> Tuple[Dict, np.ndarray]:
        """Call `_log()` for training."""
        figures, audios = self._log("eval", self.ap, batch, outputs)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    @torch.no_grad()
    def eval_step(self, batch: Dict, criterion: nn.Module, optimizer_idx: int) -> Tuple[Dict, Dict]:
        """Call `train_step()` with `no_grad()`"""
        self.train_disc = True  # Avoid a bug in the Training with the missing discriminator loss
        return self.train_step(batch, criterion, optimizer_idx)

    def eval_log(
        self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int  # pylint: disable=unused-argument
    ) -> Tuple[Dict, np.ndarray]:
        """Call `_log()` for evaluation."""
        figures, audios = self._log("eval", self.ap, batch, outputs)
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    def load_checkpoint(
        self,
        config: Coqpit,
        checkpoint_path: str,
        eval: bool = False,  # pylint: disable=unused-argument, redefined-builtin
        cache: bool = False,
    ) -> None:
        """Load a GAN checkpoint and initialize model parameters.

        Args:
            config (Coqpit): Model config.
            checkpoint_path (str): Checkpoint file path.
            eval (bool, optional): If true, load the model for inference. If falseDefaults to False.
        """
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        # band-aid for older than v0.0.15 GAN models
        if "model_disc" in state:
            self.model_g.load_checkpoint(config, checkpoint_path, eval)
        else:
            self.load_state_dict(state["model"])
            if eval:
                self.model_d = None
                if hasattr(self.model_g, "remove_weight_norm"):
                    self.model_g.remove_weight_norm()

    def on_train_step_start(self, trainer) -> None:
        """Enable the discriminator training based on `steps_to_start_discriminator`

        Args:
            trainer (Trainer): Trainer object.
        """
        self.train_disc = trainer.total_steps_done >= self.config.steps_to_start_discriminator

    def get_optimizer(self) -> List:
        """Initiate and return the GAN optimizers based on the config parameters.

        It returnes 2 optimizers in a list. First one is for the generator and the second one is for the discriminator.

        Returns:
            List: optimizers.
        """
        optimizer1 = get_optimizer(
            self.config.optimizer, self.config.optimizer_params, self.config.lr_gen, self.model_g
        )
        optimizer2 = get_optimizer(
            self.config.optimizer, self.config.optimizer_params, self.config.lr_disc, self.model_d
        )
        return [optimizer2, optimizer1]

    def get_lr(self) -> List:
        """Set the initial learning rates for each optimizer.

        Returns:
            List: learning rates for each optimizer.
        """
        return [self.config.lr_disc, self.config.lr_gen]

    def get_scheduler(self, optimizer) -> List:
        """Set the schedulers for each optimizer.

        Args:
            optimizer (List[`torch.optim.Optimizer`]): List of optimizers.

        Returns:
            List: Schedulers, one for each optimizer.
        """
        scheduler1 = get_scheduler(self.config.lr_scheduler_gen, self.config.lr_scheduler_gen_params, optimizer[0])
        scheduler2 = get_scheduler(self.config.lr_scheduler_disc, self.config.lr_scheduler_disc_params, optimizer[1])
        return [scheduler2, scheduler1]

    @staticmethod
    def format_batch(batch: List) -> Dict:
        """Format the batch for training.

        Args:
            batch (List): Batch out of the dataloader.

        Returns:
            Dict: formatted model inputs.
        """
        if isinstance(batch[0], list):
            x_G, y_G = batch[0]
            x_D, y_D = batch[1]
            return {"input": x_G, "waveform": y_G, "input_disc": x_D, "waveform_disc": y_D}
        x, y = batch
        return {"input": x, "waveform": y}

    def get_data_loader(  # pylint: disable=no-self-use, unused-argument
        self,
        config: Coqpit,
        assets: Dict,
        is_eval: True,
        samples: List,
        verbose: bool,
        num_gpus: int,
        rank: int = None,  # pylint: disable=unused-argument
    ):
        """Initiate and return the GAN dataloader.

        Args:
            config (Coqpit): Model config.
            ap (AudioProcessor): Audio processor.
            is_eval (True): Set the dataloader for evaluation if true.
            samples (List): Data samples.
            verbose (bool): Log information if true.
            num_gpus (int): Number of GPUs in use.
            rank (int): Rank of the current GPU. Defaults to None.

        Returns:
            DataLoader: Torch dataloader.
        """
        dataset = GANDataset(
            ap=self.ap,
            items=samples,
            seq_len=config.seq_len,
            hop_len=self.ap.hop_length,
            pad_short=config.pad_short,
            conv_pad=config.conv_pad,
            return_pairs=config.diff_samples_for_G_and_D if "diff_samples_for_G_and_D" in config else False,
            is_training=not is_eval,
            return_segments=not is_eval,
            use_noise_augment=config.use_noise_augment,
            use_cache=config.use_cache,
            verbose=verbose,
        )
        dataset.shuffle_mapping()
        sampler = DistributedSampler(dataset, shuffle=True) if num_gpus > 1 else None
        loader = DataLoader(
            dataset,
            batch_size=1 if is_eval else config.batch_size,
            shuffle=num_gpus == 0,
            drop_last=False,
            sampler=sampler,
            num_workers=config.num_eval_loader_workers if is_eval else config.num_loader_workers,
            pin_memory=False,
        )
        return loader

    def get_criterion(self):
        """Return criterions for the optimizers"""
        return [DiscriminatorLoss(self.config), GeneratorLoss(self.config)]

    @staticmethod
    def init_from_config(config: Coqpit, verbose=True) -> "GAN":
        ap = AudioProcessor.init_from_config(config, verbose=verbose)
        return GAN(config, ap=ap)


================================================
FILE: TTS/vocoder/models/hifigan_discriminator.py
================================================
# adopted from https://github.com/jik876/hifi-gan/blob/master/models.py
import torch
from torch import nn
from torch.nn import functional as F

LRELU_SLOPE = 0.1


class DiscriminatorP(torch.nn.Module):
    """HiFiGAN Periodic Discriminator

    Takes every Pth value from the input waveform and applied a stack of convoluations.

    Note:
        if `period` is 2
        `waveform = [1, 2, 3, 4, 5, 6 ...] --> [1, 3, 5 ... ] --> convs -> score, feat`

    Args:
        x (Tensor): input waveform.

    Returns:
        [Tensor]: discriminator scores per sample in the batch.
        [List[Tensor]]: list of features from each convolutional layer.

    Shapes:
        x: [B, 1, T]
    """

    def __init__(self, period, kernel_size=5, stride=3, use_spectral_norm=False):
        super().__init__()
        self.period = period
        get_padding = lambda k, d: int((k * d - d) / 2)
        norm_f = nn.utils.spectral_norm if use_spectral_norm else nn.utils.weight_norm
        self.convs = nn.ModuleList(
            [
                norm_f(nn.Conv2d(1, 32, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(nn.Conv2d(32, 128, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(nn.Conv2d(128, 512, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(nn.Conv2d(512, 1024, (kernel_size, 1), (stride, 1), padding=(get_padding(kernel_size, 1), 0))),
                norm_f(nn.Conv2d(1024, 1024, (kernel_size, 1), 1, padding=(2, 0))),
            ]
        )
        self.conv_post = norm_f(nn.Conv2d(1024, 1, (3, 1), 1, padding=(1, 0)))

    def forward(self, x):
        """
        Args:
            x (Tensor): input waveform.

        Returns:
            [Tensor]: discriminator scores per sample in the batch.
            [List[Tensor]]: list of features from each convolutional layer.

        Shapes:
            x: [B, 1, T]
        """
        feat = []

        # 1d to 2d
        b, c, t = x.shape
        if t % self.period != 0:  # pad first
            n_pad = self.period - (t % self.period)
            x = F.pad(x, (0, n_pad), "reflect")
            t = t + n_pad
        x = x.view(b, c, t // self.period, self.period)

        for l in self.convs:
            x = l(x)
            x = F.leaky_relu(x, LRELU_SLOPE)
            feat.append(x)
        x = self.conv_post(x)
        feat.append(x)
        x = torch.flatten(x, 1, -1)

        return x, feat


class MultiPeriodDiscriminator(torch.nn.Module):
    """HiFiGAN Multi-Period Discriminator (MPD)
    Wrapper for the `PeriodDiscriminator` to apply it in different periods.
    Periods are suggested to be prime numbers to reduce the overlap between each discriminator.
    """

    def __init__(self, use_spectral_norm=False):
        super().__init__()
        self.discriminators = nn.ModuleList(
            [
                DiscriminatorP(2, use_spectral_norm=use_spectral_norm),
                DiscriminatorP(3, use_spectral_norm=use_spectral_norm),
                DiscriminatorP(5, use_spectral_norm=use_spectral_norm),
                DiscriminatorP(7, use_spectral_norm=use_spectral_norm),
                DiscriminatorP(11, use_spectral_norm=use_spectral_norm),
            ]
        )

    def forward(self, x):
        """
        Args:
            x (Tensor): input waveform.

        Returns:
        [List[Tensor]]: list of scores from each discriminator.
            [List[List[Tensor]]]: list of list of features from each discriminator's each convolutional layer.

        Shapes:
            x: [B, 1, T]
        """
        scores = []
        feats = []
        for _, d in enumerate(self.discriminators):
            score, feat = d(x)
            scores.append(score)
            feats.append(feat)
        return scores, feats


class DiscriminatorS(torch.nn.Module):
    """HiFiGAN Scale Discriminator.
    It is similar to `MelganDiscriminator` but with a specific architecture explained in the paper.

    Args:
        use_spectral_norm (bool): if `True` swith to spectral norm instead of weight norm.

    """

    def __init__(self, use_spectral_norm=False):
        super().__init__()
        norm_f = nn.utils.spectral_norm if use_spectral_norm else nn.utils.weight_norm
        self.convs = nn.ModuleList(
            [
                norm_f(nn.Conv1d(1, 128, 15, 1, padding=7)),
                norm_f(nn.Conv1d(128, 128, 41, 2, groups=4, padding=20)),
                norm_f(nn.Conv1d(128, 256, 41, 2, groups=16, padding=20)),
                norm_f(nn.Conv1d(256, 512, 41, 4, groups=16, padding=20)),
                norm_f(nn.Conv1d(512, 1024, 41, 4, groups=16, padding=20)),
                norm_f(nn.Conv1d(1024, 1024, 41, 1, groups=16, padding=20)),
                norm_f(nn.Conv1d(1024, 1024, 5, 1, padding=2)),
            ]
        )
        self.conv_post = norm_f(nn.Conv1d(1024, 1, 3, 1, padding=1))

    def forward(self, x):
        """
        Args:
            x (Tensor): input waveform.

        Returns:
            Tensor: discriminator scores.
            List[Tensor]: list of features from the convolutiona layers.
        """
        feat = []
        for l in self.convs:
            x = l(x)
            x = F.leaky_relu(x, LRELU_SLOPE)
            feat.append(x)
        x = self.conv_post(x)
        feat.append(x)
        x = torch.flatten(x, 1, -1)
        return x, feat


class MultiScaleDiscriminator(torch.nn.Module):
    """HiFiGAN Multi-Scale Discriminator.
    It is similar to `MultiScaleMelganDiscriminator` but specially tailored for HiFiGAN as in the paper.
    """

    def __init__(self):
        super().__init__()
        self.discriminators = nn.ModuleList(
            [
                DiscriminatorS(use_spectral_norm=True),
                DiscriminatorS(),
                DiscriminatorS(),
            ]
        )
        self.meanpools = nn.ModuleList([nn.AvgPool1d(4, 2, padding=2), nn.AvgPool1d(4, 2, padding=2)])

    def forward(self, x):
        """
        Args:
            x (Tensor): input waveform.

        Returns:
            List[Tensor]: discriminator scores.
            List[List[Tensor]]: list of list of features from each layers of each discriminator.
        """
        scores = []
        feats = []
        for i, d in enumerate(self.discriminators):
            if i != 0:
                x = self.meanpools[i - 1](x)
            score, feat = d(x)
            scores.append(score)
            feats.append(feat)
        return scores, feats


class HifiganDiscriminator(nn.Module):
    """HiFiGAN discriminator wrapping MPD and MSD."""

    def __init__(self):
        super().__init__()
        self.mpd = MultiPeriodDiscriminator()
        self.msd = MultiScaleDiscriminator()

    def forward(self, x):
        """
        Args:
            x (Tensor): input waveform.

        Returns:
            List[Tensor]: discriminator scores.
            List[List[Tensor]]: list of list of features from each layers of each discriminator.
        """
        scores, feats = self.mpd(x)
        scores_, feats_ = self.msd(x)
        return scores + scores_, feats + feats_


================================================
FILE: TTS/vocoder/models/hifigan_generator.py
================================================
# adopted from https://github.com/jik876/hifi-gan/blob/master/models.py
import torch
from torch import nn
from torch.nn import Conv1d, ConvTranspose1d
from torch.nn import functional as F
from torch.nn.utils import remove_weight_norm, weight_norm

from TTS.utils.io import load_fsspec

LRELU_SLOPE = 0.1


def get_padding(k, d):
    return int((k * d - d) / 2)


class ResBlock1(torch.nn.Module):
    """Residual Block Type 1. It has 3 convolutional layers in each convolutional block.

    Network::

        x -> lrelu -> conv1_1 -> conv1_2 -> conv1_3 -> z -> lrelu -> conv2_1 -> conv2_2 -> conv2_3 -> o -> + -> o
        |--------------------------------------------------------------------------------------------------|


    Args:
        channels (int): number of hidden channels for the convolutional layers.
        kernel_size (int): size of the convolution filter in each layer.
        dilations (list): list of dilation value for each conv layer in a block.
    """

    def __init__(self, channels, kernel_size=3, dilation=(1, 3, 5)):
        super().__init__()
        self.convs1 = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[0],
                        padding=get_padding(kernel_size, dilation[0]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[1],
                        padding=get_padding(kernel_size, dilation[1]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[2],
                        padding=get_padding(kernel_size, dilation[2]),
                    )
                ),
            ]
        )

        self.convs2 = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))
                ),
                weight_norm(
                    Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))
                ),
                weight_norm(
                    Conv1d(channels, channels, kernel_size, 1, dilation=1, padding=get_padding(kernel_size, 1))
                ),
            ]
        )

    def forward(self, x):
        """
        Args:
            x (Tensor): input tensor.
        Returns:
            Tensor: output tensor.
        Shapes:
            x: [B, C, T]
        """
        for c1, c2 in zip(self.convs1, self.convs2):
            xt = F.leaky_relu(x, LRELU_SLOPE)
            xt = c1(xt)
            xt = F.leaky_relu(xt, LRELU_SLOPE)
            xt = c2(xt)
            x = xt + x
        return x

    def remove_weight_norm(self):
        for l in self.convs1:
            remove_weight_norm(l)
        for l in self.convs2:
            remove_weight_norm(l)


class ResBlock2(torch.nn.Module):
    """Residual Block Type 2. It has 1 convolutional layers in each convolutional block.

    Network::

        x -> lrelu -> conv1-> -> z -> lrelu -> conv2-> o -> + -> o
        |---------------------------------------------------|


    Args:
        channels (int): number of hidden channels for the convolutional layers.
        kernel_size (int): size of the convolution filter in each layer.
        dilations (list): list of dilation value for each conv layer in a block.
    """

    def __init__(self, channels, kernel_size=3, dilation=(1, 3)):
        super().__init__()
        self.convs = nn.ModuleList(
            [
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[0],
                        padding=get_padding(kernel_size, dilation[0]),
                    )
                ),
                weight_norm(
                    Conv1d(
                        channels,
                        channels,
                        kernel_size,
                        1,
                        dilation=dilation[1],
                        padding=get_padding(kernel_size, dilation[1]),
                    )
                ),
            ]
        )

    def forward(self, x):
        for c in self.convs:
            xt = F.leaky_relu(x, LRELU_SLOPE)
            xt = c(xt)
            x = xt + x
        return x

    def remove_weight_norm(self):
        for l in self.convs:
            remove_weight_norm(l)


class HifiganGenerator(torch.nn.Module):
    def __init__(
        self,
        in_channels,
        out_channels,
        resblock_type,
        resblock_dilation_sizes,
        resblock_kernel_sizes,
        upsample_kernel_sizes,
        upsample_initial_channel,
        upsample_factors,
        inference_padding=5,
        cond_channels=0,
        conv_pre_weight_norm=True,
        conv_post_weight_norm=True,
        conv_post_bias=True,
    ):
        r"""HiFiGAN Generator with Multi-Receptive Field Fusion (MRF)

        Network:
            x -> lrelu -> upsampling_layer -> resblock1_k1x1 -> z1 -> + -> z_sum / #resblocks -> lrelu -> conv_post_7x1 -> tanh -> o
                                                 ..          -> zI ---|
                                              resblockN_kNx1 -> zN ---'

        Args:
            in_channels (int): number of input tensor channels.
            out_channels (int): number of output tensor channels.
            resblock_type (str): type of the `ResBlock`. '1' or '2'.
            resblock_dilation_sizes (List[List[int]]): list of dilation values in each layer of a `ResBlock`.
            resblock_kernel_sizes (List[int]): list of kernel sizes for each `ResBlock`.
            upsample_kernel_sizes (List[int]): list of kernel sizes for each transposed convolution.
            upsample_initial_channel (int): number of channels for the first upsampling layer. This is divided by 2
                for each consecutive upsampling layer.
            upsample_factors (List[int]): upsampling factors (stride) for each upsampling layer.
            inference_padding (int): constant padding applied to the input at inference time. Defaults to 5.
        """
        super().__init__()
        self.inference_padding = inference_padding
        self.num_kernels = len(resblock_kernel_sizes)
        self.num_upsamples = len(upsample_factors)
        # initial upsampling layers
        self.conv_pre = weight_norm(Conv1d(in_channels, upsample_initial_channel, 7, 1, padding=3))
        resblock = ResBlock1 if resblock_type == "1" else ResBlock2
        # upsampling layers
        self.ups = nn.ModuleList()
        for i, (u, k) in enumerate(zip(upsample_factors, upsample_kernel_sizes)):
            self.ups.append(
                weight_norm(
                    ConvTranspose1d(
                        upsample_initial_channel // (2**i),
                        upsample_initial_channel // (2 ** (i + 1)),
                        k,
                        u,
                        padding=(k - u) // 2,
                    )
                )
            )
        # MRF blocks
        self.resblocks = nn.ModuleList()
        for i in range(len(self.ups)):
            ch = upsample_initial_channel // (2 ** (i + 1))
            for _, (k, d) in enumerate(zip(resblock_kernel_sizes, resblock_dilation_sizes)):
                self.resblocks.append(resblock(ch, k, d))
        # post convolution layer
        self.conv_post = weight_norm(Conv1d(ch, out_channels, 7, 1, padding=3, bias=conv_post_bias))
        if cond_channels > 0:
            self.cond_layer = nn.Conv1d(cond_channels, upsample_initial_channel, 1)

        if not conv_pre_weight_norm:
            remove_weight_norm(self.conv_pre)

        if not conv_post_weight_norm:
            remove_weight_norm(self.conv_post)

    def forward(self, x, g=None):
        """
        Args:
            x (Tensor): feature input tensor.
            g (Tensor): global conditioning input tensor.

        Returns:
            Tensor: output waveform.

        Shapes:
            x: [B, C, T]
            Tensor: [B, 1, T]
        """
        o = self.conv_pre(x)
        if hasattr(self, "cond_layer"):
            o = o + self.cond_layer(g)
        for i in range(self.num_upsamples):
            o = F.leaky_relu(o, LRELU_SLOPE)
            o = self.ups[i](o)
            z_sum = None
            for j in range(self.num_kernels):
                if z_sum is None:
                    z_sum = self.resblocks[i * self.num_kernels + j](o)
                else:
                    z_sum += self.resblocks[i * self.num_kernels + j](o)
            o = z_sum / self.num_kernels
        o = F.leaky_relu(o)
        o = self.conv_post(o)
        o = torch.tanh(o)
        return o

    @torch.no_grad()
    def inference(self, c):
        """
        Args:
            x (Tensor): conditioning input tensor.

        Returns:
            Tensor: output waveform.

        Shapes:
            x: [B, C, T]
            Tensor: [B, 1, T]
        """
        c = c.to(self.conv_pre.weight.device)
        c = torch.nn.functional.pad(c, (self.inference_padding, self.inference_padding), "replicate")
        return self.forward(c)

    def remove_weight_norm(self):
        print("Removing weight norm...")
        for l in self.ups:
            remove_weight_norm(l)
        for l in self.resblocks:
            l.remove_weight_norm()
        remove_weight_norm(self.conv_pre)
        remove_weight_norm(self.conv_post)

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training
            self.remove_weight_norm()


================================================
FILE: TTS/vocoder/models/melgan_discriminator.py
================================================
import numpy as np
from torch import nn
from torch.nn.utils import weight_norm


class MelganDiscriminator(nn.Module):
    def __init__(
        self,
        in_channels=1,
        out_channels=1,
        kernel_sizes=(5, 3),
        base_channels=16,
        max_channels=1024,
        downsample_factors=(4, 4, 4, 4),
        groups_denominator=4,
    ):
        super().__init__()
        self.layers = nn.ModuleList()

        layer_kernel_size = np.prod(kernel_sizes)
        layer_padding = (layer_kernel_size - 1) // 2

        # initial layer
        self.layers += [
            nn.Sequential(
                nn.ReflectionPad1d(layer_padding),
                weight_norm(nn.Conv1d(in_channels, base_channels, layer_kernel_size, stride=1)),
                nn.LeakyReLU(0.2, inplace=True),
            )
        ]

        # downsampling layers
        layer_in_channels = base_channels
        for downsample_factor in downsample_factors:
            layer_out_channels = min(layer_in_channels * downsample_factor, max_channels)
            layer_kernel_size = downsample_factor * 10 + 1
            layer_padding = (layer_kernel_size - 1) // 2
            layer_groups = layer_in_channels // groups_denominator
            self.layers += [
                nn.Sequential(
                    weight_norm(
                        nn.Conv1d(
                            layer_in_channels,
                            layer_out_channels,
                            kernel_size=layer_kernel_size,
                            stride=downsample_factor,
                            padding=layer_padding,
                            groups=layer_groups,
                        )
                    ),
                    nn.LeakyReLU(0.2, inplace=True),
                )
            ]
            layer_in_channels = layer_out_channels

        # last 2 layers
        layer_padding1 = (kernel_sizes[0] - 1) // 2
        layer_padding2 = (kernel_sizes[1] - 1) // 2
        self.layers += [
            nn.Sequential(
                weight_norm(
                    nn.Conv1d(
                        layer_out_channels,
                        layer_out_channels,
                        kernel_size=kernel_sizes[0],
                        stride=1,
                        padding=layer_padding1,
                    )
                ),
                nn.LeakyReLU(0.2, inplace=True),
            ),
            weight_norm(
                nn.Conv1d(
                    layer_out_channels, out_channels, kernel_size=kernel_sizes[1], stride=1, padding=layer_padding2
                )
            ),
        ]

    def forward(self, x):
        feats = []
        for layer in self.layers:
            x = layer(x)
            feats.append(x)
        return x, feats


================================================
FILE: TTS/vocoder/models/melgan_generator.py
================================================
import torch
from torch import nn
from torch.nn.utils import weight_norm

from TTS.utils.io import load_fsspec
from TTS.vocoder.layers.melgan import ResidualStack


class MelganGenerator(nn.Module):
    def __init__(
        self,
        in_channels=80,
        out_channels=1,
        proj_kernel=7,
        base_channels=512,
        upsample_factors=(8, 8, 2, 2),
        res_kernel=3,
        num_res_blocks=3,
    ):
        super().__init__()

        # assert model parameters
        assert (proj_kernel - 1) % 2 == 0, " [!] proj_kernel should be an odd number."

        # setup additional model parameters
        base_padding = (proj_kernel - 1) // 2
        act_slope = 0.2
        self.inference_padding = 2

        # initial layer
        layers = []
        layers += [
            nn.ReflectionPad1d(base_padding),
            weight_norm(nn.Conv1d(in_channels, base_channels, kernel_size=proj_kernel, stride=1, bias=True)),
        ]

        # upsampling layers and residual stacks
        for idx, upsample_factor in enumerate(upsample_factors):
            layer_in_channels = base_channels // (2**idx)
            layer_out_channels = base_channels // (2 ** (idx + 1))
            layer_filter_size = upsample_factor * 2
            layer_stride = upsample_factor
            layer_output_padding = upsample_factor % 2
            layer_padding = upsample_factor // 2 + layer_output_padding
            layers += [
                nn.LeakyReLU(act_slope),
                weight_norm(
                    nn.ConvTranspose1d(
                        layer_in_channels,
                        layer_out_channels,
                        layer_filter_size,
                        stride=layer_stride,
                        padding=layer_padding,
                        output_padding=layer_output_padding,
                        bias=True,
                    )
                ),
                ResidualStack(channels=layer_out_channels, num_res_blocks=num_res_blocks, kernel_size=res_kernel),
            ]

        layers += [nn.LeakyReLU(act_slope)]

        # final layer
        layers += [
            nn.ReflectionPad1d(base_padding),
            weight_norm(nn.Conv1d(layer_out_channels, out_channels, proj_kernel, stride=1, bias=True)),
            nn.Tanh(),
        ]
        self.layers = nn.Sequential(*layers)

    def forward(self, c):
        return self.layers(c)

    def inference(self, c):
        c = c.to(self.layers[1].weight.device)
        c = torch.nn.functional.pad(c, (self.inference_padding, self.inference_padding), "replicate")
        return self.layers(c)

    def remove_weight_norm(self):
        for _, layer in enumerate(self.layers):
            if len(layer.state_dict()) != 0:
                try:
                    nn.utils.remove_weight_norm(layer)
                except ValueError:
                    layer.remove_weight_norm()

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training
            self.remove_weight_norm()


================================================
FILE: TTS/vocoder/models/melgan_multiscale_discriminator.py
================================================
from torch import nn

from TTS.vocoder.models.melgan_discriminator import MelganDiscriminator


class MelganMultiscaleDiscriminator(nn.Module):
    def __init__(
        self,
        in_channels=1,
        out_channels=1,
        num_scales=3,
        kernel_sizes=(5, 3),
        base_channels=16,
        max_channels=1024,
        downsample_factors=(4, 4, 4),
        pooling_kernel_size=4,
        pooling_stride=2,
        pooling_padding=2,
        groups_denominator=4,
    ):
        super().__init__()

        self.discriminators = nn.ModuleList(
            [
                MelganDiscriminator(
                    in_channels=in_channels,
                    out_channels=out_channels,
                    kernel_sizes=kernel_sizes,
                    base_channels=base_channels,
                    max_channels=max_channels,
                    downsample_factors=downsample_factors,
                    groups_denominator=groups_denominator,
                )
                for _ in range(num_scales)
            ]
        )

        self.pooling = nn.AvgPool1d(
            kernel_size=pooling_kernel_size, stride=pooling_stride, padding=pooling_padding, count_include_pad=False
        )

    def forward(self, x):
        scores = []
        feats = []
        for disc in self.discriminators:
            score, feat = disc(x)
            scores.append(score)
            feats.append(feat)
            x = self.pooling(x)
        return scores, feats


================================================
FILE: TTS/vocoder/models/multiband_melgan_generator.py
================================================
import torch

from TTS.vocoder.layers.pqmf import PQMF
from TTS.vocoder.models.melgan_generator import MelganGenerator


class MultibandMelganGenerator(MelganGenerator):
    def __init__(
        self,
        in_channels=80,
        out_channels=4,
        proj_kernel=7,
        base_channels=384,
        upsample_factors=(2, 8, 2, 2),
        res_kernel=3,
        num_res_blocks=3,
    ):
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            proj_kernel=proj_kernel,
            base_channels=base_channels,
            upsample_factors=upsample_factors,
            res_kernel=res_kernel,
            num_res_blocks=num_res_blocks,
        )
        self.pqmf_layer = PQMF(N=4, taps=62, cutoff=0.15, beta=9.0)

    def pqmf_analysis(self, x):
        return self.pqmf_layer.analysis(x)

    def pqmf_synthesis(self, x):
        return self.pqmf_layer.synthesis(x)

    @torch.no_grad()
    def inference(self, cond_features):
        cond_features = cond_features.to(self.layers[1].weight.device)
        cond_features = torch.nn.functional.pad(
            cond_features, (self.inference_padding, self.inference_padding), "replicate"
        )
        return self.pqmf_synthesis(self.layers(cond_features))


================================================
FILE: TTS/vocoder/models/parallel_wavegan_discriminator.py
================================================
import math

import torch
from torch import nn

from TTS.vocoder.layers.parallel_wavegan import ResidualBlock


class ParallelWaveganDiscriminator(nn.Module):
    """PWGAN discriminator as in https://arxiv.org/abs/1910.11480.
    It classifies each audio window real/fake and returns a sequence
    of predictions.
        It is a stack of convolutional blocks with dilation.
    """

    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        in_channels=1,
        out_channels=1,
        kernel_size=3,
        num_layers=10,
        conv_channels=64,
        dilation_factor=1,
        nonlinear_activation="LeakyReLU",
        nonlinear_activation_params={"negative_slope": 0.2},
        bias=True,
    ):
        super().__init__()
        assert (kernel_size - 1) % 2 == 0, " [!] does not support even number kernel size."
        assert dilation_factor > 0, " [!] dilation factor must be > 0."
        self.conv_layers = nn.ModuleList()
        conv_in_channels = in_channels
        for i in range(num_layers - 1):
            if i == 0:
                dilation = 1
            else:
                dilation = i if dilation_factor == 1 else dilation_factor**i
                conv_in_channels = conv_channels
            padding = (kernel_size - 1) // 2 * dilation
            conv_layer = [
                nn.Conv1d(
                    conv_in_channels,
                    conv_channels,
                    kernel_size=kernel_size,
                    padding=padding,
                    dilation=dilation,
                    bias=bias,
                ),
                getattr(nn, nonlinear_activation)(inplace=True, **nonlinear_activation_params),
            ]
            self.conv_layers += conv_layer
        padding = (kernel_size - 1) // 2
        last_conv_layer = nn.Conv1d(conv_in_channels, out_channels, kernel_size=kernel_size, padding=padding, bias=bias)
        self.conv_layers += [last_conv_layer]
        self.apply_weight_norm()

    def forward(self, x):
        """
            x : (B, 1, T).
        Returns:
            Tensor: (B, 1, T)
        """
        for f in self.conv_layers:
            x = f(x)
        return x

    def apply_weight_norm(self):
        def _apply_weight_norm(m):
            if isinstance(m, (torch.nn.Conv1d, torch.nn.Conv2d)):
                torch.nn.utils.weight_norm(m)

        self.apply(_apply_weight_norm)

    def remove_weight_norm(self):
        def _remove_weight_norm(m):
            try:
                # print(f"Weight norm is removed from {m}.")
                nn.utils.remove_weight_norm(m)
            except ValueError:  # this module didn't have weight norm
                return

        self.apply(_remove_weight_norm)


class ResidualParallelWaveganDiscriminator(nn.Module):
    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        in_channels=1,
        out_channels=1,
        kernel_size=3,
        num_layers=30,
        stacks=3,
        res_channels=64,
        gate_channels=128,
        skip_channels=64,
        dropout=0.0,
        bias=True,
        nonlinear_activation="LeakyReLU",
        nonlinear_activation_params={"negative_slope": 0.2},
    ):
        super().__init__()
        assert (kernel_size - 1) % 2 == 0, "Not support even number kernel size."

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.num_layers = num_layers
        self.stacks = stacks
        self.kernel_size = kernel_size
        self.res_factor = math.sqrt(1.0 / num_layers)

        # check the number of num_layers and stacks
        assert num_layers % stacks == 0
        layers_per_stack = num_layers // stacks

        # define first convolution
        self.first_conv = nn.Sequential(
            nn.Conv1d(in_channels, res_channels, kernel_size=1, padding=0, dilation=1, bias=True),
            getattr(nn, nonlinear_activation)(inplace=True, **nonlinear_activation_params),
        )

        # define residual blocks
        self.conv_layers = nn.ModuleList()
        for layer in range(num_layers):
            dilation = 2 ** (layer % layers_per_stack)
            conv = ResidualBlock(
                kernel_size=kernel_size,
                res_channels=res_channels,
                gate_channels=gate_channels,
                skip_channels=skip_channels,
                aux_channels=-1,
                dilation=dilation,
                dropout=dropout,
                bias=bias,
                use_causal_conv=False,
            )
            self.conv_layers += [conv]

        # define output layers
        self.last_conv_layers = nn.ModuleList(
            [
                getattr(nn, nonlinear_activation)(inplace=True, **nonlinear_activation_params),
                nn.Conv1d(skip_channels, skip_channels, kernel_size=1, padding=0, dilation=1, bias=True),
                getattr(nn, nonlinear_activation)(inplace=True, **nonlinear_activation_params),
                nn.Conv1d(skip_channels, out_channels, kernel_size=1, padding=0, dilation=1, bias=True),
            ]
        )

        # apply weight norm
        self.apply_weight_norm()

    def forward(self, x):
        """
        x: (B, 1, T).
        """
        x = self.first_conv(x)

        skips = 0
        for f in self.conv_layers:
            x, h = f(x, None)
            skips += h
        skips *= self.res_factor

        # apply final layers
        x = skips
        for f in self.last_conv_layers:
            x = f(x)
        return x

    def apply_weight_norm(self):
        def _apply_weight_norm(m):
            if isinstance(m, (torch.nn.Conv1d, torch.nn.Conv2d)):
                torch.nn.utils.weight_norm(m)

        self.apply(_apply_weight_norm)

    def remove_weight_norm(self):
        def _remove_weight_norm(m):
            try:
                print(f"Weight norm is removed from {m}.")
                nn.utils.remove_weight_norm(m)
            except ValueError:  # this module didn't have weight norm
                return

        self.apply(_remove_weight_norm)


================================================
FILE: TTS/vocoder/models/parallel_wavegan_generator.py
================================================
import math

import numpy as np
import torch

from TTS.utils.io import load_fsspec
from TTS.vocoder.layers.parallel_wavegan import ResidualBlock
from TTS.vocoder.layers.upsample import ConvUpsample


class ParallelWaveganGenerator(torch.nn.Module):
    """PWGAN generator as in https://arxiv.org/pdf/1910.11480.pdf.
    It is similar to WaveNet with no causal convolution.
        It is conditioned on an aux feature (spectrogram) to generate
    an output waveform from an input noise.
    """

    # pylint: disable=dangerous-default-value
    def __init__(
        self,
        in_channels=1,
        out_channels=1,
        kernel_size=3,
        num_res_blocks=30,
        stacks=3,
        res_channels=64,
        gate_channels=128,
        skip_channels=64,
        aux_channels=80,
        dropout=0.0,
        bias=True,
        use_weight_norm=True,
        upsample_factors=[4, 4, 4, 4],
        inference_padding=2,
    ):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.aux_channels = aux_channels
        self.num_res_blocks = num_res_blocks
        self.stacks = stacks
        self.kernel_size = kernel_size
        self.upsample_factors = upsample_factors
        self.upsample_scale = np.prod(upsample_factors)
        self.inference_padding = inference_padding
        self.use_weight_norm = use_weight_norm

        # check the number of layers and stacks
        assert num_res_blocks % stacks == 0
        layers_per_stack = num_res_blocks // stacks

        # define first convolution
        self.first_conv = torch.nn.Conv1d(in_channels, res_channels, kernel_size=1, bias=True)

        # define conv + upsampling network
        self.upsample_net = ConvUpsample(upsample_factors=upsample_factors)

        # define residual blocks
        self.conv_layers = torch.nn.ModuleList()
        for layer in range(num_res_blocks):
            dilation = 2 ** (layer % layers_per_stack)
            conv = ResidualBlock(
                kernel_size=kernel_size,
                res_channels=res_channels,
                gate_channels=gate_channels,
                skip_channels=skip_channels,
                aux_channels=aux_channels,
                dilation=dilation,
                dropout=dropout,
                bias=bias,
            )
            self.conv_layers += [conv]

        # define output layers
        self.last_conv_layers = torch.nn.ModuleList(
            [
                torch.nn.ReLU(inplace=True),
                torch.nn.Conv1d(skip_channels, skip_channels, kernel_size=1, bias=True),
                torch.nn.ReLU(inplace=True),
                torch.nn.Conv1d(skip_channels, out_channels, kernel_size=1, bias=True),
            ]
        )

        # apply weight norm
        if use_weight_norm:
            self.apply_weight_norm()

    def forward(self, c):
        """
        c: (B, C ,T').
        o: Output tensor (B, out_channels, T)
        """
        # random noise
        x = torch.randn([c.shape[0], 1, c.shape[2] * self.upsample_scale])
        x = x.to(self.first_conv.bias.device)

        # perform upsampling
        if c is not None and self.upsample_net is not None:
            c = self.upsample_net(c)
            assert (
                c.shape[-1] == x.shape[-1]
            ), f" [!] Upsampling scale does not match the expected output. {c.shape} vs {x.shape}"

        # encode to hidden representation
        x = self.first_conv(x)
        skips = 0
        for f in self.conv_layers:
            x, h = f(x, c)
            skips += h
        skips *= math.sqrt(1.0 / len(self.conv_layers))

        # apply final layers
        x = skips
        for f in self.last_conv_layers:
            x = f(x)

        return x

    @torch.no_grad()
    def inference(self, c):
        c = c.to(self.first_conv.weight.device)
        c = torch.nn.functional.pad(c, (self.inference_padding, self.inference_padding), "replicate")
        return self.forward(c)

    def remove_weight_norm(self):
        def _remove_weight_norm(m):
            try:
                # print(f"Weight norm is removed from {m}.")
                torch.nn.utils.remove_weight_norm(m)
            except ValueError:  # this module didn't have weight norm
                return

        self.apply(_remove_weight_norm)

    def apply_weight_norm(self):
        def _apply_weight_norm(m):
            if isinstance(m, (torch.nn.Conv1d, torch.nn.Conv2d)):
                torch.nn.utils.weight_norm(m)
                # print(f"Weight norm is applied to {m}.")

        self.apply(_apply_weight_norm)

    @staticmethod
    def _get_receptive_field_size(layers, stacks, kernel_size, dilation=lambda x: 2**x):
        assert layers % stacks == 0
        layers_per_cycle = layers // stacks
        dilations = [dilation(i % layers_per_cycle) for i in range(layers)]
        return (kernel_size - 1) * sum(dilations) + 1

    @property
    def receptive_field_size(self):
        return self._get_receptive_field_size(self.layers, self.stacks, self.kernel_size)

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training
            if self.use_weight_norm:
                self.remove_weight_norm()


================================================
FILE: TTS/vocoder/models/random_window_discriminator.py
================================================
import numpy as np
from torch import nn


class GBlock(nn.Module):
    def __init__(self, in_channels, cond_channels, downsample_factor):
        super().__init__()

        self.in_channels = in_channels
        self.cond_channels = cond_channels
        self.downsample_factor = downsample_factor

        self.start = nn.Sequential(
            nn.AvgPool1d(downsample_factor, stride=downsample_factor),
            nn.ReLU(),
            nn.Conv1d(in_channels, in_channels * 2, kernel_size=3, padding=1),
        )
        self.lc_conv1d = nn.Conv1d(cond_channels, in_channels * 2, kernel_size=1)
        self.end = nn.Sequential(
            nn.ReLU(), nn.Conv1d(in_channels * 2, in_channels * 2, kernel_size=3, dilation=2, padding=2)
        )
        self.residual = nn.Sequential(
            nn.Conv1d(in_channels, in_channels * 2, kernel_size=1),
            nn.AvgPool1d(downsample_factor, stride=downsample_factor),
        )

    def forward(self, inputs, conditions):
        outputs = self.start(inputs) + self.lc_conv1d(conditions)
        outputs = self.end(outputs)
        residual_outputs = self.residual(inputs)
        outputs = outputs + residual_outputs

        return outputs


class DBlock(nn.Module):
    def __init__(self, in_channels, out_channels, downsample_factor):
        super().__init__()

        self.in_channels = in_channels
        self.downsample_factor = downsample_factor
        self.out_channels = out_channels

        self.donwsample_layer = nn.AvgPool1d(downsample_factor, stride=downsample_factor)
        self.layers = nn.Sequential(
            nn.ReLU(),
            nn.Conv1d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(out_channels, out_channels, kernel_size=3, dilation=2, padding=2),
        )
        self.residual = nn.Sequential(
            nn.Conv1d(in_channels, out_channels, kernel_size=1),
        )

    def forward(self, inputs):
        if self.downsample_factor > 1:
            outputs = self.layers(self.donwsample_layer(inputs)) + self.donwsample_layer(self.residual(inputs))
        else:
            outputs = self.layers(inputs) + self.residual(inputs)
        return outputs


class ConditionalDiscriminator(nn.Module):
    def __init__(self, in_channels, cond_channels, downsample_factors=(2, 2, 2), out_channels=(128, 256)):
        super().__init__()

        assert len(downsample_factors) == len(out_channels) + 1

        self.in_channels = in_channels
        self.cond_channels = cond_channels
        self.downsample_factors = downsample_factors
        self.out_channels = out_channels

        self.pre_cond_layers = nn.ModuleList()
        self.post_cond_layers = nn.ModuleList()

        # layers before condition features
        self.pre_cond_layers += [DBlock(in_channels, 64, 1)]
        in_channels = 64
        for i, channel in enumerate(out_channels):
            self.pre_cond_layers.append(DBlock(in_channels, channel, downsample_factors[i]))
            in_channels = channel

        # condition block
        self.cond_block = GBlock(in_channels, cond_channels, downsample_factors[-1])

        # layers after condition block
        self.post_cond_layers += [
            DBlock(in_channels * 2, in_channels * 2, 1),
            DBlock(in_channels * 2, in_channels * 2, 1),
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(in_channels * 2, 1, kernel_size=1),
        ]

    def forward(self, inputs, conditions):
        batch_size = inputs.size()[0]
        outputs = inputs.view(batch_size, self.in_channels, -1)
        for layer in self.pre_cond_layers:
            outputs = layer(outputs)
        outputs = self.cond_block(outputs, conditions)
        for layer in self.post_cond_layers:
            outputs = layer(outputs)

        return outputs


class UnconditionalDiscriminator(nn.Module):
    def __init__(self, in_channels, base_channels=64, downsample_factors=(8, 4), out_channels=(128, 256)):
        super().__init__()

        self.downsample_factors = downsample_factors
        self.in_channels = in_channels
        self.downsample_factors = downsample_factors
        self.out_channels = out_channels

        self.layers = nn.ModuleList()
        self.layers += [DBlock(self.in_channels, base_channels, 1)]
        in_channels = base_channels
        for i, factor in enumerate(downsample_factors):
            self.layers.append(DBlock(in_channels, out_channels[i], factor))
            in_channels *= 2
        self.layers += [
            DBlock(in_channels, in_channels, 1),
            DBlock(in_channels, in_channels, 1),
            nn.AdaptiveAvgPool1d(1),
            nn.Conv1d(in_channels, 1, kernel_size=1),
        ]

    def forward(self, inputs):
        batch_size = inputs.size()[0]
        outputs = inputs.view(batch_size, self.in_channels, -1)
        for layer in self.layers:
            outputs = layer(outputs)
        return outputs


class RandomWindowDiscriminator(nn.Module):
    """Random Window Discriminator as described in
    http://arxiv.org/abs/1909.11646"""

    def __init__(
        self,
        cond_channels,
        hop_length,
        uncond_disc_donwsample_factors=(8, 4),
        cond_disc_downsample_factors=((8, 4, 2, 2, 2), (8, 4, 2, 2), (8, 4, 2), (8, 4), (4, 2, 2)),
        cond_disc_out_channels=((128, 128, 256, 256), (128, 256, 256), (128, 256), (256,), (128, 256)),
        window_sizes=(512, 1024, 2048, 4096, 8192),
    ):
        super().__init__()
        self.cond_channels = cond_channels
        self.window_sizes = window_sizes
        self.hop_length = hop_length
        self.base_window_size = self.hop_length * 2
        self.ks = [ws // self.base_window_size for ws in window_sizes]

        # check arguments
        assert len(cond_disc_downsample_factors) == len(cond_disc_out_channels) == len(window_sizes)
        for ws in window_sizes:
            assert ws % hop_length == 0

        for idx, cf in enumerate(cond_disc_downsample_factors):
            assert np.prod(cf) == hop_length // self.ks[idx]

        # define layers
        self.unconditional_discriminators = nn.ModuleList([])
        for k in self.ks:
            layer = UnconditionalDiscriminator(
                in_channels=k, base_channels=64, downsample_factors=uncond_disc_donwsample_factors
            )
            self.unconditional_discriminators.append(layer)

        self.conditional_discriminators = nn.ModuleList([])
        for idx, k in enumerate(self.ks):
            layer = ConditionalDiscriminator(
                in_channels=k,
                cond_channels=cond_channels,
                downsample_factors=cond_disc_downsample_factors[idx],
                out_channels=cond_disc_out_channels[idx],
            )
            self.conditional_discriminators.append(layer)

    def forward(self, x, c):
        scores = []
        feats = []
        # unconditional pass
        for window_size, layer in zip(self.window_sizes, self.unconditional_discriminators):
            index = np.random.randint(x.shape[-1] - window_size)

            score = layer(x[:, :, index : index + window_size])
            scores.append(score)

        # conditional pass
        for window_size, layer in zip(self.window_sizes, self.conditional_discriminators):
            frame_size = window_size // self.hop_length
            lc_index = np.random.randint(c.shape[-1] - frame_size)
            sample_index = lc_index * self.hop_length
            x_sub = x[:, :, sample_index : (lc_index + frame_size) * self.hop_length]
            c_sub = c[:, :, lc_index : lc_index + frame_size]

            score = layer(x_sub, c_sub)
            scores.append(score)
        return scores, feats


================================================
FILE: TTS/vocoder/models/univnet_discriminator.py
================================================
import torch
import torch.nn.functional as F
from torch import nn
from torch.nn.utils import spectral_norm, weight_norm

from TTS.utils.audio.torch_transforms import TorchSTFT
from TTS.vocoder.models.hifigan_discriminator import MultiPeriodDiscriminator

LRELU_SLOPE = 0.1


class SpecDiscriminator(nn.Module):
    """docstring for Discriminator."""

    def __init__(self, fft_size=1024, hop_length=120, win_length=600, use_spectral_norm=False):
        super().__init__()
        norm_f = weight_norm if use_spectral_norm is False else spectral_norm
        self.fft_size = fft_size
        self.hop_length = hop_length
        self.win_length = win_length
        self.stft = TorchSTFT(fft_size, hop_length, win_length)
        self.discriminators = nn.ModuleList(
            [
                norm_f(nn.Conv2d(1, 32, kernel_size=(3, 9), padding=(1, 4))),
                norm_f(nn.Conv2d(32, 32, kernel_size=(3, 9), stride=(1, 2), padding=(1, 4))),
                norm_f(nn.Conv2d(32, 32, kernel_size=(3, 9), stride=(1, 2), padding=(1, 4))),
                norm_f(nn.Conv2d(32, 32, kernel_size=(3, 9), stride=(1, 2), padding=(1, 4))),
                norm_f(nn.Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))),
            ]
        )

        self.out = norm_f(nn.Conv2d(32, 1, 3, 1, 1))

    def forward(self, y):
        fmap = []
        with torch.no_grad():
            y = y.squeeze(1)
            y = self.stft(y)
        y = y.unsqueeze(1)
        for _, d in enumerate(self.discriminators):
            y = d(y)
            y = F.leaky_relu(y, LRELU_SLOPE)
            fmap.append(y)

        y = self.out(y)
        fmap.append(y)

        return torch.flatten(y, 1, -1), fmap


class MultiResSpecDiscriminator(torch.nn.Module):
    def __init__(  # pylint: disable=dangerous-default-value
        self, fft_sizes=[1024, 2048, 512], hop_sizes=[120, 240, 50], win_lengths=[600, 1200, 240], window="hann_window"
    ):
        super().__init__()
        self.discriminators = nn.ModuleList(
            [
                SpecDiscriminator(fft_sizes[0], hop_sizes[0], win_lengths[0], window),
                SpecDiscriminator(fft_sizes[1], hop_sizes[1], win_lengths[1], window),
                SpecDiscriminator(fft_sizes[2], hop_sizes[2], win_lengths[2], window),
            ]
        )

    def forward(self, x):
        scores = []
        feats = []
        for d in self.discriminators:
            score, feat = d(x)
            scores.append(score)
            feats.append(feat)

        return scores, feats


class UnivnetDiscriminator(nn.Module):
    """Univnet discriminator wrapping MPD and MSD."""

    def __init__(self):
        super().__init__()
        self.mpd = MultiPeriodDiscriminator()
        self.msd = MultiResSpecDiscriminator()

    def forward(self, x):
        """
        Args:
            x (Tensor): input waveform.

        Returns:
            List[Tensor]: discriminator scores.
            List[List[Tensor]]: list of list of features from each layers of each discriminator.
        """
        scores, feats = self.mpd(x)
        scores_, feats_ = self.msd(x)
        return scores + scores_, feats + feats_


================================================
FILE: TTS/vocoder/models/univnet_generator.py
================================================
from typing import List

import numpy as np
import torch
import torch.nn.functional as F

from TTS.vocoder.layers.lvc_block import LVCBlock

LRELU_SLOPE = 0.1


class UnivnetGenerator(torch.nn.Module):
    def __init__(
        self,
        in_channels: int,
        out_channels: int,
        hidden_channels: int,
        cond_channels: int,
        upsample_factors: List[int],
        lvc_layers_each_block: int,
        lvc_kernel_size: int,
        kpnet_hidden_channels: int,
        kpnet_conv_size: int,
        dropout: float,
        use_weight_norm=True,
    ):
        """Univnet Generator network.

        Paper: https://arxiv.org/pdf/2106.07889.pdf

        Args:
            in_channels (int): Number of input tensor channels.
            out_channels (int): Number of channels of the output tensor.
            hidden_channels (int): Number of hidden network channels.
            cond_channels (int): Number of channels of the conditioning tensors.
            upsample_factors (List[int]): List of uplsample factors for the upsampling layers.
            lvc_layers_each_block (int): Number of LVC layers in each block.
            lvc_kernel_size (int): Kernel size of the LVC layers.
            kpnet_hidden_channels (int): Number of hidden channels in the key-point network.
            kpnet_conv_size (int): Number of convolution channels in the key-point network.
            dropout (float): Dropout rate.
            use_weight_norm (bool, optional): Enable/disable weight norm. Defaults to True.
        """

        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.cond_channels = cond_channels
        self.upsample_scale = np.prod(upsample_factors)
        self.lvc_block_nums = len(upsample_factors)

        # define first convolution
        self.first_conv = torch.nn.Conv1d(
            in_channels, hidden_channels, kernel_size=7, padding=(7 - 1) // 2, dilation=1, bias=True
        )

        # define residual blocks
        self.lvc_blocks = torch.nn.ModuleList()
        cond_hop_length = 1
        for n in range(self.lvc_block_nums):
            cond_hop_length = cond_hop_length * upsample_factors[n]
            lvcb = LVCBlock(
                in_channels=hidden_channels,
                cond_channels=cond_channels,
                upsample_ratio=upsample_factors[n],
                conv_layers=lvc_layers_each_block,
                conv_kernel_size=lvc_kernel_size,
                cond_hop_length=cond_hop_length,
                kpnet_hidden_channels=kpnet_hidden_channels,
                kpnet_conv_size=kpnet_conv_size,
                kpnet_dropout=dropout,
            )
            self.lvc_blocks += [lvcb]

        # define output layers
        self.last_conv_layers = torch.nn.ModuleList(
            [
                torch.nn.Conv1d(
                    hidden_channels, out_channels, kernel_size=7, padding=(7 - 1) // 2, dilation=1, bias=True
                ),
            ]
        )

        # apply weight norm
        if use_weight_norm:
            self.apply_weight_norm()

    def forward(self, c):
        """Calculate forward propagation.
        Args:
            c (Tensor): Local conditioning auxiliary features (B, C ,T').
        Returns:
            Tensor: Output tensor (B, out_channels, T)
        """
        # random noise
        x = torch.randn([c.shape[0], self.in_channels, c.shape[2]])
        x = x.to(self.first_conv.bias.device)
        x = self.first_conv(x)

        for n in range(self.lvc_block_nums):
            x = self.lvc_blocks[n](x, c)

        # apply final layers
        for f in self.last_conv_layers:
            x = F.leaky_relu(x, LRELU_SLOPE)
            x = f(x)
        x = torch.tanh(x)
        return x

    def remove_weight_norm(self):
        """Remove weight normalization module from all of the layers."""

        def _remove_weight_norm(m):
            try:
                # print(f"Weight norm is removed from {m}.")
                torch.nn.utils.remove_weight_norm(m)
            except ValueError:  # this module didn't have weight norm
                return

        self.apply(_remove_weight_norm)

    def apply_weight_norm(self):
        """Apply weight normalization module from all of the layers."""

        def _apply_weight_norm(m):
            if isinstance(m, (torch.nn.Conv1d, torch.nn.Conv2d)):
                torch.nn.utils.weight_norm(m)
                # print(f"Weight norm is applied to {m}.")

        self.apply(_apply_weight_norm)

    @staticmethod
    def _get_receptive_field_size(layers, stacks, kernel_size, dilation=lambda x: 2**x):
        assert layers % stacks == 0
        layers_per_cycle = layers // stacks
        dilations = [dilation(i % layers_per_cycle) for i in range(layers)]
        return (kernel_size - 1) * sum(dilations) + 1

    @property
    def receptive_field_size(self):
        """Return receptive field size."""
        return self._get_receptive_field_size(self.layers, self.stacks, self.kernel_size)

    @torch.no_grad()
    def inference(self, c):
        """Perform inference.
        Args:
            c (Tensor): Local conditioning auxiliary features :math:`(B, C, T)`.
        Returns:
            Tensor: Output tensor (T, out_channels)
        """
        x = torch.randn([c.shape[0], self.in_channels, c.shape[2]])
        x = x.to(self.first_conv.bias.device)

        c = c.to(next(self.parameters()))
        return self.forward(c)


================================================
FILE: TTS/vocoder/models/wavegrad.py
================================================
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np
import torch
from coqpit import Coqpit
from torch import nn
from torch.nn.utils import weight_norm
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from trainer.trainer_utils import get_optimizer, get_scheduler

from TTS.utils.io import load_fsspec
from TTS.vocoder.datasets import WaveGradDataset
from TTS.vocoder.layers.wavegrad import Conv1d, DBlock, FiLM, UBlock
from TTS.vocoder.models.base_vocoder import BaseVocoder
from TTS.vocoder.utils.generic_utils import plot_results


@dataclass
class WavegradArgs(Coqpit):
    in_channels: int = 80
    out_channels: int = 1
    use_weight_norm: bool = False
    y_conv_channels: int = 32
    x_conv_channels: int = 768
    dblock_out_channels: List[int] = field(default_factory=lambda: [128, 128, 256, 512])
    ublock_out_channels: List[int] = field(default_factory=lambda: [512, 512, 256, 128, 128])
    upsample_factors: List[int] = field(default_factory=lambda: [4, 4, 4, 2, 2])
    upsample_dilations: List[List[int]] = field(
        default_factory=lambda: [[1, 2, 1, 2], [1, 2, 1, 2], [1, 2, 4, 8], [1, 2, 4, 8], [1, 2, 4, 8]]
    )


class Wavegrad(BaseVocoder):
    """🐸 🌊 WaveGrad 🌊 model.
    Paper - https://arxiv.org/abs/2009.00713

    Examples:
        Initializing the model.

        >>> from TTS.vocoder.configs import WavegradConfig
        >>> config = WavegradConfig()
        >>> model = Wavegrad(config)

    Paper Abstract:
        This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the
        data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts
        from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned
        on the mel-spectrogram. WaveGrad offers a natural way to trade inference speed for sample quality by adjusting
        the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in
        terms of audio quality. We find that it can generate high fidelity audio samples using as few as six iterations.
        Experiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive
        baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations.
        Audio samples are available at this https URL.
    """

    # pylint: disable=dangerous-default-value
    def __init__(self, config: Coqpit):
        super().__init__(config)
        self.config = config
        self.use_weight_norm = config.model_params.use_weight_norm
        self.hop_len = np.prod(config.model_params.upsample_factors)
        self.noise_level = None
        self.num_steps = None
        self.beta = None
        self.alpha = None
        self.alpha_hat = None
        self.c1 = None
        self.c2 = None
        self.sigma = None

        # dblocks
        self.y_conv = Conv1d(1, config.model_params.y_conv_channels, 5, padding=2)
        self.dblocks = nn.ModuleList([])
        ic = config.model_params.y_conv_channels
        for oc, df in zip(config.model_params.dblock_out_channels, reversed(config.model_params.upsample_factors)):
            self.dblocks.append(DBlock(ic, oc, df))
            ic = oc

        # film
        self.film = nn.ModuleList([])
        ic = config.model_params.y_conv_channels
        for oc in reversed(config.model_params.ublock_out_channels):
            self.film.append(FiLM(ic, oc))
            ic = oc

        # ublocksn
        self.ublocks = nn.ModuleList([])
        ic = config.model_params.x_conv_channels
        for oc, uf, ud in zip(
            config.model_params.ublock_out_channels,
            config.model_params.upsample_factors,
            config.model_params.upsample_dilations,
        ):
            self.ublocks.append(UBlock(ic, oc, uf, ud))
            ic = oc

        self.x_conv = Conv1d(config.model_params.in_channels, config.model_params.x_conv_channels, 3, padding=1)
        self.out_conv = Conv1d(oc, config.model_params.out_channels, 3, padding=1)

        if config.model_params.use_weight_norm:
            self.apply_weight_norm()

    def forward(self, x, spectrogram, noise_scale):
        shift_and_scale = []

        x = self.y_conv(x)
        shift_and_scale.append(self.film[0](x, noise_scale))

        for film, layer in zip(self.film[1:], self.dblocks):
            x = layer(x)
            shift_and_scale.append(film(x, noise_scale))

        x = self.x_conv(spectrogram)
        for layer, (film_shift, film_scale) in zip(self.ublocks, reversed(shift_and_scale)):
            x = layer(x, film_shift, film_scale)
        x = self.out_conv(x)
        return x

    def load_noise_schedule(self, path):
        beta = np.load(path, allow_pickle=True).item()["beta"]  # pylint: disable=unexpected-keyword-arg
        self.compute_noise_level(beta)

    @torch.no_grad()
    def inference(self, x, y_n=None):
        """
        Shapes:
            x: :math:`[B, C , T]`
            y_n: :math:`[B, 1, T]`
        """
        if y_n is None:
            y_n = torch.randn(x.shape[0], 1, self.hop_len * x.shape[-1])
        else:
            y_n = torch.FloatTensor(y_n).unsqueeze(0).unsqueeze(0)
        y_n = y_n.type_as(x)
        sqrt_alpha_hat = self.noise_level.to(x)
        for n in range(len(self.alpha) - 1, -1, -1):
            y_n = self.c1[n] * (y_n - self.c2[n] * self.forward(y_n, x, sqrt_alpha_hat[n].repeat(x.shape[0])))
            if n > 0:
                z = torch.randn_like(y_n)
                y_n += self.sigma[n - 1] * z
            y_n.clamp_(-1.0, 1.0)
        return y_n

    def compute_y_n(self, y_0):
        """Compute noisy audio based on noise schedule"""
        self.noise_level = self.noise_level.to(y_0)
        if len(y_0.shape) == 3:
            y_0 = y_0.squeeze(1)
        s = torch.randint(0, self.num_steps - 1, [y_0.shape[0]])
        l_a, l_b = self.noise_level[s], self.noise_level[s + 1]
        noise_scale = l_a + torch.rand(y_0.shape[0]).to(y_0) * (l_b - l_a)
        noise_scale = noise_scale.unsqueeze(1)
        noise = torch.randn_like(y_0)
        noisy_audio = noise_scale * y_0 + (1.0 - noise_scale**2) ** 0.5 * noise
        return noise.unsqueeze(1), noisy_audio.unsqueeze(1), noise_scale[:, 0]

    def compute_noise_level(self, beta):
        """Compute noise schedule parameters"""
        self.num_steps = len(beta)
        alpha = 1 - beta
        alpha_hat = np.cumprod(alpha)
        noise_level = np.concatenate([[1.0], alpha_hat**0.5], axis=0)
        noise_level = alpha_hat**0.5

        # pylint: disable=not-callable
        self.beta = torch.tensor(beta.astype(np.float32))
        self.alpha = torch.tensor(alpha.astype(np.float32))
        self.alpha_hat = torch.tensor(alpha_hat.astype(np.float32))
        self.noise_level = torch.tensor(noise_level.astype(np.float32))

        self.c1 = 1 / self.alpha**0.5
        self.c2 = (1 - self.alpha) / (1 - self.alpha_hat) ** 0.5
        self.sigma = ((1.0 - self.alpha_hat[:-1]) / (1.0 - self.alpha_hat[1:]) * self.beta[1:]) ** 0.5

    def remove_weight_norm(self):
        for _, layer in enumerate(self.dblocks):
            if len(layer.state_dict()) != 0:
                try:
                    nn.utils.remove_weight_norm(layer)
                except ValueError:
                    layer.remove_weight_norm()

        for _, layer in enumerate(self.film):
            if len(layer.state_dict()) != 0:
                try:
                    nn.utils.remove_weight_norm(layer)
                except ValueError:
                    layer.remove_weight_norm()

        for _, layer in enumerate(self.ublocks):
            if len(layer.state_dict()) != 0:
                try:
                    nn.utils.remove_weight_norm(layer)
                except ValueError:
                    layer.remove_weight_norm()

        nn.utils.remove_weight_norm(self.x_conv)
        nn.utils.remove_weight_norm(self.out_conv)
        nn.utils.remove_weight_norm(self.y_conv)

    def apply_weight_norm(self):
        for _, layer in enumerate(self.dblocks):
            if len(layer.state_dict()) != 0:
                layer.apply_weight_norm()

        for _, layer in enumerate(self.film):
            if len(layer.state_dict()) != 0:
                layer.apply_weight_norm()

        for _, layer in enumerate(self.ublocks):
            if len(layer.state_dict()) != 0:
                layer.apply_weight_norm()

        self.x_conv = weight_norm(self.x_conv)
        self.out_conv = weight_norm(self.out_conv)
        self.y_conv = weight_norm(self.y_conv)

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training
            if self.config.model_params.use_weight_norm:
                self.remove_weight_norm()
            betas = np.linspace(
                config["test_noise_schedule"]["min_val"],
                config["test_noise_schedule"]["max_val"],
                config["test_noise_schedule"]["num_steps"],
            )
            self.compute_noise_level(betas)
        else:
            betas = np.linspace(
                config["train_noise_schedule"]["min_val"],
                config["train_noise_schedule"]["max_val"],
                config["train_noise_schedule"]["num_steps"],
            )
            self.compute_noise_level(betas)

    def train_step(self, batch: Dict, criterion: Dict) -> Tuple[Dict, Dict]:
        # format data
        x = batch["input"]
        y = batch["waveform"]

        # set noise scale
        noise, x_noisy, noise_scale = self.compute_y_n(y)

        # forward pass
        noise_hat = self.forward(x_noisy, x, noise_scale)

        # compute losses
        loss = criterion(noise, noise_hat)
        return {"model_output": noise_hat}, {"loss": loss}

    def train_log(  # pylint: disable=no-self-use
        self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int  # pylint: disable=unused-argument
    ) -> Tuple[Dict, np.ndarray]:
        pass

    @torch.no_grad()
    def eval_step(self, batch: Dict, criterion: nn.Module) -> Tuple[Dict, Dict]:
        return self.train_step(batch, criterion)

    def eval_log(  # pylint: disable=no-self-use
        self, batch: Dict, outputs: Dict, logger: "Logger", assets: Dict, steps: int  # pylint: disable=unused-argument
    ) -> None:
        pass

    def test(self, assets: Dict, test_loader: "DataLoader", outputs=None):  # pylint: disable=unused-argument
        # setup noise schedule and inference
        ap = assets["audio_processor"]
        noise_schedule = self.config["test_noise_schedule"]
        betas = np.linspace(noise_schedule["min_val"], noise_schedule["max_val"], noise_schedule["num_steps"])
        self.compute_noise_level(betas)
        samples = test_loader.dataset.load_test_samples(1)
        for sample in samples:
            x = sample[0]
            x = x[None, :, :].to(next(self.parameters()).device)
            y = sample[1]
            y = y[None, :]
            # compute voice
            y_pred = self.inference(x)
            # compute spectrograms
            figures = plot_results(y_pred, y, ap, "test")
            # Sample audio
            sample_voice = y_pred[0].squeeze(0).detach().cpu().numpy()
        return figures, {"test/audio": sample_voice}

    def get_optimizer(self):
        return get_optimizer(self.config.optimizer, self.config.optimizer_params, self.config.lr, self)

    def get_scheduler(self, optimizer):
        return get_scheduler(self.config.lr_scheduler, self.config.lr_scheduler_params, optimizer)

    @staticmethod
    def get_criterion():
        return torch.nn.L1Loss()

    @staticmethod
    def format_batch(batch: Dict) -> Dict:
        # return a whole audio segment
        m, y = batch[0], batch[1]
        y = y.unsqueeze(1)
        return {"input": m, "waveform": y}

    def get_data_loader(self, config: Coqpit, assets: Dict, is_eval: True, samples: List, verbose: bool, num_gpus: int):
        ap = assets["audio_processor"]
        dataset = WaveGradDataset(
            ap=ap,
            items=samples,
            seq_len=self.config.seq_len,
            hop_len=ap.hop_length,
            pad_short=self.config.pad_short,
            conv_pad=self.config.conv_pad,
            is_training=not is_eval,
            return_segments=True,
            use_noise_augment=False,
            use_cache=config.use_cache,
            verbose=verbose,
        )
        sampler = DistributedSampler(dataset) if num_gpus > 1 else None
        loader = DataLoader(
            dataset,
            batch_size=self.config.batch_size,
            shuffle=num_gpus <= 1,
            drop_last=False,
            sampler=sampler,
            num_workers=self.config.num_eval_loader_workers if is_eval else self.config.num_loader_workers,
            pin_memory=False,
        )
        return loader

    def on_epoch_start(self, trainer):  # pylint: disable=unused-argument
        noise_schedule = self.config["train_noise_schedule"]
        betas = np.linspace(noise_schedule["min_val"], noise_schedule["max_val"], noise_schedule["num_steps"])
        self.compute_noise_level(betas)

    @staticmethod
    def init_from_config(config: "WavegradConfig"):
        return Wavegrad(config)


================================================
FILE: TTS/vocoder/models/wavernn.py
================================================
import sys
import time
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

import numpy as np
import torch
import torch.nn.functional as F
from coqpit import Coqpit
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

from TTS.tts.utils.visual import plot_spectrogram
from TTS.utils.audio import AudioProcessor
from TTS.utils.io import load_fsspec
from TTS.vocoder.datasets.wavernn_dataset import WaveRNNDataset
from TTS.vocoder.layers.losses import WaveRNNLoss
from TTS.vocoder.models.base_vocoder import BaseVocoder
from TTS.vocoder.utils.distribution import sample_from_discretized_mix_logistic, sample_from_gaussian


def stream(string, variables):
    sys.stdout.write(f"\r{string}" % variables)


# pylint: disable=abstract-method
# relates https://github.com/pytorch/pytorch/issues/42305
class ResBlock(nn.Module):
    def __init__(self, dims):
        super().__init__()
        self.conv1 = nn.Conv1d(dims, dims, kernel_size=1, bias=False)
        self.conv2 = nn.Conv1d(dims, dims, kernel_size=1, bias=False)
        self.batch_norm1 = nn.BatchNorm1d(dims)
        self.batch_norm2 = nn.BatchNorm1d(dims)

    def forward(self, x):
        residual = x
        x = self.conv1(x)
        x = self.batch_norm1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.batch_norm2(x)
        return x + residual


class MelResNet(nn.Module):
    def __init__(self, num_res_blocks, in_dims, compute_dims, res_out_dims, pad):
        super().__init__()
        k_size = pad * 2 + 1
        self.conv_in = nn.Conv1d(in_dims, compute_dims, kernel_size=k_size, bias=False)
        self.batch_norm = nn.BatchNorm1d(compute_dims)
        self.layers = nn.ModuleList()
        for _ in range(num_res_blocks):
            self.layers.append(ResBlock(compute_dims))
        self.conv_out = nn.Conv1d(compute_dims, res_out_dims, kernel_size=1)

    def forward(self, x):
        x = self.conv_in(x)
        x = self.batch_norm(x)
        x = F.relu(x)
        for f in self.layers:
            x = f(x)
        x = self.conv_out(x)
        return x


class Stretch2d(nn.Module):
    def __init__(self, x_scale, y_scale):
        super().__init__()
        self.x_scale = x_scale
        self.y_scale = y_scale

    def forward(self, x):
        b, c, h, w = x.size()
        x = x.unsqueeze(-1).unsqueeze(3)
        x = x.repeat(1, 1, 1, self.y_scale, 1, self.x_scale)
        return x.view(b, c, h * self.y_scale, w * self.x_scale)


class UpsampleNetwork(nn.Module):
    def __init__(
        self,
        feat_dims,
        upsample_scales,
        compute_dims,
        num_res_blocks,
        res_out_dims,
        pad,
        use_aux_net,
    ):
        super().__init__()
        self.total_scale = np.cumproduct(upsample_scales)[-1]
        self.indent = pad * self.total_scale
        self.use_aux_net = use_aux_net
        if use_aux_net:
            self.resnet = MelResNet(num_res_blocks, feat_dims, compute_dims, res_out_dims, pad)
            self.resnet_stretch = Stretch2d(self.total_scale, 1)
        self.up_layers = nn.ModuleList()
        for scale in upsample_scales:
            k_size = (1, scale * 2 + 1)
            padding = (0, scale)
            stretch = Stretch2d(scale, 1)
            conv = nn.Conv2d(1, 1, kernel_size=k_size, padding=padding, bias=False)
            conv.weight.data.fill_(1.0 / k_size[1])
            self.up_layers.append(stretch)
            self.up_layers.append(conv)

    def forward(self, m):
        if self.use_aux_net:
            aux = self.resnet(m).unsqueeze(1)
            aux = self.resnet_stretch(aux)
            aux = aux.squeeze(1)
            aux = aux.transpose(1, 2)
        else:
            aux = None
        m = m.unsqueeze(1)
        for f in self.up_layers:
            m = f(m)
        m = m.squeeze(1)[:, :, self.indent : -self.indent]
        return m.transpose(1, 2), aux


class Upsample(nn.Module):
    def __init__(self, scale, pad, num_res_blocks, feat_dims, compute_dims, res_out_dims, use_aux_net):
        super().__init__()
        self.scale = scale
        self.pad = pad
        self.indent = pad * scale
        self.use_aux_net = use_aux_net
        self.resnet = MelResNet(num_res_blocks, feat_dims, compute_dims, res_out_dims, pad)

    def forward(self, m):
        if self.use_aux_net:
            aux = self.resnet(m)
            aux = torch.nn.functional.interpolate(aux, scale_factor=self.scale, mode="linear", align_corners=True)
            aux = aux.transpose(1, 2)
        else:
            aux = None
        m = torch.nn.functional.interpolate(m, scale_factor=self.scale, mode="linear", align_corners=True)
        m = m[:, :, self.indent : -self.indent]
        m = m * 0.045  # empirically found

        return m.transpose(1, 2), aux


@dataclass
class WavernnArgs(Coqpit):
    """🐸 WaveRNN model arguments.

    rnn_dims (int):
        Number of hidden channels in RNN layers. Defaults to 512.
    fc_dims (int):
        Number of hidden channels in fully-conntected layers. Defaults to 512.
    compute_dims (int):
        Number of hidden channels in the feature ResNet. Defaults to 128.
    res_out_dim (int):
        Number of hidden channels in the feature ResNet output. Defaults to 128.
    num_res_blocks (int):
        Number of residual blocks in the ResNet. Defaults to 10.
    use_aux_net (bool):
        enable/disable the feature ResNet. Defaults to True.
    use_upsample_net (bool):
        enable/ disable the upsampling networl. If False, basic upsampling is used. Defaults to True.
    upsample_factors (list):
        Upsampling factors. The multiply of the values must match the `hop_length`. Defaults to ```[4, 8, 8]```.
    mode (str):
        Output mode of the WaveRNN vocoder. `mold` for Mixture of Logistic Distribution, `gauss` for a single
        Gaussian Distribution and `bits` for quantized bits as the model's output.
    mulaw (bool):
        enable / disable the use of Mulaw quantization for training. Only applicable if `mode == 'bits'`. Defaults
        to `True`.
    pad (int):
            Padding applied to the input feature frames against the convolution layers of the feature network.
            Defaults to 2.
    """

    rnn_dims: int = 512
    fc_dims: int = 512
    compute_dims: int = 128
    res_out_dims: int = 128
    num_res_blocks: int = 10
    use_aux_net: bool = True
    use_upsample_net: bool = True
    upsample_factors: List[int] = field(default_factory=lambda: [4, 8, 8])
    mode: str = "mold"  # mold [string], gauss [string], bits [int]
    mulaw: bool = True  # apply mulaw if mode is bits
    pad: int = 2
    feat_dims: int = 80


class Wavernn(BaseVocoder):
    def __init__(self, config: Coqpit):
        """🐸 WaveRNN model.
        Original paper - https://arxiv.org/abs/1802.08435
        Official implementation - https://github.com/fatchord/WaveRNN

        Args:
            config (Coqpit): [description]

        Raises:
            RuntimeError: [description]

        Examples:
            >>> from TTS.vocoder.configs import WavernnConfig
            >>> config = WavernnConfig()
            >>> model = Wavernn(config)

        Paper Abstract:
            Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to
            both estimating the data distribution and generating high-quality samples. Efficient sampling for this
            class of models has however remained an elusive problem. With a focus on text-to-speech synthesis, we
            describe a set of general techniques for reducing sampling time while maintaining high output quality.
            We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that
            matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it
            possible to generate 24kHz 16-bit audio 4x faster than real time on a GPU. Second, we apply a weight
            pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of
            parameters, large sparse networks perform better than small dense networks and this relationship holds for
            sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample
            high-fidelity audio on a mobile CPU in real time. Finally, we propose a new generation scheme based on
            subscaling that folds a long sequence into a batch of shorter sequences and allows one to generate multiple
            samples at once. The Subscale WaveRNN produces 16 samples per step without loss of quality and offers an
            orthogonal method for increasing sampling efficiency.
        """
        super().__init__(config)

        if isinstance(self.args.mode, int):
            self.n_classes = 2**self.args.mode
        elif self.args.mode == "mold":
            self.n_classes = 3 * 10
        elif self.args.mode == "gauss":
            self.n_classes = 2
        else:
            raise RuntimeError("Unknown model mode value - ", self.args.mode)

        self.ap = AudioProcessor(**config.audio.to_dict())
        self.aux_dims = self.args.res_out_dims // 4

        if self.args.use_upsample_net:
            assert (
                np.cumproduct(self.args.upsample_factors)[-1] == config.audio.hop_length
            ), " [!] upsample scales needs to be equal to hop_length"
            self.upsample = UpsampleNetwork(
                self.args.feat_dims,
                self.args.upsample_factors,
                self.args.compute_dims,
                self.args.num_res_blocks,
                self.args.res_out_dims,
                self.args.pad,
                self.args.use_aux_net,
            )
        else:
            self.upsample = Upsample(
                config.audio.hop_length,
                self.args.pad,
                self.args.num_res_blocks,
                self.args.feat_dims,
                self.args.compute_dims,
                self.args.res_out_dims,
                self.args.use_aux_net,
            )
        if self.args.use_aux_net:
            self.I = nn.Linear(self.args.feat_dims + self.aux_dims + 1, self.args.rnn_dims)
            self.rnn1 = nn.GRU(self.args.rnn_dims, self.args.rnn_dims, batch_first=True)
            self.rnn2 = nn.GRU(self.args.rnn_dims + self.aux_dims, self.args.rnn_dims, batch_first=True)
            self.fc1 = nn.Linear(self.args.rnn_dims + self.aux_dims, self.args.fc_dims)
            self.fc2 = nn.Linear(self.args.fc_dims + self.aux_dims, self.args.fc_dims)
            self.fc3 = nn.Linear(self.args.fc_dims, self.n_classes)
        else:
            self.I = nn.Linear(self.args.feat_dims + 1, self.args.rnn_dims)
            self.rnn1 = nn.GRU(self.args.rnn_dims, self.args.rnn_dims, batch_first=True)
            self.rnn2 = nn.GRU(self.args.rnn_dims, self.args.rnn_dims, batch_first=True)
            self.fc1 = nn.Linear(self.args.rnn_dims, self.args.fc_dims)
            self.fc2 = nn.Linear(self.args.fc_dims, self.args.fc_dims)
            self.fc3 = nn.Linear(self.args.fc_dims, self.n_classes)

    def forward(self, x, mels):
        bsize = x.size(0)
        h1 = torch.zeros(1, bsize, self.args.rnn_dims).to(x.device)
        h2 = torch.zeros(1, bsize, self.args.rnn_dims).to(x.device)
        mels, aux = self.upsample(mels)

        if self.args.use_aux_net:
            aux_idx = [self.aux_dims * i for i in range(5)]
            a1 = aux[:, :, aux_idx[0] : aux_idx[1]]
            a2 = aux[:, :, aux_idx[1] : aux_idx[2]]
            a3 = aux[:, :, aux_idx[2] : aux_idx[3]]
            a4 = aux[:, :, aux_idx[3] : aux_idx[4]]

        x = (
            torch.cat([x.unsqueeze(-1), mels, a1], dim=2)
            if self.args.use_aux_net
            else torch.cat([x.unsqueeze(-1), mels], dim=2)
        )
        x = self.I(x)
        res = x
        self.rnn1.flatten_parameters()
        x, _ = self.rnn1(x, h1)

        x = x + res
        res = x
        x = torch.cat([x, a2], dim=2) if self.args.use_aux_net else x
        self.rnn2.flatten_parameters()
        x, _ = self.rnn2(x, h2)

        x = x + res
        x = torch.cat([x, a3], dim=2) if self.args.use_aux_net else x
        x = F.relu(self.fc1(x))

        x = torch.cat([x, a4], dim=2) if self.args.use_aux_net else x
        x = F.relu(self.fc2(x))
        return self.fc3(x)

    def inference(self, mels, batched=None, target=None, overlap=None):
        self.eval()
        output = []
        start = time.time()
        rnn1 = self.get_gru_cell(self.rnn1)
        rnn2 = self.get_gru_cell(self.rnn2)

        with torch.no_grad():
            if isinstance(mels, np.ndarray):
                mels = torch.FloatTensor(mels).to(str(next(self.parameters()).device))

            if mels.ndim == 2:
                mels = mels.unsqueeze(0)
            wave_len = (mels.size(-1) - 1) * self.config.audio.hop_length

            mels = self.pad_tensor(mels.transpose(1, 2), pad=self.args.pad, side="both")
            mels, aux = self.upsample(mels.transpose(1, 2))

            if batched:
                mels = self.fold_with_overlap(mels, target, overlap)
                if aux is not None:
                    aux = self.fold_with_overlap(aux, target, overlap)

            b_size, seq_len, _ = mels.size()

            h1 = torch.zeros(b_size, self.args.rnn_dims).type_as(mels)
            h2 = torch.zeros(b_size, self.args.rnn_dims).type_as(mels)
            x = torch.zeros(b_size, 1).type_as(mels)

            if self.args.use_aux_net:
                d = self.aux_dims
                aux_split = [aux[:, :, d * i : d * (i + 1)] for i in range(4)]

            for i in range(seq_len):
                m_t = mels[:, i, :]

                if self.args.use_aux_net:
                    a1_t, a2_t, a3_t, a4_t = (a[:, i, :] for a in aux_split)

                x = torch.cat([x, m_t, a1_t], dim=1) if self.args.use_aux_net else torch.cat([x, m_t], dim=1)
                x = self.I(x)
                h1 = rnn1(x, h1)

                x = x + h1
                inp = torch.cat([x, a2_t], dim=1) if self.args.use_aux_net else x
                h2 = rnn2(inp, h2)

                x = x + h2
                x = torch.cat([x, a3_t], dim=1) if self.args.use_aux_net else x
                x = F.relu(self.fc1(x))

                x = torch.cat([x, a4_t], dim=1) if self.args.use_aux_net else x
                x = F.relu(self.fc2(x))

                logits = self.fc3(x)

                if self.args.mode == "mold":
                    sample = sample_from_discretized_mix_logistic(logits.unsqueeze(0).transpose(1, 2))
                    output.append(sample.view(-1))
                    x = sample.transpose(0, 1).type_as(mels)
                elif self.args.mode == "gauss":
                    sample = sample_from_gaussian(logits.unsqueeze(0).transpose(1, 2))
                    output.append(sample.view(-1))
                    x = sample.transpose(0, 1).type_as(mels)
                elif isinstance(self.args.mode, int):
                    posterior = F.softmax(logits, dim=1)
                    distrib = torch.distributions.Categorical(posterior)

                    sample = 2 * distrib.sample().float() / (self.n_classes - 1.0) - 1.0
                    output.append(sample)
                    x = sample.unsqueeze(-1)
                else:
                    raise RuntimeError("Unknown model mode value - ", self.args.mode)

                if i % 100 == 0:
                    self.gen_display(i, seq_len, b_size, start)

        output = torch.stack(output).transpose(0, 1)
        output = output.cpu()
        if batched:
            output = output.numpy()
            output = output.astype(np.float64)

            output = self.xfade_and_unfold(output, target, overlap)
        else:
            output = output[0]

        if self.args.mulaw and isinstance(self.args.mode, int):
            output = AudioProcessor.mulaw_decode(output, self.args.mode)

        # Fade-out at the end to avoid signal cutting out suddenly
        fade_out = np.linspace(1, 0, 20 * self.config.audio.hop_length)
        output = output[:wave_len]

        if wave_len > len(fade_out):
            output[-20 * self.config.audio.hop_length :] *= fade_out

        self.train()
        return output

    def gen_display(self, i, seq_len, b_size, start):
        gen_rate = (i + 1) / (time.time() - start) * b_size / 1000
        realtime_ratio = gen_rate * 1000 / self.config.audio.sample_rate
        stream(
            "%i/%i -- batch_size: %i -- gen_rate: %.1f kHz -- x_realtime: %.1f  ",
            (i * b_size, seq_len * b_size, b_size, gen_rate, realtime_ratio),
        )

    def fold_with_overlap(self, x, target, overlap):
        """Fold the tensor with overlap for quick batched inference.
            Overlap will be used for crossfading in xfade_and_unfold()
        Args:
            x (tensor)    : Upsampled conditioning features.
                            shape=(1, timesteps, features)
            target (int)  : Target timesteps for each index of batch
            overlap (int) : Timesteps for both xfade and rnn warmup
        Return:
            (tensor) : shape=(num_folds, target + 2 * overlap, features)
        Details:
            x = [[h1, h2, ... hn]]
            Where each h is a vector of conditioning features
            Eg: target=2, overlap=1 with x.size(1)=10
            folded = [[h1, h2, h3, h4],
                      [h4, h5, h6, h7],
                      [h7, h8, h9, h10]]
        """

        _, total_len, features = x.size()

        # Calculate variables needed
        num_folds = (total_len - overlap) // (target + overlap)
        extended_len = num_folds * (overlap + target) + overlap
        remaining = total_len - extended_len

        # Pad if some time steps poking out
        if remaining != 0:
            num_folds += 1
            padding = target + 2 * overlap - remaining
            x = self.pad_tensor(x, padding, side="after")

        folded = torch.zeros(num_folds, target + 2 * overlap, features).to(x.device)

        # Get the values for the folded tensor
        for i in range(num_folds):
            start = i * (target + overlap)
            end = start + target + 2 * overlap
            folded[i] = x[:, start:end, :]

        return folded

    @staticmethod
    def get_gru_cell(gru):
        gru_cell = nn.GRUCell(gru.input_size, gru.hidden_size)
        gru_cell.weight_hh.data = gru.weight_hh_l0.data
        gru_cell.weight_ih.data = gru.weight_ih_l0.data
        gru_cell.bias_hh.data = gru.bias_hh_l0.data
        gru_cell.bias_ih.data = gru.bias_ih_l0.data
        return gru_cell

    @staticmethod
    def pad_tensor(x, pad, side="both"):
        # NB - this is just a quick method i need right now
        # i.e., it won't generalise to other shapes/dims
        b, t, c = x.size()
        total = t + 2 * pad if side == "both" else t + pad
        padded = torch.zeros(b, total, c).to(x.device)
        if side in ("before", "both"):
            padded[:, pad : pad + t, :] = x
        elif side == "after":
            padded[:, :t, :] = x
        return padded

    @staticmethod
    def xfade_and_unfold(y, target, overlap):
        """Applies a crossfade and unfolds into a 1d array.
        Args:
            y (ndarry)    : Batched sequences of audio samples
                            shape=(num_folds, target + 2 * overlap)
                            dtype=np.float64
            overlap (int) : Timesteps for both xfade and rnn warmup
        Return:
            (ndarry) : audio samples in a 1d array
                       shape=(total_len)
                       dtype=np.float64
        Details:
            y = [[seq1],
                 [seq2],
                 [seq3]]
            Apply a gain envelope at both ends of the sequences
            y = [[seq1_in, seq1_target, seq1_out],
                 [seq2_in, seq2_target, seq2_out],
                 [seq3_in, seq3_target, seq3_out]]
            Stagger and add up the groups of samples:
            [seq1_in, seq1_target, (seq1_out + seq2_in), seq2_target, ...]
        """

        num_folds, length = y.shape
        target = length - 2 * overlap
        total_len = num_folds * (target + overlap) + overlap

        # Need some silence for the rnn warmup
        silence_len = overlap // 2
        fade_len = overlap - silence_len
        silence = np.zeros((silence_len), dtype=np.float64)

        # Equal power crossfade
        t = np.linspace(-1, 1, fade_len, dtype=np.float64)
        fade_in = np.sqrt(0.5 * (1 + t))
        fade_out = np.sqrt(0.5 * (1 - t))

        # Concat the silence to the fades
        fade_in = np.concatenate([silence, fade_in])
        fade_out = np.concatenate([fade_out, silence])

        # Apply the gain to the overlap samples
        y[:, :overlap] *= fade_in
        y[:, -overlap:] *= fade_out

        unfolded = np.zeros((total_len), dtype=np.float64)

        # Loop to add up all the samples
        for i in range(num_folds):
            start = i * (target + overlap)
            end = start + target + 2 * overlap
            unfolded[start:end] += y[i]

        return unfolded

    def load_checkpoint(
        self, config, checkpoint_path, eval=False, cache=False
    ):  # pylint: disable=unused-argument, redefined-builtin
        state = load_fsspec(checkpoint_path, map_location=torch.device("cpu"), cache=cache)
        self.load_state_dict(state["model"])
        if eval:
            self.eval()
            assert not self.training

    def train_step(self, batch: Dict, criterion: Dict) -> Tuple[Dict, Dict]:
        mels = batch["input"]
        waveform = batch["waveform"]
        waveform_coarse = batch["waveform_coarse"]

        y_hat = self.forward(waveform, mels)
        if isinstance(self.args.mode, int):
            y_hat = y_hat.transpose(1, 2).unsqueeze(-1)
        else:
            waveform_coarse = waveform_coarse.float()
        waveform_coarse = waveform_coarse.unsqueeze(-1)
        # compute losses
        loss_dict = criterion(y_hat, waveform_coarse)
        return {"model_output": y_hat}, loss_dict

    def eval_step(self, batch: Dict, criterion: Dict) -> Tuple[Dict, Dict]:
        return self.train_step(batch, criterion)

    @torch.no_grad()
    def test(
        self, assets: Dict, test_loader: "DataLoader", output: Dict  # pylint: disable=unused-argument
    ) -> Tuple[Dict, Dict]:
        ap = self.ap
        figures = {}
        audios = {}
        samples = test_loader.dataset.load_test_samples(1)
        for idx, sample in enumerate(samples):
            x = torch.FloatTensor(sample[0])
            x = x.to(next(self.parameters()).device)
            y_hat = self.inference(x, self.config.batched, self.config.target_samples, self.config.overlap_samples)
            x_hat = ap.melspectrogram(y_hat)
            figures.update(
                {
                    f"test_{idx}/ground_truth": plot_spectrogram(x.T),
                    f"test_{idx}/prediction": plot_spectrogram(x_hat.T),
                }
            )
            audios.update({f"test_{idx}/audio": y_hat})
            # audios.update({f"real_{idx}/audio": y_hat})
        return figures, audios

    def test_log(
        self, outputs: Dict, logger: "Logger", assets: Dict, steps: int  # pylint: disable=unused-argument
    ) -> Tuple[Dict, np.ndarray]:
        figures, audios = outputs
        logger.eval_figures(steps, figures)
        logger.eval_audios(steps, audios, self.ap.sample_rate)

    @staticmethod
    def format_batch(batch: Dict) -> Dict:
        waveform = batch[0]
        mels = batch[1]
        waveform_coarse = batch[2]
        return {"input": mels, "waveform": waveform, "waveform_coarse": waveform_coarse}

    def get_data_loader(  # pylint: disable=no-self-use
        self,
        config: Coqpit,
        assets: Dict,
        is_eval: True,
        samples: List,
        verbose: bool,
        num_gpus: int,
    ):
        ap = self.ap
        dataset = WaveRNNDataset(
            ap=ap,
            items=samples,
            seq_len=config.seq_len,
            hop_len=ap.hop_length,
            pad=config.model_args.pad,
            mode=config.model_args.mode,
            mulaw=config.model_args.mulaw,
            is_training=not is_eval,
            verbose=verbose,
        )
        sampler = DistributedSampler(dataset, shuffle=True) if num_gpus > 1 else None
        loader = DataLoader(
            dataset,
            batch_size=1 if is_eval else config.batch_size,
            shuffle=num_gpus == 0,
            collate_fn=dataset.collate,
            sampler=sampler,
            num_workers=config.num_eval_loader_workers if is_eval else config.num_loader_workers,
            pin_memory=True,
        )
        return loader

    def get_criterion(self):
        # define train functions
        return WaveRNNLoss(self.args.mode)

    @staticmethod
    def init_from_config(config: "WavernnConfig"):
        return Wavernn(config)


================================================
FILE: TTS/vocoder/utils/__init__.py
================================================


================================================
FILE: TTS/vocoder/utils/distribution.py
================================================
import math

import numpy as np
import torch
import torch.nn.functional as F
from torch.distributions.normal import Normal


def gaussian_loss(y_hat, y, log_std_min=-7.0):
    assert y_hat.dim() == 3
    assert y_hat.size(2) == 2
    mean = y_hat[:, :, :1]
    log_std = torch.clamp(y_hat[:, :, 1:], min=log_std_min)
    # TODO: replace with pytorch dist
    log_probs = -0.5 * (-math.log(2.0 * math.pi) - 2.0 * log_std - torch.pow(y - mean, 2) * torch.exp((-2.0 * log_std)))
    return log_probs.squeeze().mean()


def sample_from_gaussian(y_hat, log_std_min=-7.0, scale_factor=1.0):
    assert y_hat.size(2) == 2
    mean = y_hat[:, :, :1]
    log_std = torch.clamp(y_hat[:, :, 1:], min=log_std_min)
    dist = Normal(
        mean,
        torch.exp(log_std),
    )
    sample = dist.sample()
    sample = torch.clamp(torch.clamp(sample, min=-scale_factor), max=scale_factor)
    del dist
    return sample


def log_sum_exp(x):
    """numerically stable log_sum_exp implementation that prevents overflow"""
    # TF ordering
    axis = len(x.size()) - 1
    m, _ = torch.max(x, dim=axis)
    m2, _ = torch.max(x, dim=axis, keepdim=True)
    return m + torch.log(torch.sum(torch.exp(x - m2), dim=axis))


# It is adapted from https://github.com/r9y9/wavenet_vocoder/blob/master/wavenet_vocoder/mixture.py
def discretized_mix_logistic_loss(y_hat, y, num_classes=65536, log_scale_min=None, reduce=True):
    if log_scale_min is None:
        log_scale_min = float(np.log(1e-14))
    y_hat = y_hat.permute(0, 2, 1)
    assert y_hat.dim() == 3
    assert y_hat.size(1) % 3 == 0
    nr_mix = y_hat.size(1) // 3

    # (B x T x C)
    y_hat = y_hat.transpose(1, 2)

    # unpack parameters. (B, T, num_mixtures) x 3
    logit_probs = y_hat[:, :, :nr_mix]
    means = y_hat[:, :, nr_mix : 2 * nr_mix]
    log_scales = torch.clamp(y_hat[:, :, 2 * nr_mix : 3 * nr_mix], min=log_scale_min)

    # B x T x 1 -> B x T x num_mixtures
    y = y.expand_as(means)

    centered_y = y - means
    inv_stdv = torch.exp(-log_scales)
    plus_in = inv_stdv * (centered_y + 1.0 / (num_classes - 1))
    cdf_plus = torch.sigmoid(plus_in)
    min_in = inv_stdv * (centered_y - 1.0 / (num_classes - 1))
    cdf_min = torch.sigmoid(min_in)

    # log probability for edge case of 0 (before scaling)
    # equivalent: torch.log(F.sigmoid(plus_in))
    log_cdf_plus = plus_in - F.softplus(plus_in)

    # log probability for edge case of 255 (before scaling)
    # equivalent: (1 - F.sigmoid(min_in)).log()
    log_one_minus_cdf_min = -F.softplus(min_in)

    # probability for all other cases
    cdf_delta = cdf_plus - cdf_min

    mid_in = inv_stdv * centered_y
    # log probability in the center of the bin, to be used in extreme cases
    # (not actually used in our code)
    log_pdf_mid = mid_in - log_scales - 2.0 * F.softplus(mid_in)

    # tf equivalent

    # log_probs = tf.where(x < -0.999, log_cdf_plus,
    #                      tf.where(x > 0.999, log_one_minus_cdf_min,
    #                               tf.where(cdf_delta > 1e-5,
    #                                        tf.log(tf.maximum(cdf_delta, 1e-12)),
    #                                        log_pdf_mid - np.log(127.5))))

    # TODO: cdf_delta <= 1e-5 actually can happen. How can we choose the value
    # for num_classes=65536 case? 1e-7? not sure..
    inner_inner_cond = (cdf_delta > 1e-5).float()

    inner_inner_out = inner_inner_cond * torch.log(torch.clamp(cdf_delta, min=1e-12)) + (1.0 - inner_inner_cond) * (
        log_pdf_mid - np.log((num_classes - 1) / 2)
    )
    inner_cond = (y > 0.999).float()
    inner_out = inner_cond * log_one_minus_cdf_min + (1.0 - inner_cond) * inner_inner_out
    cond = (y < -0.999).float()
    log_probs = cond * log_cdf_plus + (1.0 - cond) * inner_out

    log_probs = log_probs + F.log_softmax(logit_probs, -1)

    if reduce:
        return -torch.mean(log_sum_exp(log_probs))
    return -log_sum_exp(log_probs).unsqueeze(-1)


def sample_from_discretized_mix_logistic(y, log_scale_min=None):
    """
    Sample from discretized mixture of logistic distributions
    Args:
        y (Tensor): :math:`[B, C, T]`
        log_scale_min (float): Log scale minimum value
    Returns:
        Tensor: sample in range of [-1, 1].
    """
    if log_scale_min is None:
        log_scale_min = float(np.log(1e-14))
    assert y.size(1) % 3 == 0
    nr_mix = y.size(1) // 3

    # B x T x C
    y = y.transpose(1, 2)
    logit_probs = y[:, :, :nr_mix]

    # sample mixture indicator from softmax
    temp = logit_probs.data.new(logit_probs.size()).uniform_(1e-5, 1.0 - 1e-5)
    temp = logit_probs.data - torch.log(-torch.log(temp))
    _, argmax = temp.max(dim=-1)

    # (B, T) -> (B, T, nr_mix)
    one_hot = to_one_hot(argmax, nr_mix)
    # select logistic parameters
    means = torch.sum(y[:, :, nr_mix : 2 * nr_mix] * one_hot, dim=-1)
    log_scales = torch.clamp(torch.sum(y[:, :, 2 * nr_mix : 3 * nr_mix] * one_hot, dim=-1), min=log_scale_min)
    # sample from logistic & clip to interval
    # we don't actually round to the nearest 8bit value when sampling
    u = means.data.new(means.size()).uniform_(1e-5, 1.0 - 1e-5)
    x = means + torch.exp(log_scales) * (torch.log(u) - torch.log(1.0 - u))

    x = torch.clamp(torch.clamp(x, min=-1.0), max=1.0)

    return x


def to_one_hot(tensor, n, fill_with=1.0):
    # we perform one hot encore with respect to the last axis
    one_hot = torch.FloatTensor(tensor.size() + (n,)).zero_().type_as(tensor)
    one_hot.scatter_(len(tensor.size()), tensor.unsqueeze(-1), fill_with)
    return one_hot


================================================
FILE: TTS/vocoder/utils/generic_utils.py
================================================
from typing import Dict

import numpy as np
import torch
from matplotlib import pyplot as plt

from TTS.tts.utils.visual import plot_spectrogram
from TTS.utils.audio import AudioProcessor


def interpolate_vocoder_input(scale_factor, spec):
    """Interpolate spectrogram by the scale factor.
    It is mainly used to match the sampling rates of
    the tts and vocoder models.

    Args:
        scale_factor (float): scale factor to interpolate the spectrogram
        spec (np.array): spectrogram to be interpolated

    Returns:
        torch.tensor: interpolated spectrogram.
    """
    print(" > before interpolation :", spec.shape)
    spec = torch.tensor(spec).unsqueeze(0).unsqueeze(0)  # pylint: disable=not-callable
    spec = torch.nn.functional.interpolate(
        spec, scale_factor=scale_factor, recompute_scale_factor=True, mode="bilinear", align_corners=False
    ).squeeze(0)
    print(" > after interpolation :", spec.shape)
    return spec


def plot_results(y_hat: torch.tensor, y: torch.tensor, ap: AudioProcessor, name_prefix: str = None) -> Dict:
    """Plot the predicted and the real waveform and their spectrograms.

    Args:
        y_hat (torch.tensor): Predicted waveform.
        y (torch.tensor): Real waveform.
        ap (AudioProcessor): Audio processor used to process the waveform.
        name_prefix (str, optional): Name prefix used to name the figures. Defaults to None.

    Returns:
        Dict: output figures keyed by the name of the figures.
    """ """Plot vocoder model results"""
    if name_prefix is None:
        name_prefix = ""

    # select an instance from batch
    y_hat = y_hat[0].squeeze().detach().cpu().numpy()
    y = y[0].squeeze().detach().cpu().numpy()

    spec_fake = ap.melspectrogram(y_hat).T
    spec_real = ap.melspectrogram(y).T
    spec_diff = np.abs(spec_fake - spec_real)

    # plot figure and save it
    fig_wave = plt.figure()
    plt.subplot(2, 1, 1)
    plt.plot(y)
    plt.title("groundtruth speech")
    plt.subplot(2, 1, 2)
    plt.plot(y_hat)
    plt.title("generated speech")
    plt.tight_layout()
    plt.close()

    figures = {
        name_prefix + "spectrogram/fake": plot_spectrogram(spec_fake),
        name_prefix + "spectrogram/real": plot_spectrogram(spec_real),
        name_prefix + "spectrogram/diff": plot_spectrogram(spec_diff),
        name_prefix + "speech_comparison": fig_wave,
    }
    return figures


================================================
FILE: TTS_additional_material/.gitignore
================================================
WadaSNR/
.idea/
*.pyc
.DS_Store
./__init__.py
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

# vim
*.swp
*.swm
*.swn
*.swo

# pytorch models
*.pth
*.pth.tar
!dummy_speakers.pth
result/

# setup.py
version.py

# jupyter dummy files
core

# ignore local datasets
recipes/WIP/*
recipes/ljspeech/LJSpeech-1.1/*
recipes/vctk/VCTK/*
recipes/**/*.npy
recipes/**/*.json
VCTK-Corpus-removed-silence/*

# ignore training logs
trainer_*_log.txt

# files used internally for dev, test etc.
tests/outputs/*
tests/train_outputs/*
TODO.txt
.vscode/*
data/*
notebooks/data/*
TTS/tts/utils/monotonic_align/core.c
.vscode-upload.json
temp_build/*
events.out*
old_configs/*
model_importers/*
model_profiling/*
docs/source/TODO/*
.noseids
.dccache
log.txt
umap.png
*.out
SocialMedia.txt
output.wav
tts_output.wav
deps.json
speakers.json
internal/*
*_pitch.npy
*_phoneme.npy
wandb
depot/*
coqui_recipes/*
local_scripts/*


================================================
FILE: TTS_additional_material/.pre-commit-config.yaml
================================================
repos:
  - repo: 'https://github.com/pre-commit/pre-commit-hooks'
    rev: v2.3.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
  - repo: 'https://github.com/psf/black'
    rev: 22.3.0
    hooks:
      - id: black
        language_version: python3
  - repo: https://github.com/pycqa/isort
    rev: 5.8.0
    hooks:
      - id: isort
        name: isort (python)
      - id: isort
        name: isort (cython)
        types: [cython]
      - id: isort
        name: isort (pyi)
        types: [pyi]
  - repo: https://github.com/pycqa/pylint
    rev: v2.8.2
    hooks:
    -   id: pylint


================================================
FILE: TTS_additional_material/.pylintrc
================================================
[MASTER]

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code.
extension-pkg-whitelist=

# Add files or directories to the blacklist. They should be base names, not
# paths.
ignore=CVS

# Add files or directories matching the regex patterns to the blacklist. The
# regex matches against base names, not paths.
ignore-patterns=

# Python code to execute, usually for sys.path manipulation such as
# pygtk.require().
#init-hook=

# Use multiple processes to speed up Pylint. Specifying 0 will auto-detect the
# number of processors available to use.
jobs=1

# Control the amount of potential inferred values when inferring a single
# object. This can help the performance when dealing with large functions or
# complex, nested conditions.
limit-inference-results=100

# List of plugins (as comma separated values of python modules names) to load,
# usually to register additional checkers.
load-plugins=

# Pickle collected data for later comparisons.
persistent=yes

# Specify a configuration file.
#rcfile=

# When enabled, pylint would attempt to guess common misconfiguration and emit
# user-friendly hints instead of false-positive error messages.
suggestion-mode=yes

# Allow loading of arbitrary C extensions. Extensions are imported into the
# active Python interpreter and may run arbitrary code.
unsafe-load-any-extension=no


[MESSAGES CONTROL]

# Only show warnings with the listed confidence levels. Leave empty to show
# all. Valid levels: HIGH, INFERENCE, INFERENCE_FAILURE, UNDEFINED.
confidence=

# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifiers separated by comma (,) or put this
# option multiple times (only on the command line, not in the configuration
# file where it should appear only once). You can also use "--disable=all" to
# disable everything first and then reenable specific checks. For example, if
# you want to run only the similarities checker, you can use "--disable=all
# --enable=similarities". If you want to run only the classes checker, but have
# no Warning level messages displayed, use "--disable=all --enable=classes
# --disable=W".
disable=missing-docstring,
        too-many-public-methods,
        too-many-lines,
        bare-except,
        ## for avoiding weird p3.6 CI linter error
        ## TODO: see later if we can remove this
        assigning-non-slot,
        unsupported-assignment-operation,
        ## end
        line-too-long,
        fixme,
        wrong-import-order,
        ungrouped-imports,
        wrong-import-position,
        import-error,
        invalid-name,
        too-many-instance-attributes,
        arguments-differ,
        arguments-renamed,
        no-name-in-module,
        no-member,
        unsubscriptable-object,
        print-statement,
        parameter-unpacking,
        unpacking-in-except,
        old-raise-syntax,
        backtick,
        long-suffix,
        old-ne-operator,
        old-octal-literal,
        import-star-module-level,
        non-ascii-bytes-literal,
        raw-checker-failed,
        bad-inline-option,
        locally-disabled,
        file-ignored,
        suppressed-message,
        useless-suppression,
        deprecated-pragma,
        use-symbolic-message-instead,
        useless-object-inheritance,
        too-few-public-methods,
        too-many-branches,
        too-many-arguments,
        too-many-locals,
        too-many-statements,
        apply-builtin,
        basestring-builtin,
        buffer-builtin,
        cmp-builtin,
        coerce-builtin,
        execfile-builtin,
        file-builtin,
        long-builtin,
        raw_input-builtin,
        reduce-builtin,
        standarderror-builtin,
        unicode-builtin,
        xrange-builtin,
        coerce-method,
        delslice-method,
        getslice-method,
        setslice-method,
        no-absolute-import,
        old-division,
        dict-iter-method,
        dict-view-method,
        next-method-called,
        metaclass-assignment,
        indexing-exception,
        raising-string,
        reload-builtin,
        oct-method,
        hex-method,
        nonzero-method,
        cmp-method,
        input-builtin,
        round-builtin,
        intern-builtin,
        unichr-builtin,
        map-builtin-not-iterating,
        zip-builtin-not-iterating,
        range-builtin-not-iterating,
        filter-builtin-not-iterating,
        using-cmp-argument,
        eq-without-hash,
        div-method,
        idiv-method,
        rdiv-method,
        exception-message-attribute,
        invalid-str-codec,
        sys-max-int,
        bad-python3-import,
        deprecated-string-function,
        deprecated-str-translate-call,
        deprecated-itertools-function,
        deprecated-types-field,
        next-method-defined,
        dict-items-not-iterating,
        dict-keys-not-iterating,
        dict-values-not-iterating,
        deprecated-operator-function,
        deprecated-urllib-function,
        xreadlines-attribute,
        deprecated-sys-function,
        exception-escape,
        comprehension-escape,
        duplicate-code,
        not-callable,
        import-outside-toplevel

# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
# multiple time (only on the command line, not in the configuration file where
# it should appear only once). See also the "--disable" option for examples.
enable=c-extension-no-member


[REPORTS]

# Python expression which should return a note less than 10 (10 is the highest
# note). You have access to the variables errors warning, statement which
# respectively contain the number of errors / warnings messages and the total
# number of statements analyzed. This is used by the global evaluation report
# (RP0004).
evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)

# Template used to display messages. This is a python new-style format string
# used to format the message information. See doc for all details.
#msg-template=

# Set the output format. Available formats are text, parseable, colorized, json
# and msvs (visual studio). You can also give a reporter class, e.g.
# mypackage.mymodule.MyReporterClass.
output-format=text

# Tells whether to display a full report or only the messages.
reports=no

# Activate the evaluation score.
score=yes


[REFACTORING]

# Maximum number of nested blocks for function / method body
max-nested-blocks=5

# Complete name of functions that never returns. When checking for
# inconsistent-return-statements if a never returning function is called then
# it will be considered as an explicit return statement and no message will be
# printed.
never-returning-functions=sys.exit


[LOGGING]

# Format style used to check logging format string. `old` means using %
# formatting, while `new` is for `{}` formatting.
logging-format-style=old

# Logging modules to check that the string format arguments are in logging
# function parameter format.
logging-modules=logging


[SPELLING]

# Limits count of emitted suggestions for spelling mistakes.
max-spelling-suggestions=4

# Spelling dictionary name. Available dictionaries: none. To make it working
# install python-enchant package..
spelling-dict=

# List of comma separated words that should not be checked.
spelling-ignore-words=

# A path to a file that contains private dictionary; one word per line.
spelling-private-dict-file=

# Tells whether to store unknown words to indicated private dictionary in
# --spelling-private-dict-file option instead of raising a message.
spelling-store-unknown-words=no


[MISCELLANEOUS]

# List of note tags to take in consideration, separated by a comma.
notes=FIXME,
      XXX,
      TODO


[TYPECHECK]

# List of decorators that produce context managers, such as
# contextlib.contextmanager. Add to this list to register other decorators that
# produce valid context managers.
contextmanager-decorators=contextlib.contextmanager

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=numpy.*,torch.*

# Tells whether missing members accessed in mixin class should be ignored. A
# mixin class is detected if its name ends with "mixin" (case insensitive).
ignore-mixin-members=yes

# Tells whether to warn about missing members when the owner of the attribute
# is inferred to be None.
ignore-none=yes

# This flag controls whether pylint should warn about no-member and similar
# checks whenever an opaque object is returned when inferring. The inference
# can return multiple potential results while evaluating a Python object, but
# some branches might not be evaluated, which results in partial inference. In
# that case, it might be useful to still emit no-member and other checks for
# the rest of the inferred objects.
ignore-on-opaque-inference=yes

# List of class names for which member attributes should not be checked (useful
# for classes with dynamically set attributes). This supports the use of
# qualified names.
ignored-classes=optparse.Values,thread._local,_thread._local

# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=

# Show a hint with possible names when a member name was not found. The aspect
# of finding the hint is based on edit distance.
missing-member-hint=yes

# The minimum edit distance a name should have in order to be considered a
# similar match for a missing member name.
missing-member-hint-distance=1

# The total number of similar names that should be taken in consideration when
# showing a hint for a missing member.
missing-member-max-choices=1


[VARIABLES]

# List of additional names supposed to be defined in builtins. Remember that
# you should avoid defining new builtins when possible.
additional-builtins=

# Tells whether unused global variables should be treated as a violation.
allow-global-unused-variables=yes

# List of strings which can identify a callback function by name. A callback
# name must start or end with one of those strings.
callbacks=cb_,
          _cb

# A regular expression matching the name of dummy variables (i.e. expected to
# not be used).
dummy-variables-rgx=_+$|(_[a-zA-Z0-9_]*[a-zA-Z0-9]+?$)|dummy|^ignored_|^unused_

# Argument names that match this expression will be ignored. Default to name
# with leading underscore.
ignored-argument-names=_.*|^ignored_|^unused_

# Tells whether we should check for unused import in __init__ files.
init-import=no

# List of qualified module names which can have objects that can redefine
# builtins.
redefining-builtins-modules=six.moves,past.builtins,future.builtins,builtins,io


[FORMAT]

# Expected format of line ending, e.g. empty (any line ending), LF or CRLF.
expected-line-ending-format=

# Regexp for a line that is allowed to be longer than the limit.
ignore-long-lines=^\s*(# )?<?https?://\S+>?$

# Number of spaces of indent required inside a hanging or continued line.
indent-after-paren=4

# String used as indentation unit. This is usually "    " (4 spaces) or "\t" (1
# tab).
indent-string='    '

# Maximum number of characters on a single line.
max-line-length=120

# Maximum number of lines in a module.
max-module-lines=1000

# List of optional constructs for which whitespace checking is disabled. `dict-
# separator` is used to allow tabulation in dicts, etc.: {1  : 1,\n222: 2}.
# `trailing-comma` allows a space between comma and closing bracket: (a, ).
# `empty-line` allows space-only lines.
no-space-check=trailing-comma,
               dict-separator

# Allow the body of a class to be on the same line as the declaration if body
# contains single statement.
single-line-class-stmt=no

# Allow the body of an if to be on the same line as the test if there is no
# else.
single-line-if-stmt=no


[SIMILARITIES]

# Ignore comments when computing similarities.
ignore-comments=yes

# Ignore docstrings when computing similarities.
ignore-docstrings=yes

# Ignore imports when computing similarities.
ignore-imports=no

# Minimum lines number of a similarity.
min-similarity-lines=4


[BASIC]

# Naming style matching correct argument names.
argument-naming-style=snake_case

# Regular expression matching correct argument names. Overrides argument-
# naming-style.
argument-rgx=[a-z_][a-z0-9_]{0,30}$

# Naming style matching correct attribute names.
attr-naming-style=snake_case

# Regular expression matching correct attribute names. Overrides attr-naming-
# style.
#attr-rgx=

# Bad variable names which should always be refused, separated by a comma.
bad-names=

# Naming style matching correct class attribute names.
class-attribute-naming-style=any

# Regular expression matching correct class attribute names. Overrides class-
# attribute-naming-style.
#class-attribute-rgx=

# Naming style matching correct class names.
class-naming-style=PascalCase

# Regular expression matching correct class names. Overrides class-naming-
# style.
#class-rgx=

# Naming style matching correct constant names.
const-naming-style=UPPER_CASE

# Regular expression matching correct constant names. Overrides const-naming-
# style.
#const-rgx=

# Minimum line length for functions/classes that require docstrings, shorter
# ones are exempt.
docstring-min-length=-1

# Naming style matching correct function names.
function-naming-style=snake_case

# Regular expression matching correct function names. Overrides function-
# naming-style.
#function-rgx=

# Good variable names which should always be accepted, separated by a comma.
good-names=i,
           j,
           k,
           x,
           ex,
           Run,
           _

# Include a hint for the correct naming format with invalid-name.
include-naming-hint=no

# Naming style matching correct inline iteration names.
inlinevar-naming-style=any

# Regular expression matching correct inline iteration names. Overrides
# inlinevar-naming-style.
#inlinevar-rgx=

# Naming style matching correct method names.
method-naming-style=snake_case

# Regular expression matching correct method names. Overrides method-naming-
# style.
#method-rgx=

# Naming style matching correct module names.
module-naming-style=snake_case

# Regular expression matching correct module names. Overrides module-naming-
# style.
#module-rgx=

# Colon-delimited sets of names that determine each other's naming style when
# the name regexes allow several styles.
name-group=

# Regular expression which should only match function or class names that do
# not require a docstring.
no-docstring-rgx=^_

# List of decorators that produce properties, such as abc.abstractproperty. Add
# to this list to register other decorators that produce valid properties.
# These decorators are taken in consideration only for invalid-name.
property-classes=abc.abstractproperty

# Naming style matching correct variable names.
variable-naming-style=snake_case

# Regular expression matching correct variable names. Overrides variable-
# naming-style.
variable-rgx=[a-z_][a-z0-9_]{0,30}$


[STRING]

# This flag controls whether the implicit-str-concat-in-sequence should
# generate a warning on implicit string concatenation in sequences defined over
# several lines.
check-str-concat-over-line-jumps=no


[IMPORTS]

# Allow wildcard imports from modules that define __all__.
allow-wildcard-with-all=no

# Analyse import fallback blocks. This can be used to support both Python 2 and
# 3 compatible code, which means that the block might have code that exists
# only in one or another interpreter, leading to false positives when analysed.
analyse-fallback-blocks=no

# Deprecated modules which should not be used, separated by a comma.
deprecated-modules=optparse,tkinter.tix

# Create a graph of external dependencies in the given file (report RP0402 must
# not be disabled).
ext-import-graph=

# Create a graph of every (i.e. internal and external) dependencies in the
# given file (report RP0402 must not be disabled).
import-graph=

# Create a graph of internal dependencies in the given file (report RP0402 must
# not be disabled).
int-import-graph=

# Force import order to recognize a module as part of the standard
# compatibility libraries.
known-standard-library=

# Force import order to recognize a module as part of a third party library.
known-third-party=enchant


[CLASSES]

# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,
                      __new__,
                      setUp

# List of member names, which should be excluded from the protected access
# warning.
exclude-protected=_asdict,
                  _fields,
                  _replace,
                  _source,
                  _make

# List of valid names for the first argument in a class method.
valid-classmethod-first-arg=cls

# List of valid names for the first argument in a metaclass class method.
valid-metaclass-classmethod-first-arg=cls


[DESIGN]

# Maximum number of arguments for function / method.
max-args=5

# Maximum number of attributes for a class (see R0902).
max-attributes=7

# Maximum number of boolean expressions in an if statement.
max-bool-expr=5

# Maximum number of branch for function / method body.
max-branches=12

# Maximum number of locals for function / method body.
max-locals=15

# Maximum number of parents for a class (see R0901).
max-parents=15

# Maximum number of public methods for a class (see R0904).
max-public-methods=20

# Maximum number of return / yield for function / method body.
max-returns=6

# Maximum number of statements in function / method body.
max-statements=50

# Minimum number of public methods for a class (see R0903).
min-public-methods=2


[EXCEPTIONS]

# Exceptions that will emit a warning when being caught. Defaults to
# "BaseException, Exception".
overgeneral-exceptions=BaseException,
                       Exception


================================================
FILE: TTS_additional_material/.readthedocs.yml
================================================
# .readthedocs.yml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Build documentation in the docs/ directory with Sphinx
sphinx:
  builder: html
  configuration: docs/source/conf.py

# Optionally set the version of Python and requirements required to build your docs
python:
  version: 3.7
  install:
    - requirements: docs/requirements.txt
    - requirements: requirements.txt

================================================
FILE: TTS_additional_material/CODE_OF_CONDUCT.md
================================================

# Contributor Covenant Code of Conduct

## Our Pledge

We as members, contributors, and leaders pledge to make participation in our
community a harassment-free experience for everyone, regardless of age, body
size, visible or invisible disability, ethnicity, sex characteristics, gender
identity and expression, level of experience, education, socio-economic status,
nationality, personal appearance, race, caste, color, religion, or sexual identity
and orientation.

We pledge to act and interact in ways that contribute to an open, welcoming,
diverse, inclusive, and healthy community.

## Our Standards

Examples of behavior that contributes to a positive environment for our
community include:

* Demonstrating empathy and kindness toward other people
* Being respectful of differing opinions, viewpoints, and experiences
* Giving and gracefully accepting constructive feedback
* Accepting responsibility and apologizing to those affected by our mistakes,
  and learning from the experience
* Focusing on what is best not just for us as individuals, but for the
  overall community

Examples of unacceptable behavior include:

* The use of sexualized language or imagery, and sexual attention or
  advances of any kind
* Trolling, insulting or derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or email
  address, without their explicit permission
* Other conduct which could reasonably be considered inappropriate in a
  professional setting

## Enforcement Responsibilities

Community leaders are responsible for clarifying and enforcing our standards of
acceptable behavior and will take appropriate and fair corrective action in
response to any behavior that they deem inappropriate, threatening, offensive,
or harmful.

Community leaders have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions that are
not aligned to this Code of Conduct, and will communicate reasons for moderation
decisions when appropriate.

## Scope

This Code of Conduct applies within all community spaces, and also applies when
an individual is officially representing the community in public spaces.
Examples of representing our community include using an official e-mail address,
posting via an official social media account, or acting as an appointed
representative at an online or offline event.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported to the community leaders responsible for enforcement at
coc-report@coqui.ai.
All complaints will be reviewed and investigated promptly and fairly.

All community leaders are obligated to respect the privacy and security of the
reporter of any incident.

## Enforcement Guidelines

Community leaders will follow these Community Impact Guidelines in determining
the consequences for any action they deem in violation of this Code of Conduct:

### 1. Correction

**Community Impact**: Use of inappropriate language or other behavior deemed
unprofessional or unwelcome in the community.

**Consequence**: A private, written warning from community leaders, providing
clarity around the nature of the violation and an explanation of why the
behavior was inappropriate. A public apology may be requested.

### 2. Warning

**Community Impact**: A violation through a single incident or series
of actions.

**Consequence**: A warning with consequences for continued behavior. No
interaction with the people involved, including unsolicited interaction with
those enforcing the Code of Conduct, for a specified period of time. This
includes avoiding interactions in community spaces as well as external channels
like social media. Violating these terms may lead to a temporary or
permanent ban.

### 3. Temporary Ban

**Community Impact**: A serious violation of community standards, including
sustained inappropriate behavior.

**Consequence**: A temporary ban from any sort of interaction or public
communication with the community for a specified period of time. No public or
private interaction with the people involved, including unsolicited interaction
with those enforcing the Code of Conduct, is allowed during this period.
Violating these terms may lead to a permanent ban.

### 4. Permanent Ban

**Community Impact**: Demonstrating a pattern of violation of community
standards, including sustained inappropriate behavior,  harassment of an
individual, or aggression toward or disparagement of classes of individuals.

**Consequence**: A permanent ban from any sort of public interaction within
the community.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage],
version 2.0, available at
[https://www.contributor-covenant.org/version/2/0/code_of_conduct.html][v2.0].

Community Impact Guidelines were inspired by 
[Mozilla's code of conduct enforcement ladder][Mozilla CoC].

For answers to common questions about this code of conduct, see the FAQ at
[https://www.contributor-covenant.org/faq][FAQ]. Translations are available 
at [https://www.contributor-covenant.org/translations][translations].

[homepage]: https://www.contributor-covenant.org
[v2.0]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html
[Mozilla CoC]: https://github.com/mozilla/diversity
[FAQ]: https://www.contributor-covenant.org/faq
[translations]: https://www.contributor-covenant.org/translations


================================================
FILE: TTS_additional_material/README.md
================================================
<img src="https://raw.githubusercontent.com/coqui-ai/TTS/main/images/coqui-log-green-TTS.png" height="56"/>

----

### 📣 Clone your voice with a single click on [🐸Coqui.ai](https://app.coqui.ai/auth/signin)

----

🐸TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality.
🐸TTS comes with pretrained models, tools for measuring dataset quality and already used in **20+ languages** for products and research projects.

[![Dicord](https://img.shields.io/discord/1037326658807533628?color=%239B59B6&label=chat%20on%20discord)](https://discord.gg/5eXr5seRrv)
[![License](<https://img.shields.io/badge/License-MPL%202.0-brightgreen.svg>)](https://opensource.org/licenses/MPL-2.0)
[![PyPI version](https://badge.fury.io/py/TTS.svg)](https://badge.fury.io/py/TTS)
[![Covenant](https://camo.githubusercontent.com/7d620efaa3eac1c5b060ece5d6aacfcc8b81a74a04d05cd0398689c01c4463bb/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f436f6e7472696275746f72253230436f76656e616e742d76322e3025323061646f707465642d6666363962342e737667)](https://github.com/coqui-ai/TTS/blob/master/CODE_OF_CONDUCT.md)
[![Downloads](https://pepy.tech/badge/tts)](https://pepy.tech/project/tts)
[![DOI](https://zenodo.org/badge/265612440.svg)](https://zenodo.org/badge/latestdoi/265612440)

![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/aux_tests.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/data_tests.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/docker.yaml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/inference_tests.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/style_check.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/text_tests.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/tts_tests.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/vocoder_tests.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/zoo_tests0.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/zoo_tests1.yml/badge.svg)
![GithubActions](https://github.com/coqui-ai/TTS/actions/workflows/zoo_tests2.yml/badge.svg)
[![Docs](<https://readthedocs.org/projects/tts/badge/?version=latest&style=plastic>)](https://tts.readthedocs.io/en/latest/)

📰 [**Subscribe to 🐸Coqui.ai Newsletter**](https://coqui.ai/?subscription=true)

📢 [English Voice Samples](https://erogol.github.io/ddc-samples/) and [SoundCloud playlist](https://soundcloud.com/user-565970875/pocket-article-wavernn-and-tacotron2)

📄 [Text-to-Speech paper collection](https://github.com/erogol/TTS-papers)

<img src="https://static.scarf.sh/a.png?x-pxid=cf317fe7-2188-4721-bc01-124bb5d5dbb2" />

## 💬 Where to ask questions
Please use our dedicated channels for questions and discussion. Help is much more valuable if it's shared publicly so that more people can benefit from it.

| Type                            | Platforms                               |
| ------------------------------- | --------------------------------------- |
| 🚨 **Bug Reports**              | [GitHub Issue Tracker]                  |
| 🎁 **Feature Requests & Ideas** | [GitHub Issue Tracker]                  |
| 👩‍💻 **Usage Questions**          | [GitHub Discussions]                    |
| 🗯 **General Discussion**       | [GitHub Discussions] or [Discord]   |

[github issue tracker]: https://github.com/coqui-ai/tts/issues
[github discussions]: https://github.com/coqui-ai/TTS/discussions
[discord]: https://discord.gg/5eXr5seRrv
[Tutorials and Examples]: https://github.com/coqui-ai/TTS/wiki/TTS-Notebooks-and-Tutorials


## 🔗 Links and Resources
| Type                            | Links                               |
| ------------------------------- | --------------------------------------- |
| 💼 **Documentation**              | [ReadTheDocs](https://tts.readthedocs.io/en/latest/)
| 💾 **Installation**               | [TTS/README.md](https://github.com/coqui-ai/TTS/tree/dev#install-tts)|
| 👩‍💻 **Contributing**               | [CONTRIBUTING.md](https://github.com/coqui-ai/TTS/blob/main/CONTRIBUTING.md)|
| 📌 **Road Map**                   | [Main Development Plans](https://github.com/coqui-ai/TTS/issues/378)
| 🚀 **Released Models**            | [TTS Releases](https://github.com/coqui-ai/TTS/releases) and [Experimental Models](https://github.com/coqui-ai/TTS/wiki/Experimental-Released-Models)|

## 🥇 TTS Performance
<p align="center"><img src="https://raw.githubusercontent.com/coqui-ai/TTS/main/images/TTS-performance.png" width="800" /></p>

Underlined "TTS*" and "Judy*" are 🐸TTS models
<!-- [Details...](https://github.com/coqui-ai/TTS/wiki/Mean-Opinion-Score-Results) -->

## Features
- High-performance Deep Learning models for Text2Speech tasks.
    - Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech).
    - Speaker Encoder to compute speaker embeddings efficiently.
    - Vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN)
- Fast and efficient model training.
- Detailed training logs on the terminal and Tensorboard.
- Support for Multi-speaker TTS.
- Efficient, flexible, lightweight but feature complete `Trainer API`.
- Released and ready-to-use models.
- Tools to curate Text2Speech datasets under```dataset_analysis```.
- Utilities to use and test your models.
- Modular (but not too much) code base enabling easy implementation of new ideas.

## Implemented Models
### Spectrogram models
- Tacotron: [paper](https://arxiv.org/abs/1703.10135)
- Tacotron2: [paper](https://arxiv.org/abs/1712.05884)
- Glow-TTS: [paper](https://arxiv.org/abs/2005.11129)
- Speedy-Speech: [paper](https://arxiv.org/abs/2008.03802)
- Align-TTS: [paper](https://arxiv.org/abs/2003.01950)
- FastPitch: [paper](https://arxiv.org/pdf/2006.06873.pdf)
- FastSpeech: [paper](https://arxiv.org/abs/1905.09263)
- FastSpeech2: [paper](https://arxiv.org/abs/2006.04558)
- SC-GlowTTS: [paper](https://arxiv.org/abs/2104.05557)
- Capacitron: [paper](https://arxiv.org/abs/1906.03402)
- OverFlow: [paper](https://arxiv.org/abs/2211.06892)
- Neural HMM TTS: [paper](https://arxiv.org/abs/2108.13320)

### End-to-End Models
- VITS: [paper](https://arxiv.org/pdf/2106.06103)
- YourTTS: [paper](https://arxiv.org/abs/2112.02418)

### Attention Methods
- Guided Attention: [paper](https://arxiv.org/abs/1710.08969)
- Forward Backward Decoding: [paper](https://arxiv.org/abs/1907.09006)
- Graves Attention: [paper](https://arxiv.org/abs/1910.10288)
- Double Decoder Consistency: [blog](https://erogol.com/solving-attention-problems-of-tts-models-with-double-decoder-consistency/)
- Dynamic Convolutional Attention: [paper](https://arxiv.org/pdf/1910.10288.pdf)
- Alignment Network: [paper](https://arxiv.org/abs/2108.10447)

### Speaker Encoder
- GE2E: [paper](https://arxiv.org/abs/1710.10467)
- Angular Loss: [paper](https://arxiv.org/pdf/2003.11982.pdf)

### Vocoders
- MelGAN: [paper](https://arxiv.org/abs/1910.06711)
- MultiBandMelGAN: [paper](https://arxiv.org/abs/2005.05106)
- ParallelWaveGAN: [paper](https://arxiv.org/abs/1910.11480)
- GAN-TTS discriminators: [paper](https://arxiv.org/abs/1909.11646)
- WaveRNN: [origin](https://github.com/fatchord/WaveRNN/)
- WaveGrad: [paper](https://arxiv.org/abs/2009.00713)
- HiFiGAN: [paper](https://arxiv.org/abs/2010.05646)
- UnivNet: [paper](https://arxiv.org/abs/2106.07889)

You can also help us implement more models.

## Install TTS
🐸TTS is tested on Ubuntu 18.04 with **python >= 3.7, < 3.11.**.

If you are only interested in [synthesizing speech](https://tts.readthedocs.io/en/latest/inference.html) with the released 🐸TTS models, installing from PyPI is the easiest option.

```bash
pip install TTS
```

If you plan to code or train models, clone 🐸TTS and install it locally.

```bash
git clone https://github.com/coqui-ai/TTS
pip install -e .[all,dev,notebooks]  # Select the relevant extras
```

If you are on Ubuntu (Debian), you can also run following commands for installation.

```bash
$ make system-deps  # intended to be used on Ubuntu (Debian). Let us know if you have a different OS.
$ make install
```

If you are on Windows, 👑@GuyPaddock wrote installation instructions [here](https://stackoverflow.com/questions/66726331/how-can-i-run-mozilla-tts-coqui-tts-training-with-cuda-on-a-windows-system).


## Docker Image
You can also try TTS without install with the docker image.
Simply run the following command and you will be able to run TTS without installing it.

```bash
docker run --rm -it -p 5002:5002 --entrypoint /bin/bash ghcr.io/coqui-ai/tts-cpu
python3 TTS/server/server.py --list_models #To get the list of available models
python3 TTS/server/server.py --model_name tts_models/en/vctk/vits # To start a server
```

You can then enjoy the TTS server [here](http://[::1]:5002/)
More details about the docker images (like GPU support) can be found [here](https://tts.readthedocs.io/en/latest/docker_images.html)


## Synthesizing speech by 🐸TTS

### 🐍 Python API

```python
from TTS.api import TTS

# Running a multi-speaker and multi-lingual model

# List available 🐸TTS models and choose the first one
model_name = TTS.list_models()[0]
# Init TTS
tts = TTS(model_name)
# Run TTS
# ❗ Since this model is multi-speaker and multi-lingual, we must set the target speaker and the language
# Text to speech with a numpy output
wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
# Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")

# Running a single speaker model

# Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False, gpu=False)
# Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path=OUTPUT_PATH)

# Example voice cloning with YourTTS in English, French and Portuguese:
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False, gpu=True)
tts.tts_to_file("This is voice cloning.", speaker_wav="my/cloning/audio.wav", language="en", file_path="output.wav")
tts.tts_to_file("C'est le clonage de la voix.", speaker_wav="my/cloning/audio.wav", language="fr", file_path="output.wav")
tts.tts_to_file("Isso é clonagem de voz.", speaker_wav="my/cloning/audio.wav", language="pt", file_path="output.wav")
```

### Command line `tts`
#### Single Speaker Models

- List provided models:

    ```
    $ tts --list_models
    ```
- Get model info (for both tts_models and vocoder_models):
    - Query by type/name:
        The model_info_by_name uses the name as it from the --list_models.
        ```
        $ tts --model_info_by_name "<model_type>/<language>/<dataset>/<model_name>"
        ```
        For example:

        ```
        $ tts --model_info_by_name tts_models/tr/common-voice/glow-tts
        ```
        ```
        $ tts --model_info_by_name vocoder_models/en/ljspeech/hifigan_v2
        ```
    - Query by type/idx:
        The model_query_idx uses the corresponding idx from --list_models.
        ```
        $ tts --model_info_by_idx "<model_type>/<model_query_idx>"
        ```
        For example:

        ```
        $ tts --model_info_by_idx tts_models/3
        ```

- Run TTS with default models:

    ```
    $ tts --text "Text for TTS" --out_path output/path/speech.wav
    ```

- Run a TTS model with its default vocoder model:

    ```
    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
    ```
  For example:

    ```
    $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --out_path output/path/speech.wav
    ```

- Run with specific TTS and vocoder models from the list:

    ```
    $ tts --text "Text for TTS" --model_name "<model_type>/<language>/<dataset>/<model_name>" --vocoder_name "<model_type>/<language>/<dataset>/<model_name>" --out_path output/path/speech.wav
    ```

  For example:

    ```
    $ tts --text "Text for TTS" --model_name "tts_models/en/ljspeech/glow-tts" --vocoder_name "vocoder_models/en/ljspeech/univnet" --out_path output/path/speech.wav
    ```


- Run your own TTS model (Using Griffin-Lim Vocoder):

    ```
    $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
    ```

- Run your own TTS and Vocoder models:
    ```
    $ tts --text "Text for TTS" --model_path path/to/model.pth --config_path path/to/config.json --out_path output/path/speech.wav
        --vocoder_path path/to/vocoder.pth --vocoder_config_path path/to/vocoder_config.json
    ```

#### Multi-speaker Models

- List the available speakers and choose as <speaker_id> among them:

    ```
    $ tts --model_name "<language>/<dataset>/<model_name>"  --list_speaker_idxs
    ```

- Run the multi-speaker TTS model with the target speaker ID:

    ```
    $ tts --text "Text for TTS." --out_path output/path/speech.wav --model_name "<language>/<dataset>/<model_name>"  --speaker_idx <speaker_id>
    ```

- Run your own multi-speaker TTS model:

    ```
    $ tts --text "Text for TTS" --out_path output/path/speech.wav --model_path path/to/model.pth --config_path path/to/config.json --speakers_file_path path/to/speaker.json --speaker_idx <speaker_id>
    ```

## Directory Structure
```
|- notebooks/       (Jupyter Notebooks for model evaluation, parameter selection and data analysis.)
|- utils/           (common utilities.)
|- TTS
    |- bin/             (folder for all the executables.)
      |- train*.py                  (train your target model.)
      |- ...
    |- tts/             (text to speech models)
        |- layers/          (model layer definitions)
        |- models/          (model definitions)
        |- utils/           (model specific utilities.)
    |- speaker_encoder/ (Speaker Encoder models.)
        |- (same)
    |- vocoder/         (Vocoder models.)
        |- (same)
```


================================================
FILE: TTS_additional_material/hubconf.py
================================================
dependencies = [
    'torch', 'gdown', 'pysbd', 'gruut', 'anyascii', 'pypinyin', 'coqpit', 'mecab-python3', 'unidic-lite'
]
import torch

from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer


def tts(model_name='tts_models/en/ljspeech/tacotron2-DCA',
        vocoder_name=None,
        use_cuda=False):
    """TTS entry point for PyTorch Hub that provides a Synthesizer object to synthesize speech from a give text.

    Example:
        >>> synthesizer = torch.hub.load('coqui-ai/TTS', 'tts', source='github')
        >>> wavs = synthesizer.tts("This is a test! This is also a test!!")
            wavs - is a list of values of the synthesized speech.

    Args:
        model_name (str, optional): One of the model names from .model.json. Defaults to 'tts_models/en/ljspeech/tacotron2-DCA'.
        vocoder_name (str, optional): One of the model names from .model.json. Defaults to 'vocoder_models/en/ljspeech/multiband-melgan'.
        pretrained (bool, optional): [description]. Defaults to True.

    Returns:
        TTS.utils.synthesizer.Synthesizer: Synthesizer object wrapping both vocoder and tts models.
    """
    manager = ModelManager()

    model_path, config_path, model_item = manager.download_model(model_name)
    vocoder_name = model_item[
        'default_vocoder'] if vocoder_name is None else vocoder_name
    vocoder_path, vocoder_config_path, _ = manager.download_model(vocoder_name)

    # create synthesizer
    synt = Synthesizer(tts_checkpoint=model_path,
                       tts_config_path=config_path,
                       vocoder_checkpoint=vocoder_path,
                       vocoder_config=vocoder_config_path,
                       use_cuda=use_cuda)
    return synt


if __name__ == '__main__':
    synthesizer = torch.hub.load('coqui-ai/TTS:dev', 'tts', source='github')
    synthesizer.tts("This is a test!")


================================================
FILE: TTS_additional_material/requirements.txt
================================================
# core deps
numpy==1.21.6;python_version<"3.10"
numpy;python_version=="3.10"
cython==0.29.28
scipy>=1.4.0
torch>=1.7
torchaudio
soundfile
librosa==0.8.0
numba==0.55.1;python_version<"3.9"
numba==0.56.4;python_version>="3.9"
inflect==5.6.0
tqdm
anyascii
pyyaml
fsspec>=2021.04.0
aiohttp
packaging
# deps for examples
flask
# deps for inference
pysbd
# deps for notebooks
umap-learn==0.5.1
pandas
# deps for training
matplotlib
# coqui stack
trainer==0.0.20
# config management
coqpit>=0.0.16
# chinese g2p deps
jieba
pypinyin
# japanese g2p deps
mecab-python3==1.0.5
unidic-lite==1.0.8
# gruut+supported langs
gruut[de]==2.2.3
# deps for korean
jamo
nltk
g2pkk>=0.1.1


================================================
FILE: UpdateHistory.md
================================================
# Update History
---
## JULY 14th 2023 UPDATE: Research Mode
I can finnaly share the first draft of the Research Mode. This modality was thought for people often dealing with research papers. 
- Switch to research mode by saying *'Switch to Research Mode'*
- :star: Initialize a new workspace like this: *'Initialize a new workspace about Carbon Fiber Applications in the Spacecraft industry'*. A workspace is a folder that collects and organize the results of the research. This protocol is subdivided into 3 sub-routines:
   1. Core Paper identification: Use the **Semantic Scholar API** to identify some strongly relevant papers;
   2. Core Expansion: for each paper, finds some suggestions, then keep only the suggestions that appear to be similar to at least 2 paper;
   3. Refy Expansion: use the refy suggestion package to enlarge the results;
- Find suggestions like: *'find suggestions that are sililar to the paper with title ...'*
- Download: *'download the paper with title ...'*
- :star: Query your database like: *'what is the author of the paper with title ...?'*  *'what are the experimental conditions set for the paper with title ...?'*

PS: This mode is not super stable and needs to be worked on<br>

*PPS: This project will be discontinued for some time since I'll be working on my thesis until 2024. However there are already so many things that can be improved so I'll be back!*

---
## APRIL 25th 2023 UPDATE: Jarvis can surf
By integrating LangChain into the project I am happy to bring some useful capabilities to Jarvis like accessing the web. Different LangChain Agents are now instructed to perform complex tasks like finding files, accessing the web, and extracting content from local resources...<br>
 - **Offline tasks**: the experience is now fully handled by AI so you don't need to guide jarvis in the tasks anymore. Earlier: *'Jarvis, find a file'* triggered the ```find_file``` function and then you had to provide keywords; now you can just say *'How many files are talking about X? Make a summary of one of them...'* and the system makes an action plan to satisfy your requests using different functions. The best part is that the agent in charge of this can realize if a file is relevant to the question by opening it and questioning itself.<br>
 - **Online tasks**: a different agent is instructed to surf the web searching for answers. These agents will answer a question like *'How s the weather like?'* or *'What is the latest news about stocks?'*<br>

So to summarize: Jarvis is in charge of keeping the conversation ongoing at a higher level and decides to delegate to the agents if needed; the offline agent puts its hands on files and PC stuff; the online agent surfs the web. I found this separation of tasks to work better especially with the main objective being to chat. If you make a conversational agent with tools the result is that it will mostly look for the answer online or among files consuming credit and time. 
<br>

While this solution brings some stability to the conversation, it s not still ideal for processing scientific papers and the system does not produce any material yet. I am working on a sort of *project mode* that is focused on actions rather than chatting. Ideally I'd like jarvis to shift from *chat mode* to *project mode* when I'll be asking *"help me to understand topic X from the paper Y"* and then other similar papers should be downloaded, summarized and compared.

---

---
## APRIL 15th 2023 UPDATE: Vicuna Integration and offline modality
Worked hard to bring a local, free, alternative to OpenAI GPT models since lately some open-source competition is arising. Vicuna is a free GPT model that is claimed to be 90% as good as ChatGPT4. My plan is to **integrate** this model with ChatGPT rather than straight-out substitute it. This is because OpenAI API is more reliable, faster and doesn't seem to suffer from hallucinations (i.e. when a conversational AI generates a response to a prompt that is either false or irrelevant to the original request). The installation of this model is one-click but, since the model is hardware-dependant, the response time will vary according to your hardware capabilities. I made a more [in-depth analysis](https://github.com/gia-guar/JARVIS-ChatGPT/tree/main/Vicuna/README.md) on whether should you install it or just stick to OpenAI.<br>
The gist comes down to how much vRAM and RAM you have. The oogabooga ui backed is pretty efficient in dynamically allocating the models.
- ! Move the files from the ```whisper_edits``` folder to the ```.venv\lib\site-packages\whisper``` ! <span style="color:grey"> this is needed to allocate better the whisper model btween GPU vRAM and RAM;</span><br> 
- Added measures to improve GPU memory allocation; see ```get_answer(optimize_cuda=Ture)``` <span style="color:grey"> this is beta, I
 still need to integrate in a more built-in way with the rest of the scripts;</span>
- Added OFFLINE toggle to run exclusively locally and avoid credit consumption <span style="color:grey"> this is beta, I still need to integrate in a more built-in way with the rest of the scripts;</span>
---
---
## APRIL 11th 2023 UPDATE: Overall improvement to search engine, update README.md
new: ```pip install argostranslate pvporcupine python-dotenv```
- Upgrading to Python 3.8 and CUDA 11.7 (!)
- Lately, the ```translator``` package was taking too long to work (~20 seconds to get a translation), so I added another translator package that works instantly and it's offline;
- The 'Jarvis' wake-up keyword was added from the ```picovoice``` package. It requires a free key you can get at https://picovoice.ai/platform/porcupine;
- Fundamental improvements to the local search engine in terms of speed and credit consumption. With this update, accessing information from past conversations gets easier. When the search is completed the AI will summarize the text;
- Using dotenv for easier authenthication; 
<br>
---
---
## APRIL 5th 2023 UPDATE: New Voice Models (F.R.I.D.A.Y), Expanding Local Search Engine and More
I finally decided to upgrade the voice model from @ConrentinJ to [@CoquiAI](https://github.com/coqui-ai/tts). This model works on the same principle (Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis) but is much faster, more versatile and offers more options to explore. Right now, I just want to push this version live, it works by default with one of the models offered by the TTS package. In the future, I'll explore the differences and strengths of all the other models (you can do it by changing the name of the model when the Voice is initialized inside ``Voice.py``, as shown in the ``tts_demo.py``). Moreover, this model is multilanguage so if you find any clean interviews of voice actors you can use them as models when the answer needs to be spoken in your (or any) language.
<br>
Secondly, I've made some improvements to the Local Search Engine. Now it can be accessed with voice. In particular:

 1. Once you've made a prompt, ```LocalSearchEngine.analyze_prompt()``` will try to interpret the prompt and it will produce a flag, ranging from 1 to N (where N is the number of Actions the Assistant can make). The prompt analyzer make use of a sort of *semantic if/else*. The idea is: *if* the **meaning** of the prompt is equal (has high cosine similarity) to the action, *then* return ```True``` *else* return ```False```; 
 2. If the flag corresponds to **"1"** the associated action will be **"Look for a file"** and that protocol will be triggered;
 3. The system will first communicate its intentions and if you confirm, the assistant will ask you to provide some search keywords;
 4. The system will utilize a pandas DataFrame, where some topic tags are associated to the conversation, to detect relevant discussions;
 5. Finally, the system will rank all the files from the most relevant to the least pertinent;
 6. The natural following step would be to recover one of the files, but this is still a work in progress;
<br>

Minor updates:
 - Bug fixes;
 - Added ``langid``, ``TextBlob`` and ``translators`` to get faster translations and reduce GPT credit usage;
 - Improved Speech-to-text by reducing the possible languages to the ones specified in the Assistant model;
<br>

---
---
## April 1st 2023 UPDATE: Introducing the Local Search Engine, sounds and more
I managed to build some tools that are capable of reading and abstracting information from textual files (.txt). This tool might be precious in futire when voice commands that handle the Assistant memory will be introduced. The idea is to have specific commands like "open the last conversation about topic X" or "I remember something you said about topic Y can you make a summary of that conversation?". The LocalSearchEngine can find sort the discussions by relevancy (``cosine_similarity``) making use of embeddings: *an embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness. [OpenAI - what are embeddings](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings). You can find these inside ``Assistant\tools.py``*

The LocalSearchEngine adopt a 4 step algorithm to compute the relevancy between a key and a text:
1. **Tags extractions**: ask ChatGPT to extract up to 10 topics (tags) that are discussed in the text. This will reduce the noise in the conversation leading to more relevant, high value content;
2. **Translation**: OpenAI Embeddings work in any language but, to maximize the pterormance, the tags are translated in the same language of the key;
3. **Emebdding computation**: using ``text-embedding-ada-002`` to extract the Embeddings from both keys and translated tags;
4. **Cosine Similarity Computation**: use OpenAI ``cosine_similarity()`` to compute the similarity index. Alternatively you coul also ask naively to ChatGPT where some text is relevant to a key or not but the results weren't as good. Sill you could ask for a quick search ``quick_search()``;

<p align="center">
  <img src="https://user-images.githubusercontent.com/49094051/229243205-337b7bfa-2e7b-43b1-a770-62b524367dc6.PNG" /><br>
  <i><span style="color:grey">Custom DataFrame showing the search results with key "physics and space exploration" between some conversation I had.<br> Here you can also see the tags that were extrapolated. Notice that some of them are in italian, since the conversation was held in my native language </span></i> 
 </p>


Furthermore, a ``Translator`` class was implemented as addidtional tool. 

other important updates"
- introduced a ``VirtualAsssistant`` object to allow a more intuitive ``main`` flow and handle most of the requests ``Assistant\VirtualAssistant.py``;
- rearranged the directory structure to 'hide' some backend code;

other minor updates:
- introduced sounds! Now you can have sound feedback of what is happening with the Assistant. Seeing is beliving;
- Made overall code slightly more efficient; 
<br>
<br>
---
---
## March 26 2026 UPDATE: Background execution and Hands Free control
This update is focused on making the assistant more viable in everyday life. 
 - You can now follow some instructions you can find at <span style="color:green"> BootJARVIS.pdf </span>. to run the main script automaitcally when the system boots;
 - There no more need to press Ctrl+C to deliver mic recordings to Wisper. Here hows *Hands Free* and *Summoning* work like:
    1. By default the assistant is at sleep. A python script will however run passively to listen for a special keyword to summon JARVIS. The function is hence ```PassiveListen```, you'll find it inside get_audio.py. It leverage the SpeechRecognition python package that is faster and more versatile than whisper but less accurate. 
    2. When the keyword (by default ```'elephant'```* ) is spoken, the assistant wakes up and the following audio is fed to Whisper. You'll need to summon the assistance only once, every time you begin a conversation.
    3. When a conversation begins, the assistant will listen, after you deliver a prompt some time will pass, **3 seconds** (```RESPONSE_TIME``` inside the ```record()``` function in get_audio.py ) before the audio is listened and transcribed by whisper. A conversation is closed when the user say a Closing Keyword (for now 'thanks'). This will trigger the chat saving and will put the assistant back to sleep, waiting for a awakening keyword. 
    4. In alternative, the system will go back to sleep automatically if **30 seconds** (variable ```SLEEP_DELAY``` inside the ```record()``` function in get_audio.py ) pass after the response is repoduced.
 - minor improvements:
    1. Adding the package pytts3x for text to speech when IBM is un-available (monthly usage expired);
    2. improved CLI outputs;
    3. improved descriptions;

(*) ```'elephant'``` was chosen as default keyword becaus it' uncommon and understandable. You can change it to 'Hey Jarvis' o 'Hey Siri' but you need to be sure the system catch what you are saying (SpeechRecognition is not that good with fancy names) maybe in future better ways to summon will be thought.
<br>
<br>

---
---
## March 13 2023 UPDATE: JARVIS VOICE IS HERE!
**How i did it**: I spent a huge amount of time on @CorentinJ github https://github.com/CorentinJ/Real-Time-Voice-Cloning which provides an interface to generate audio from text using a pretrained text-to-speech model. The GUI is pretty clever and I admire his work, however, using the model in a python script is not straight foward! I first edited the toolbox to save **embeddings**, which are the beginning of  the generation process,. They are the "voice ID" of the targeted people, expressed in terms of matrix. With this edit, I used the toolbox to generate Paul Bettany's voice embedding. <br>
Then, I wrote down a trimmed version of CorentinJ's toolbox, `JARVIS.py`. This version can load the embedding learned from Jarvis voice and do basic oprations like Synth and vocode upon request from any script. 

![toolbox](https://user-images.githubusercontent.com/49094051/224836993-ee7b4964-e518-46f4-85b1-b25f48f1a78c.PNG)
<p align="center"> Original Toolbox interface: you can see the embedding </p>


================================================
FILE: Vicuna/README.md
================================================
# OOBABOOGA-UI & VICUNA INSTALLATION
Guide on how to run a ChatGPT alternative on your PC.

## 1. What are the OOBABOOGA-UI and Vicuna?
The [oobabooga text generation web ui](https://github.com/oobabooga/text-generation-webui) is a User Interface that can utilize many open-source Large Language Models like Alpaca, Llama and many others. **It's 100% free and runs on your pc so no connection is required**. The powerful aspect of the Oobabooga interface is that it provides a way to interface with the model through an Application Program Interface, which is handy for this project. <br>
<p>
    <img width="2348" alt="Vicuna" src="https://raw.githubusercontent.com/oobabooga/screenshots/main/cai3.png" width="10">
</p>
<p align='center'>
    <span style="color:grey"> Example of conversation in Oobabooga UI </span>
</p>

[Vicuna 13-b](https://vicuna.lmsys.org/) is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna achieves more than 90% quality of OpenAI ChatGPT. 

## 2. Should I install this? (*)
Processing large amounts of information can be quite expensive if you are running on a pay-as-you-go account. Secondly, having a backup solution for when our connection isn't reliable or just when there is too much traffic on OpenAI servers has its pros and cons. For this project, I am planning on using offline, free text processing for PDFs and long website analysis. In this way It'll be possible to let the computer elaborate huge amounts of material in the background while I'll be doing other stuff.

## 3. (*) Pros and cons
Pros:
 1. **local**, no internet needed;
 2. **free**;
 3. reasonably good *(see cons: hallucinations);
 4. runs on CPU and/or GPU;
 5. You have some freedom in choosing what opensource model works best for you;

Cons:
 1. **Size**: these are **LARGE** language models. They are designed to take as much RAM/vRAM on your devices. This means that, depending on the performance of your hardware, you might face Out Of Memory errors.
 <p align="center">
    <img width="2348" alt="Vicuna" src="https://user-images.githubusercontent.com/49094051/232077149-882178eb-c73e-4834-b82e-44cafa941666.PNG">
</p>

> This snap was taken running Whisper's "large" model (10GB) and the GPU Vicuna model. In particular, you can see the RAM getting filled when the model begins to process.

 2. **Speed**: OpenAI services are generally faster. The speed of the answer will depend on your hardware;
 3. **Multitasking**: due to their size and RAM footprint, multitasking with these software running might be difficult;
 4. Hallucinations: these models are quite raw and they tend to have long discussions... with themselves! Coding can put a patch on the problem but right now it's still an issue and, sometimes, answers might be *weird*. <span style="color:grey"> For example, Vicuna might ask itself a question after it gave an answer and go on like that until it consumes all tokens!</span>

# INSTALLATION
## 1. Make a folder anywhere on your computer and click on the folder path:
 <p align="center">
    <img width="2348" alt="Vicuna" src="https://user-images.githubusercontent.com/49094051/232081647-5c5ccc3e-1fc0-45d8-905b-4c91ac67e77f.png">
    <span style='color:grey'> Here <i>ChatGPT</i> is where I am keeping the project, while <i>vicuna</i> is the folder I chose to install Vicuna in </span>
</p>


## 3. Type ```powershell``` and hit enter

## 4. run ```iex (irm vicuna.tc.ht)``` from powershell
This command will run a One-Click-Installation provided by [TroubleChute](https://hub.tcno.co/ai/text-ai/vicuna/). You can follow the instructions you'll see on screen or check his [step-by-step tutorial](https://youtu.be/d4dk_7FptXk). 

## 5. Select CPU or GPU (or both)
The Vicuna CPU model ([eachadea/ggml-vicuna-13b-4bit](https://huggingface.co/eachadea/legacy-ggml-vicuna-13b-4bit)) weighs ~15 GB. It can run only on CPU (RAM);<br>
The Vicuna GPU model ([anon8231489123/vicuna-13b-GPTQ-4bit-128g](https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g)) weighs ~7.5 GB. It can dynamically allocate itself between RAM and vRAM. <br>

You can also download manually using the following syntax on CMD: <br>
```cd oobabooga-windows\text-generation-webui```<br>
```python download-model.py facebook/opt-1.3b```<br>
[Hugging Face](https://huggingface.co/models?pipeline_tag=text-generation&sort=downloads) is the main place to download models. These are some examples:
* [Pythia](https://huggingface.co/models?sort=downloads&search=eleutherai%2Fpythia+deduped)
* [OPT](https://huggingface.co/models?search=facebook/opt)
* [GALACTICA](https://huggingface.co/models?search=facebook/galactica)
* [GPT-J 6B](https://huggingface.co/EleutherAI/gpt-j-6B/tree/main)


## 6. Edit the ```start-webui-vicuna.bat``` or run ```start-webui-vicuna-gpu.bat```
change the last line with:<br>
 - if you are using GPU:
``` call python server.py --wbits 4 --groupsize 128 --listen --no-stream --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --notebook --extension api```
 - If you are using CPU: ```call python server.py --model eachadea_ggml-vicuna-13b-4bit --listen --no-stream --notebook --extension api```
<br>
This edit is already applied in the file ```start-webui-vicuna-gpu.bat``` you'll find in this folder
You should be able to run the ```.bat``` file with no errors and the following message should appear: <br>Running on local URL:  http://0.0.0.0:786<br>
To create a public link, set `share=True` in `launch()`.


================================================
FILE: Vicuna/start-webui-vicuna-gpu.bat
================================================
@echo off

@echo Starting the web UI...

cd /D "%~dp0"

set MAMBA_ROOT_PREFIX=%cd%\installer_files\mamba
set INSTALL_ENV_DIR=%cd%\installer_files\env

if not exist "%MAMBA_ROOT_PREFIX%\condabin\micromamba.bat" (
  call "%MAMBA_ROOT_PREFIX%\micromamba.exe" shell hook >nul 2>&1
)
call "%MAMBA_ROOT_PREFIX%\condabin\micromamba.bat" activate "%INSTALL_ENV_DIR%" || ( echo MicroMamba hook not found. && goto end )
cd text-generation-webui

call python server.py --wbits 4 --groupsize 128 --listen --no-stream --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --extensions api

:end
pause


================================================
FILE: Vicuna/vicuna.ps1
================================================
# Copyright (C) 2023 TroubleChute (Wesley Pyburn)
# Licensed under the GNU General Public License v3.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.gnu.org/licenses/gpl-3.0.en.html
#
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#    
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#    
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <https://www.gnu.org/licenses/>.
#
# ----------------------------------------
# This script:
# 1. Check if current directory is oobabooga-windows, or oobabooga-windows is in directory
# 2. Run my install script for obabooga/text-generation-webui
# 3. Tells you how to download the vicuna model, and opens the model downloader.
# 4. Run the model downloader (Unless CPU-Only mode was selected in install, in which case the CPU model is downloaded)
# 5. Replace commands in the start-webui.bat file
# 6. Create desktop shortcuts
# 7. Run the webui
# ----------------------------------------

Write-Host "Welcome to TroubleChute's Vicuna installer!" -ForegroundColor Cyan
Write-Host "Vicuna as well as all of its other dependencies and a model should now be installed..." -ForegroundColor Cyan
Write-Host "[Version 2023-04-11]`n`n" -ForegroundColor Cyan

# 1. Check if current directory is oobabooga-windows, or oobabooga-windows is in directory
# If it is, CD back a folder.
$currentDir = (Get-Item -Path ".\" -Verbose).FullName
if ($currentDir -like "*\oobabooga-windows") {
    Set-Location ../
}

$containsFolder = Get-ChildItem -Path ".\" -Directory -Name | Select-String -Pattern "oobabooga-windows"
if ($containsFolder) {
    Write-Host "The 'oobabooga-windows' folder already exists." -ForegroundColor Cyan
    $downloadAgain = Read-Host "Do you want to download it again? (Y/N)"

    if ($downloadAgain -eq "Y" -or $downloadAgain -eq "y") {
        # Perform the download again
        $containsFolder = $False
    }
}

if (-not $containsFolder) {
    Write-Host "I'll start by installing Oobabooga first, then we'll get to the model...`n`n"
    
    #2. Choose CPU or GPU installation
    Write-Host "Do you have an NVIDIA GPU?" -ForegroundColor Cyan
    Write-Host "Enter anything but y or n to skip." -ForegroundColor Yellow

    $choice = Read-Host "Answer (y/n)"

    $skip_model = 1
    $skip_start = 1

    if ($choice -eq "Y" -or $choice -eq "y") {
        Write-Host "Installing GPU & CPU compatible version" -ForegroundColor Cyan
        Write-Host "If this fails, please delete the folder and choose 'N'" -ForegroundColor Cyan
        $gpu = "Yes"
        # 2. Run my install script for obabooga/text-generation-webui
        iex (irm ooba.tc.ht)
    }
    elseif ($choice -eq "N" -or $choice -eq "n") {
        Write-Host "Installing CPU-Only version" -ForegroundColor Cyan
        $gpu = "No"
        # 2. Run my install script for obabooga/text-generation-webui
        iex (irm ooba.tc.ht)
    }

} else {
    # CD into folder anyway
    Set-Location "./oobabooga-windows"
}

function Get-VicunaCPU() {
    # Download CPU model (only the updated one)
    # If downloaded using model downloader, another 8.14 GB download will be run...
    $url = "https://huggingface.co/eachadea/ggml-vicuna-13b-4bit/resolve/main/ggml-vicuna-13b-4bit-rev1.bin"
    $outputPath = "text-generation-webui\models\eachadea_ggml-vicuna-13b-4bit\ggml-vicuna-13b-4bit-rev1.bin"

    # Download the file from the URL
    Write-Host "Downloading: eachadea/ggml-vicuna-13b-4bit (CPU model)" -ForegroundColor Cyan
    Get-Aria2File -Url $url -OutputPath $outputPath
    Write-Host "`nDone!`n"
}
function Get-VicunaGPU() {
    # Download GPU/CUDA model
    $blob = "https://huggingface.co/anon8231489123/vicuna-13b-GPTQ-4bit-128g/resolve/main"
    $outputPath = "text-generation-webui\models\anon8231489123_vicuna-13b-GPTQ-4bit-128g"

    # Download the file from the URL
    Write-Host "Downloading: anon8231489123/vicuna-13b-GPTQ-4bit-128g (GPU/CUDA model)" -ForegroundColor Cyan
    $files = @(
        "vicuna-13b-4bit-128g.safetensors",
        "tokenizer_config.json",
        "tokenizer.model",
        "special_tokens_map.json",
        "pytorch_model.bin.index.json",
        "generation_config.json",
        "config.json"
    )

    Get-Aria2Files -Url $blob -OutputPath $outputPath -Files $files
    Write-Host "`nDone!`n"
}

# Allow importing remote functions
iex (irm Import-RemoteFunction.tc.ht)
Import-FunctionIfNotExists -Command Get-Aria2File -ScriptUri "File-DownloadMethods.tc.ht"

# Create the output folder if it does not exist
New-Item -ItemType Directory -Force -Path (Split-Path -Parent "text-generation-webui\models\eachadea_ggml-vicuna-13b-4bit") | Out-Null
if (-not $?) {
    Write-Error "Failed to create directory."
}

if ($gpu -eq "No") {
    Get-VicunaCPU
} else {
    Write-Host "`n`nPick which models to download:" -ForegroundColor Cyan
    Write-Host -NoNewline "CPU (7.5GB): " -ForegroundColor Red
    Write-Host "1" -ForegroundColor Green
    Write-Host -NoNewline "GPU [Nvidia] (6.9GB): " -ForegroundColor Red
    Write-Host "2" -ForegroundColor Green
    Write-Host -NoNewline "CPU + GPU [Nvidia] (14.4GB): " -ForegroundColor Red
    Write-Host "3" -ForegroundColor Green
    
    $num = Read-Host "Enter a number"
    if ($num -eq "1") {
        Get-VicunaCPU
    } elseif ($num -eq "2") {
        Get-VicunaGPU
    } elseif ($num -eq "3") {
        Get-VicunaCPU
        Get-VicunaGPU
    }
}

# 5. Replace commands in the start-webui.bat file
# Create CPU and GPU versions
Copy-Item "start-webui.bat" "start-webui-vicuna.bat"

if (-not ($gpu -eq "No")) {
    (Get-Content -Path "start-webui-vicuna.bat") | ForEach-Object {
        $_ -replace
            'call python server\.py --auto-devices --cai-chat',
            'call python server.py --auto-devices --chat --model anon8231489123_vicuna-13b-GPTQ-4bit-128g --wbits 4 --groupsize 128'
    } | Set-Content -Path "start-webui-vicuna-gpu.bat"
}

(Get-Content -Path "start-webui-vicuna.bat") | ForEach-Object {
    $_ -replace
        'call python server\.py --auto-devices --cai-chat',
        'call python server.py --auto-devices --chat --model eachadea_ggml-vicuna-13b-4bit'
} | Set-Content -Path "start-webui-vicuna.bat"

# 6. Create desktop shortcuts
if ($gpu -eq "No") {
    Write-Host "`n`nCreate desktop shortcuts for 'Vicuna (CPU)'" -ForegroundColor Cyan
} else {
    Write-Host "`n`nCreate desktop shortcuts for 'Vicuna' and 'Vicuna (CPU)'" -ForegroundColor Cyan
}
$shortcuts = Read-Host "Do you want desktop shortcuts? (Y/N)"

if ($shortcuts -eq "Y" -or $shortcuts -eq "y") {
    iex (irm Import-RemoteFunction.tc.ht) # Get RemoteFunction importer
    Import-RemoteFunction -ScriptUri "https://New-Shortcut.tc.ht" # Import function to create a shortcut
    
    Write-Host "Downloading Vicuna icon..."
    Invoke-WebRequest -Uri 'https://tc.ht/PowerShell/AI/vicuna.ico' -OutFile 'vicuna.ico'
    if (-not ($gpu -eq "No")) {
        Write-Host "`nCreating shortcuts on desktop..." -ForegroundColor Cyan
        $shortcutName = "Vicuna oobabooga"
        $targetPath = "start-webui-vicuna-gpu.bat"
        $IconLocation = 'vicuna.ico'
        New-Shortcut -ShortcutName $shortcutName -TargetPath $targetPath -IconLocation $IconLocation
    }

    $shortcutName = "Vicuna (CPU) oobabooga"
    $targetPath = "start-webui-vicuna.bat"
    $IconLocation = 'vicuna.ico'
    New-Shortcut -ShortcutName $shortcutName -TargetPath $targetPath -IconLocation $IconLocation
    
}

# 7. Run the webui
if ($gpu -eq "No") {
    Start-Process ".\start-webui-vicuna.bat"
} else {
    # Ask user if they want to launch the CPU or GPU version
    Write-Host "`n`nEnter 1 to launch CPU version, or 2 to launch GPU version" -ForegroundColor Cyan
    
    $choice = Read-Host "1 (CPU) or 2 (GPU)"
    
    if ($choice -eq "1") {
        Start-Process ".\start-webui-vicuna.bat"
    }
    elseif ($choice -eq "2") {
        Start-Process ".\start-webui-vicuna-gpu.bat"
    }
    else {
        Write-Host "Invalid choice. Please enter 1 or 2."
    }
}


================================================
FILE: demos/chat_with_keyboard.py
================================================
import openai
import os
from dotenv import load_dotenv
load_dotenv()
"""
have chats with CHATGPT straight from your command line (useful for code)
"""


CHAT =  [{"role": "system", "content": "You are a helpful assistant. You can make question to make the conversation entertaining."}]

# Set your openai api key
openai.api_key = os.getenv('OPENAI_API_KEY')

while True:
    try:
        question = input('[user]: ')
        CHAT.append({"role":"user","content":f"{str(question)}"})
        API_response = openai.ChatCompletion.create(
                    model='gpt-3.5-turbo',
                    messages=CHAT)
                
        answer = API_response['choices'][0]['message']['content']
        print(f"[assistant]: {answer}")

        CHAT.append({"role":"assistant","content":f"{answer}"})
    except Exception as e:
        print(e)
        quit()

================================================
FILE: demos/demo_da_vinci.py
================================================
import openai

## SET YOUR API KEY:
openai.apikey('')

def generate_single_response(prompt, model_engine="text-davinci-003", temp=0.5):
    openai.api_key = 'your openai api key'  
    prompt = (f"{prompt}")
    completions = openai.Completion.create(
        engine=model_engine,
        prompt=prompt,
        max_tokens=1024,
        temperature=temp
    )
    return completions.choices[0].text

def get_answer(message):
    CHAT = [{"role": "system", "content": """You are a prompt manager. 
    You have access to a suite of actions and decide if a prompt require one of them to be executed. Your actions:
    1 - find a file: file can store past conversations, textual information;
    2 - respond: the user wants to have a general answer to a topic;
    3 - adjust settings;
    4 - find a directory inside the Computer;
    5 - save this conversation for the future
    6 - check on internet
    7 - None of the above
    
    You can answer only with numbers. A number must always be present in your answer"""},{"role":"user", "content":"PROMPT: "+message}]

    response = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                temperature=0,
                max_tokens=1,
                messages=CHAT)
    
    return response


if __name__ == "__main__":
    
    while True:
        message = input("[user]: ")
        answer = get_answer(message=message)
        print(answer['choices'][0]['message']['content'])

================================================
FILE: demos/demo_elevenlabs.py
================================================
import elevenlabslib
from dotenv import load_dotenv
import os
from pydub import AudioSegment
import io
import pygame

load_dotenv()

api_key = os.getenv('ELEVENLABS_API_KEY')

user = elevenlabslib.ElevenLabsUser(api_key)

text = "This is the documentation for the ElevenLabs API. You can use this API to use our service programmatically, this is done by using your xi-api-key"
write_dir = os.path.join('Assistant', 'answers')

elevenlabs_voice = user.get_voices_by_name('Antoni')[0]
audio = elevenlabs_voice.generate_audio_bytes(text)

audio = AudioSegment.from_file(io.BytesIO(audio), format="mp3")
audio.export(os.path.join(write_dir, "speech.wav"), format="wav")
if pygame.mixer.get_init() is None: pygame.mixer.init()
try:
    pygame.mixer.music.load(os.path.join(write_dir, "speech.wav"))
except Exception as e:
    print(e)

pygame.mixer.music.set_volume(0.5)
pygame.mixer.music.play()
while(pygame.mixer.music.get_busy()):pass

================================================
FILE: demos/demo_google_search.py
================================================
from langchain.utilities import GoogleSerperAPIWrapper
from dotenv import load_dotenv
load_dotenv()
from langchain.memory import ConversationBufferMemory
from langchain.llms.openai import OpenAI
from langchain import OpenAI, LLMChain 
from langchain.agents import initialize_agent, Tool, load_tools
from langchain.agents import AgentType, ZeroShotAgent, AgentExecutor

import geocoder
from newsapi import NewsApiClient
import os


def generateGoogleAgent():
    names = ['wikipedia', 'requests', 'open-meteo-api']
    
    if len(os.getenv('SERPER_API_KEY'))>1:
        print('(using google-seper)')
        names.append('google-serper')
    elif len(os.getenv('GOOGLE_API_KEY'))>1:
        print('(using google-serch)')
        names.append('google-search')
    
    tools = load_tools(names)
    custom_tools = [
        Tool(
            name ='Locate me',
            func = locate_me, 
            description='useful to know the current geographical location'),
        Tool(
            name='News',
            func=news,
            description='Use this when you want to get information about the top headlines of current news stories. The input should be a keyword describing the topic')
        ]
    
    for item in custom_tools: tools.append(item)
        
    prefix = """Answer the question. You have also access to the following tools:"""
    suffix = """Begin!"

    {chat_history}
    Question: {input}
    {agent_scratchpad}"""

    prompt = ZeroShotAgent.create_prompt(
        tools, 
        prefix=prefix, 
        suffix=suffix, 
        input_variables=["input", "chat_history", "agent_scratchpad"]
    )
    llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
    agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)

    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True, human_prefix='user', ai_prefix='assistant')
    
    return AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=memory, early_stopping_method = 'generate', max_iterations=2)


def news(keyword):
    newsapi = NewsApiClient(api_key=os.getenv('NEWS_API_KEY'))
    top_headlines = newsapi.get_top_headlines(q=keyword)
    if len(top_headlines['articles'])==0:
        top_headlines = newsapi.get_everything(q=keyword)
        top_headlines['articles'] = top_headlines['articles'][0:min(len(top_headlines['articles']),10)]

    res = ''

    for article in top_headlines['articles']:
        res += '\n'+article['title']+'\nurl: '+article['url']+'\n'

    print(res)
    return res

def locate_me(p):
    g = geocoder.ip('me')
    loc = str(g.city)+', '+ str(g.state)+', '+str(g.country) 
    print(loc)
    return loc

llm = OpenAI(temperature=0)

names = ["requests"]


tools = load_tools(names)
tools.append(Tool(
        name='Locate Me',
        func=locate_me,
        description='Helpful when you need to access the current geographical postition',
        ))

tools.append(Tool(
    name='News',
    func=news,
    description='Use this when you want to get information about the top headlines of current news stories. The input should be a keyword describing the topic'
))


self_ask_with_search = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
self_ask_with_search = generateGoogleAgent()
print(self_ask_with_search.run("What are the latest news about AI?"))

================================================
FILE: demos/demo_local_search_engine.py
================================================
import openai
import os
import whisper
import pyaudio
import sys
import pygame
from datetime import datetime
import copy
import time
import pyttsx3
import pandas as pd
import ast

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

import backup.get_audio as myaudio
from Assistant.VirtualAssistant import VirtualAssistant

openai.api_key = 'your-openai-api-key'

text2 ="""user: What are transformers architecture and how can they be used in segmentation?
assistant: The transformer architecture is a deep learning model that has been widely used in natural language processing applications due to its impressive performance in language translation and text generation tasks. The model is based on an attention mechanism that allows it to selectively focus on different parts of the input sequence, enabling it to capture long-range dependencies between words.
In recent years, researchers have started exploring the applicability of transformer models in computer vision tasks such as image segmentation. One of the advantages of using transformers for segmentation is their ability to process images as sequences of tokens. This means that the model can take into account the spatial relationship between pixels in an image, allowing it to produce more contextually aware segmentations.
There are different ways to incorporate transformers into segmentation models. One approach is to use a pre-trained transformer model, such as the Vision Transformer (ViT), to encode the input image into a sequence of feature vectors. These feature vectors are then fed into a segmentation head that produces pixel-wise predictions for each class.
Another approach is to use transformers in a fully self-supervised setting, where the model is trained to predict the color or texture of a patch given its surrounding context. This pre-training step allows the model to learn powerful image representations that can be used for downstream tasks like segmentation.
In summary, transformers have shown great potential in segmentation tasks by allowing models to capture complex spatial relationships in image data. However, like any deep learning model, they require significant computational resources and training data to achieve state-of-the-art performance.
"""

text3 = """system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words. user: What can you tell me about electric propulsion in rockets?
        assistant: Electric propulsion is a type of rocket propulsion that uses electrical energy to accelerate propellant and generate thrust. It is often used in spacecraft propulsion systems for missions that require precise control and long-duration high-speed travel. 
There are different types of electric propulsion systems, but they all work on the same principle: accelerating ions or electrons using an electric field. This is in contrast to traditional chemical rockets, which generate thrust by combusting fuel.
One type of electric propulsion is called electrically-powered ion propulsion, which uses ions that are generated by an electrically charged grid to propel the spacecraft. This system uses a low amount of fuel but produces a weak thrust, making it ideal for long-duration missions where high velocities are required.
"""


from Assistant.tools import LocalSearchEngine
from Assistant.tools import Translator

search = LocalSearchEngine()
tr = Translator()

from tests import load_keys
load_keys()

search.produce_folder_tags()


================================================
FILE: demos/demo_pyaudio.py
================================================
import Assistant.get_audio as myaudio

print('get_devices() print all devices:')
print(myaudio.get_devices())

print('\n\n')

print('detect_microphones() print mics only:')
print(myaudio.detect_microphones())

print('\n')
print('get_device_channels() return channel count for each device\n', myaudio.get_device_channels())

# select the first available microphone
M = myaudio.get_devices()[0]['name']
print(M)

================================================
FILE: demos/demo_research_mode.py
================================================
print('### LOADING CREDENTIALS ###')
from dotenv import load_dotenv
import os

from Assistant.research_mode import ResearchAssistant
from Assistant.semantic_scholar.agent_tools import readPDF, update_workspace_dataframe

load_dotenv()

if len(os.environ['OPENAI_API_KEY'])==0: 
    print('openai API key not detected in .env')
    raise Exception("[$] openai API key is required. Learn more at https://platform.openai.com/account/api-keys")

if len(os.environ['IBM_API_KEY'])==0: print('[free] IBM cloud API Key not detected in .env\nLearn more at: https://cloud.ibm.com/catalog/services/text-to-speech')

if len(os.environ['IBM_TTS_SERVICE'])==0: print('[free] IBM cloud TTS service not detected in .env\nLearn more at: https://cloud.ibm.com/catalog/services/text-to-speech')

use_porcupine = True
if len(os.environ['PORCUPINE_KEY']) == 0: 
    print('[free] PicoVoice not detected in .env\nLearn more at: https://picovoice.ai/platform/porcupine/')
    use_porcupine = False


print('DONE\n')

print('### IMPORTING DEPENDANCIES ###')
import pygame

from Assistant import get_audio as myaudio
from Assistant.VirtualAssistant import VirtualAssistant
from Assistant.tools import count_tokens
import pinecone
print('DONE\n')

### MAIN
if __name__=="__main__":
    print("### SETTING UP ENVIROMENT ###")
    OFFLINE = False
    pygame.mixer.init()
    # INITIATE JARVIS
    print('initiating JARVIS voice...')
    jarvis = VirtualAssistant(
        openai_api   = os.getenv('OPENAI_API_KEY'),
        ibm_api      = os.getenv('IBM_API_KEY'),
        ibm_url      = os.getenv('IBM_TTS_SERVICE'),
        elevenlabs_api = os.getenv('ELEVENLABS_API_KEY'),
        elevenlabs_voice = 'Antoni',
        voice_id     = {'en':'jarvis_en'},
        whisper_size = 'medium',
        awake_with_keywords=["jarvis"],
        model= "gpt-3.5-turbo",
        embed_model= "text-embedding-ada-002",
        RESPONSE_TIME = 3,
        SLEEP_DELAY = 30,
        mode = 'RESEARCH',
        )

    jarvis.init_research_mode()
    i = 0
    while True:  
        prompt = input("user: ")
        # check exit command
        if "THANKS" in prompt.upper() or len(prompt.split())<=1:
            jarvis.go_to_sleep()
            continue
        
        jarvis.expand_conversation(role="user", content=prompt)

        # PROMPT MANAGING [BETA]
        flag = jarvis.analyze_prompt(prompt)

        print(flag)
        # redirect the conversation to an action manager or to the LLM
        if "1" in flag or "tool" in flag:
            print('(thought): action')
            response = jarvis.use_tools(prompt)
            response = response
        
        elif "2" in flag or "respond" in flag:
            print('(thought): response')
            response = jarvis.get_answer(prompt)
        else:
            print(f'(thought): {flag}: workspace')
            input('> continue?')
            response = jarvis.secondary_agent(prompt)
            

        jarvis.expand_conversation(role='assistant', content=response)
        pygame.mixer.stop()
        jarvis.say(response, VoiceIdx='en', elevenlabs=True)

        i+=1

================================================
FILE: demos/demo_tts.py
================================================
from TTS.api import TTS
import os


"""
DESCRIPTION: Clone some voices in Real Time with TTS! run the scripts and listen to output.wav
"""

TEXT = """ OpenAI provides a toolkit called "OpenAI API" that allows you to compute multi-head self-attention for a given text. The OpenAy I A Pi I provides access to a range of pre-trained language models, such as Gee Pee Tee 2 and Gee Pee Tee 3, which are capable of performing multi-head self-attention on text data.
To use the OpenAI API, you first need to sign up for an API key and then import the A P I client into your Python environment. Once you have set up the A P I client, you can use it to send requests to the OpenAI API server, which will process your text data and return a response containing the multi-head self-attention weights.
The OpenAI API offers a simple and straightforward way to access powerful language models and compute complex natural language processing tasks, such as multi-head self-attention, without the need for extensive computational resources or expertise in machine learning.
"Exploring Multi-Head Self-Attention for Keyword Identification in Natural Language Processing"

"""

#   Running a multi-speaker and multi-lingual model

#   List available 🐸TTS models and choose the first one
model_name = TTS.list_models()[0]
#   Init TTS
tts = TTS(model_name)
#   Run TTS
#    ❗ Since this model is multi-speaker and multi-lingual, we must set the target speaker and the language
#   Text to speech with a numpy output

wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
#    Text to speech to a file
tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")

#   Running a single speaker model

#   Init TTS with the target model name
tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False, gpu=False)

#   Run TTS
tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path='.')

#   Example voice cloning with YourTTS in English, French and Portuguese:
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False, gpu=True)
tts.tts_to_file(text=TEXT, speaker_wav=os.path.join(os.getcwd(),'voices','JARVIS','PaulBettanyLongMP3.mp3'), language="en", file_path="output.wav")


================================================
FILE: env.txt
================================================
OPENAI_API_KEY = ''
IBM_API_KEY = ''
IBM_TTS_SERVICE = ''
PORCUPINE_KEY = ''
ELEVENLABS_API_KEY = ''
HUGGINGFACEHUB_API_TOKEN = ''
SERPER_API_KEY = ''
NEWS_API_KEY =''
GOOGLE_API_KEY = ''
GOOGLE_CSE_ID = ''
OPENWHEATHERMAP_API_KEY = ''
S2_API_KEY = ''


================================================
FILE: openai_api_chatbot.py
================================================
print('### LOADING CREDENTIALS ###')
from dotenv import load_dotenv
import os

from Assistant.research_mode import ResearchAssistant

load_dotenv()

if len(os.environ['OPENAI_API_KEY'])==0: 
    print('openai API key not detected in .env')
    raise Exception("[$] openai API key is required. Learn more at https://platform.openai.com/account/api-keys")

if len(os.environ['IBM_API_KEY'])==0: print('[free] IBM cloud API Key not detected in .env\nLearn more at: https://cloud.ibm.com/catalog/services/text-to-speech')

if len(os.environ['IBM_TTS_SERVICE'])==0: print('[free] IBM cloud TTS service not detected in .env\nLearn more at: https://cloud.ibm.com/catalog/services/text-to-speech')

use_porcupine = True
if len(os.environ['PORCUPINE_KEY']) == 0: 
    print('[free] PicoVoice not detected in .env\nLearn more at: https://picovoice.ai/platform/porcupine/')
    use_porcupine = False


print('DONE\n')

print('### IMPORTING DEPENDANCIES ###')
import pygame

from Assistant import get_audio as myaudio
from Assistant.VirtualAssistant import VirtualAssistant
from Assistant.tools import count_tokens

print('DONE\n')

### MAIN
if __name__=="__main__":
    print("### SETTING UP ENVIROMENT ###")
    OFFLINE = False
    pygame.mixer.init()

    # INITIATE JARVIS
    print('initiating JARVIS voice...')
    jarvis = VirtualAssistant(
        openai_api   = os.getenv('OPENAI_API_KEY'),
        ibm_api      = os.getenv('IBM_API_KEY'),
        ibm_url      = os.getenv('IBM_TTS_SERVICE'),
        elevenlabs_api = os.getenv('ELEVENLABS_API_KEY'),
        elevenlabs_voice = 'Antoni',
        voice_id     = {'en':'jarvis_en'},
        whisper_size = 'medium',
        awake_with_keywords=["jarvis"],
        model= "gpt-3.5-turbo",
        embed_model= "text-embedding-ada-002",
        RESPONSE_TIME = 3,
        SLEEP_DELAY = 30,
        mode = 'CHAT'
        )

    while True:
        if not(jarvis.is_awake):
            print('\n awaiting for triggering words...')

            #block until the wakeword is heard, using porcupine
            if use_porcupine:
                jarvis.block_until_wakeword()
            else:
                while not(jarvis.is_awake):
                    jarvis.listen_passively()
        
        jarvis.record_to_file('output.wav')
        

        if jarvis.is_awake:
            prompt, detected_language = myaudio.whisper_wav_to_text('output.wav', jarvis.interpreter, prior=jarvis.languages.keys())

            # check exit command
            if "THANKS" in prompt.upper() or len(prompt.split())<=1:
                jarvis.go_to_sleep()
                continue
            
            if detected_language=='en':
                VoiceIdx = 'jarvis'
            else:
                VoiceIdx = detected_language
            
            jarvis.expand_conversation(role="user", content=prompt)

            # PROMPT MANAGING [BETA]
            #flag = jarvis.analyze_prompt(prompt)
            flag = '-1'

            # redirect the conversation to an action manager or to the LLM
            if (("1" in flag or "tool" in flag) and '-' not in flag):
                print('(though): action')
                response = jarvis.use_tools(prompt)
                response = response
            
            elif "2" in flag or "respond" in flag:
                print('(though): response')
                response = jarvis.get_answer(prompt)
            elif "-1" in flag:
                response = jarvis.switch_mode()
            else:
                print('(though): internet')
                response = jarvis.secondary_agent(prompt)

            jarvis.expand_conversation(role='assistant', content=response)
            pygame.mixer.stop()
            jarvis.say(response, VoiceIdx=VoiceIdx, IBM=True)

            print('\n')

================================================
FILE: run.bat
================================================
call .\jarvis_venv\Scripts\activate
python openai_api_chatbot.py
pause


================================================
FILE: saved_chats/2023-03-13_ExploringtheConceptofDeepLearningandItsApplicationsintheAviationIndustry.txt
================================================
system: You are a helpful assistant.
user: What is deep learning?
assistant: 

Deep learning is a subfield of machine learning that involves training artificial neural networks with a large amount of data to solve complex problems. These neural networks are composed of layers of interconnected nodes that process information and learn to make predictions or decisions by adjusting the weights of the connections between the nodes. Deep learning has achieved notable success in areas such as image recognition, natural language processing, and speech recognition, and has been used to create applications such as self-driving cars, intelligent personal assistants, and facial recognition systems.
user: , can i use deep learning to fly a plane?
assistant: Deep learning has been used in the aviation industry for various purposes, such as predicting the weather and optimizing fuel consumption. However, it is not currently feasible to use deep learning to fly a plane autonomously without human intervention. The aviation industry heavily relies on safety protocols and regulations, and the risk of using AI (including deep learning) as the sole decision-maker for a complex task such as flying a plane is likely to be perceived as too great at this point.


================================================
FILE: saved_chats/2023-03-25_ExploringtheAmbitiousWorldofStarCitizenIsitRightforCasualGamers.txt
================================================
system: You are a helpful assistant.
user: What can you tell me about the video game Star Citizen?
assistant: Star Citizen is an ambitious multiplayer space trading and combat game developed by Cloud Imperium Games. It was announced in 2011 and has been in development ever since. The game is built using Amazon Lumberyard engine and features an expansive, persistent universe where players can engage in various activities like exploration, mining, trade, combat, and more.

One of the game's unique features is its ability to seamlessly switch between first-person and third-person views, as well as the ability to explore and interact with planets, ships, and stations.

The game has generated a lot of excitement among fans of space sim games, and it has raised over $350 million through crowdfunding, making it one of the most successful crowdfunding campaigns for a video game. However, the game has also faced several delays and controversies, with some critics questioning its development timeline and management. Despite this, the game continues to have a passionate following and is still in development.
user: Would you suggest it to a person who is loving about open-world games but doesn't want to spend a lot of time in a game?
assistant: As of now, Star Citizen is a very ambitious game that is still in development and is not yet considered a fully released game. While it promises to have an open-world gameplay experience, it may not be ideal for a person who isn't interested in investing a lot of time in the game due to its complex gameplay mechanics and features. 

Additionally, since the game is still in development, it may not be fully optimized and may have bugs or glitches that could make it frustrating to play. 

If the person is not particularly interested in space sim games or doesn't want to spend a lot of time in a game, there are several other open-world games that they might enjoy such as The Elder Scrolls V: Skyrim, Grand Theft Auto V, Red Dead Redemption 2, or Fallout 4 which offer a more traditional open-world gameplay experience.


================================================
FILE: saved_chats/2023-03-26_AnIntroductiontoArtificialIntelligence.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: , what can you tell me about artificial intelligence?
assistant: Artificial intelligence (AI) is a branch of computer science that aims to create intelligent machines. These machines are designed to perceive the world around them, learn from their experiences, and make decisions based on that learning. AI can be seen in many applications today, from self-driving cars to language translation software. Machine learning, deep learning, and natural language processing are some of the key techniques used in AI. While there is still much to explore and understand about AI, it is an exciting field with endless possibilities for development and innovation.


================================================
FILE: saved_chats/2023-03-26_ArtificialIntelligence.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: , what can you tell me about artificial intelligence?
assistant: Artificial intelligence (AI) is a branch of computer science that aims to create intelligent machines. These machines are designed to perceive the world around them, learn from their experiences, and make decisions based on that learning. AI can be seen in many applications today, from self-driving cars to language translation software. Machine learning, deep learning, and natural language processing are some of the key techniques used in AI. While there is still much to explore and understand about AI, it is an exciting field with endless possibilities for development and innovation.


================================================
FILE: saved_chats/2023-03-26_ExploringthePossibilityofHumanExtinction.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What are the chances of humankind to be dead?
assistant: It is difficult to predict the exact chances of humankind becoming extinct. However, there are several factors that could potentially threaten the survival of our species. These include natural disasters such as severe weather events, pandemics, and the impact of climate change. Additionally, human activities such as nuclear war, environmental degradation, and the development of dangerous technologies could also contribute to the risk of extinction. While the likelihood of any one of these scenarios occurring is uncertain, it is important to take steps to mitigate these risks and work towards a sustainable future for humanity.
user: Thank you for watching!
assistant: You're welcome! If you have any more questions, feel free to ask.


================================================
FILE: saved_chats/2023-03-26_ExploringtheUseofTransformersinImageSegmentation.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What are transformers architecture and how can they be used in segmentation?
assistant: The transformer architecture is a deep learning model that has been widely used in natural language processing applications due to its impressive performance in language translation and text generation tasks. The model is based on an attention mechanism that allows it to selectively focus on different parts of the input sequence, enabling it to capture long-range dependencies between words.

In recent years, researchers have started exploring the applicability of transformer models in computer vision tasks such as image segmentation. One of the advantages of using transformers for segmentation is their ability to process images as sequences of tokens. This means that the model can take into account the spatial relationship between pixels in an image, allowing it to produce more contextually aware segmentations.

There are different ways to incorporate transformers into segmentation models. One approach is to use a pre-trained transformer model, such as the Vision Transformer (ViT), to encode the input image into a sequence of feature vectors. These feature vectors are then fed into a segmentation head that produces pixel-wise predictions for each class.

Another approach is to use transformers in a fully self-supervised setting, where the model is trained to predict the color or texture of a patch given its surrounding context. This pre-training step allows the model to learn powerful image representations that can be used for downstream tasks like segmentation.

In summary, transformers have shown great potential in segmentation tasks by allowing models to capture complex spatial relationships in image data. However, like any deep learning model, they require significant computational resources and training data to achieve state-of-the-art performance.


================================================
FILE: saved_chats/2023-03-26_FlutterDevelopment.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Quali sono i principali metodi per imparare a programmare in flapper? Quali sono le cose pi difficili e le cose pi facili?
assistant: Per imparare a programmare in Flutter, ci sono diversi metodi tra cui:

1. Leggere la documentazione ufficiale di Flutter per avere una conoscenza generale del framework e delle sue funzionalit.

2. Seguire tutorial su YouTube, Medium e siti web specializzati come Flutter.dev per imparare passo dopo passo.

3. Partecipare alla community di Flutter su Slack e Reddit per porre domande e ricevere aiuto dagli sviluppatori pi esperti.

Le cose pi semplici su Flutter sono la creazione di un'interfaccia utente accattivante e la possibilit di testare l'app in tempo reale su pi piattaforme.

D'altra parte, le cose pi difficili possono essere la comprensione della sintassi Dart, la gestione dello stato dell'applicazione e la scelta di un'architettura adeguata per l'app. Tuttavia, con la pratica e l'aiuto delle risorse della comunit, queste sfide possono essere superate.
user: Che cos' lo stato dell'applicazione?
assistant: Lo stato dell'applicazione si riferisce alle informazioni o ai dati che sono attualmente memorizzati nella memoria dell'applicazione in un determinato momento. In Flutter, lo stato dell'applicazione pu riferirsi a qualsiasi variabile o dati all'interno dell'applicazione che possono cambiare durante il ciclo di vita dell'app. 

Ad esempio, lo stato dell'applicazione potrebbe includere la posizione dell'utente, il testo inserito in un campo di input, l'opzione selezionata in un menu a discesa o qualsiasi altra informazione che  necessario salvare per il funzionamento dell'app. 

La gestione dello stato nell'applicazione pu essere uno degli aspetti pi complessi di sviluppare in Flutter, poich richiede l'organizzazione delle informazioni in modo appropriato per garantire che l'app funzioni correttamente in tutte le situazioni. Flutter offre alcuni strumenti potenti per la gestione dello stato dell'applicazione, come il widget Stateful, il provider package e le librerie di gestione dello stato come Redux.


================================================
FILE: saved_chats/2023-03-26_FlutterDevelopmentLearningMethodsandManagingAppState.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Quali sono i principali metodi per imparare a programmare in flapper? Quali sono le cose pi difficili e le cose pi facili?
assistant: Per imparare a programmare in Flutter, ci sono diversi metodi tra cui:

1. Leggere la documentazione ufficiale di Flutter per avere una conoscenza generale del framework e delle sue funzionalit.

2. Seguire tutorial su YouTube, Medium e siti web specializzati come Flutter.dev per imparare passo dopo passo.

3. Partecipare alla community di Flutter su Slack e Reddit per porre domande e ricevere aiuto dagli sviluppatori pi esperti.

Le cose pi semplici su Flutter sono la creazione di un'interfaccia utente accattivante e la possibilit di testare l'app in tempo reale su pi piattaforme.

D'altra parte, le cose pi difficili possono essere la comprensione della sintassi Dart, la gestione dello stato dell'applicazione e la scelta di un'architettura adeguata per l'app. Tuttavia, con la pratica e l'aiuto delle risorse della comunit, queste sfide possono essere superate.
user: Che cos' lo stato dell'applicazione?
assistant: Lo stato dell'applicazione si riferisce alle informazioni o ai dati che sono attualmente memorizzati nella memoria dell'applicazione in un determinato momento. In Flutter, lo stato dell'applicazione pu riferirsi a qualsiasi variabile o dati all'interno dell'applicazione che possono cambiare durante il ciclo di vita dell'app. 

Ad esempio, lo stato dell'applicazione potrebbe includere la posizione dell'utente, il testo inserito in un campo di input, l'opzione selezionata in un menu a discesa o qualsiasi altra informazione che  necessario salvare per il funzionamento dell'app. 

La gestione dello stato nell'applicazione pu essere uno degli aspetti pi complessi di sviluppare in Flutter, poich richiede l'organizzazione delle informazioni in modo appropriato per garantire che l'app funzioni correttamente in tutte le situazioni. Flutter offre alcuni strumenti potenti per la gestione dello stato dell'applicazione, come il widget Stateful, il provider package e le librerie di gestione dello stato come Redux.


================================================
FILE: saved_chats/2023-03-26_HumanExtinction.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What are the chances of humankind to be dead?
assistant: It is difficult to predict the exact chances of humankind becoming extinct. However, there are several factors that could potentially threaten the survival of our species. These include natural disasters such as severe weather events, pandemics, and the impact of climate change. Additionally, human activities such as nuclear war, environmental degradation, and the development of dangerous technologies could also contribute to the risk of extinction. While the likelihood of any one of these scenarios occurring is uncertain, it is important to take steps to mitigate these risks and work towards a sustainable future for humanity.
user: Thank you for watching!
assistant: You're welcome! If you have any more questions, feel free to ask.


================================================
FILE: saved_chats/2023-03-26_NuclearandElectricPropulsioninSpaceExplorationAdvantagesandChallenges.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Is there a way I can know how a rocket works?
assistant: Yes, there is a way to understand how a rocket works. A rocket works by utilizing the principle of action-reaction, also known as Newton's third law of motion. This means that for every action, there is an equal and opposite reaction. A rocket engine generates a massive amount of energy by burning fuel and oxidizer, which creates hot gases that are expelled out of the bottom of the rocket. This exhaust generates a force that propels the rocket in the opposite direction. The rocket's shape and aerodynamics are designed to reduce air resistance and increase the speed and altitude it can achieve. To better understand the specifics of how a rocket works, you can research propulsion systems, rocket design, and the history of space exploration.
user: What can you tell me about electric propulsion in rockets?
assistant: Electric propulsion is a type of rocket propulsion that uses electrical energy to accelerate propellant and generate thrust. It is often used in spacecraft propulsion systems for missions that require precise control and long-duration high-speed travel. 

There are different types of electric propulsion systems, but they all work on the same principle: accelerating ions or electrons using an electric field. This is in contrast to traditional chemical rockets, which generate thrust by combusting fuel.

One type of electric propulsion is called electrically-powered ion propulsion, which uses ions that are generated by an electrically charged grid to propel the spacecraft. This system uses a low amount of fuel but produces a weak thrust, making it ideal for long-duration missions where high velocities are required.

Another type of electric propulsion is the Hall thruster, which uses a magnetic field to trap high-energy electrons and ionize the propellant. This produces a larger thrust than ion propulsion, but it still uses less fuel than traditional chemical rockets.

Overall, electric propulsion is a promising technology for space exploration as it offers a more efficient and cost-effective way to travel in space.
user: without the margin of improvement of electric propulsion in rocket flight.
assistant: Even without considering future advancements in electric propulsion, there are several advantages and disadvantages to using electric propulsion in rocket flight.

One advantage of electric propulsion is that it requires much less fuel than traditional chemical rockets. This is because electric propulsion is more efficient at converting the stored energy into thrust. This means that spacecraft can carry smaller amounts of fuel, which reduces weight and increases payload capacity. Additionally, electric propulsion can provide a lower level of thrust for a much longer duration, which is ideal for missions that require gradual and precise maneuvering.

However, one disadvantage of electric propulsion is that it generates less thrust than chemical rockets. This means that spacecraft must carry out longer-duration burns to achieve the desired velocity, which can result in longer travel times. Additionally, electric propulsion systems are often more complex and less reliable than chemical systems, and they require very high levels of electrical power, which can be difficult to generate and manage in space.

Overall, while electric propulsion offers several advantages, it also has some limitations that need to be carefully considered when designing and planning space missions.
user: Do you think that nuclear energy could play a role in rocket propulsion?
assistant: Yes, nuclear energy could potentially play a role in rocket propulsion. Nuclear propulsion systems typically work by heating a propellant, such as hydrogen or water, using a nuclear reactor. The heat from the reaction is then used to accelerate the propellant and generate thrust. 

There are several potential advantages to using nuclear propulsion. One is that nuclear systems could provide much greater levels of thrust compared to traditional chemical rockets or even electric propulsion systems, which could enable faster travel to far-off destinations. Additionally, nuclear systems could use much less propellant, which would reduce the overall mass and cost of the spacecraft.

There are, however, also several challenges and concerns associated with nuclear propulsion. One major challenge is that it can be difficult to contain and safely handle nuclear fuel in space, where even small accidents or malfunctions could have catastrophic consequences. Additionally, there are concerns about the environmental impact of nuclear propulsion, both in terms of radiation exposure and in terms of potential contamination if there were an accident or problem during launch or operation.

Overall, while nuclear propulsion holds promise for the future of space exploration, it is a complex and potentially risky technology that requires careful consideration of the benefits and drawbacks.
user: Could you explain better how can I use a nuclear reactor to generate thrust?
assistant: Nuclear propulsion uses a nuclear reactor to heat a propellant and create thrust. There are two main types of nuclear propulsion systems that have been studied: nuclear thermal propulsion (NTP) and nuclear electric propulsion (NEP).

In a nuclear thermal propulsion system, nuclear fuel is used to heat hydrogen or another propellant. The hot gas is then expelled out of a nozzle to generate thrust. The nuclear fuel is typically enriched uranium or plutonium, and it is contained within a reactor that is designed to minimize the risk of a radiation leak. The heat from the reactor is transferred to the propellant via a heat exchanger, which can be made of a high-melting-point metal such as tungsten. 

In a nuclear electric propulsion system, a nuclear reactor produces electricity that is used to ionize propellant and create thrust. The reactor heats a working fluid, which then drives a turbine to produce electricity. That electricity is then used to ionize an inert gas, such as xenon, which is expelled from the spacecraft to generate thrust.

Nuclear propulsion systems can potentially offer much higher specific impulse (thrust per unit mass of propellant) compared to chemical or electric propulsion systems. This would enable faster and more efficient missions to destinations like Mars, and could reduce overall mission cost by requiring less propellant. However, these systems present significant technical challenges in terms of containment and radiation shielding, and the potential environmental impacts of a nuclear reaction malfunction or accident must be carefully assessed.


================================================
FILE: saved_chats/2023-03-26_RevolutionizingHealthcare.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What is Artificial Intelligence?
assistant: Artificial Intelligence (AI) is a field of computer science that focuses on the development of intelligent machines that can perform tasks that typically require human-like cognitive abilities, such as learning, reasoning, perception, and decision-making. AI systems are designed to analyze and interpret complex data sets, recognize patterns and make predictions, and automate routine tasks. They can also interact with humans using natural language processing and speech recognition technologies. AI has the potential to revolutionize many industries and improve people's lives in numerous ways, from healthcare and finance to transportation and entertainment.
user: What can you tell me about artificial intelligence in the medical field?
assistant: Artificial Intelligence (AI) has the potential to revolutionize the medical field by improving patient outcomes, increasing efficiency, and reducing costs. AI can be used in various medical applications such as disease diagnosis, personalized treatment plans, and drug discovery. 

One of the most significant benefits of AI in the medical field is its ability to analyze and interpret large amounts of medical data quickly and accurately. Machine learning algorithms can be trained on this data to recognize patterns and identify potential health risks or diagnoses with high accuracy. This can help doctors make more informed decisions about treatment plans and improve patient outcomes.

AI can also aid in drug discovery by analyzing complex molecular interactions and predicting the efficacy and safety of new drugs. This reduces the time and cost associated with traditional drug development processes. Moreover, AI-powered tools can automate routine administrative tasks such as scheduling and billing, which can save time and resources.

However, there are still some challenges to the wide-scale adoption of AI in the medical field, including privacy concerns, technical limitations, and regulatory requirements. Nonetheless, AI has enormous potential to transform healthcare and improve patient outcomes.


================================================
FILE: saved_chats/2023-03-26_RevolutionizingHealthcarewithArtificialIntelligenceBenefitsandOpportunities.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What is Artificial Intelligence?
assistant: Artificial Intelligence (AI) is a field of computer science that focuses on the development of intelligent machines that can perform tasks that typically require human-like cognitive abilities, such as learning, reasoning, perception, and decision-making. AI systems are designed to analyze and interpret complex data sets, recognize patterns and make predictions, and automate routine tasks. They can also interact with humans using natural language processing and speech recognition technologies. AI has the potential to revolutionize many industries and improve people's lives in numerous ways, from healthcare and finance to transportation and entertainment.
user: What can you tell me about artificial intelligence in the medical field?
assistant: Artificial Intelligence (AI) has the potential to revolutionize the medical field by improving patient outcomes, increasing efficiency, and reducing costs. AI can be used in various medical applications such as disease diagnosis, personalized treatment plans, and drug discovery. 

One of the most significant benefits of AI in the medical field is its ability to analyze and interpret large amounts of medical data quickly and accurately. Machine learning algorithms can be trained on this data to recognize patterns and identify potential health risks or diagnoses with high accuracy. This can help doctors make more informed decisions about treatment plans and improve patient outcomes.

AI can also aid in drug discovery by analyzing complex molecular interactions and predicting the efficacy and safety of new drugs. This reduces the time and cost associated with traditional drug development processes. Moreover, AI-powered tools can automate routine administrative tasks such as scheduling and billing, which can save time and resources.

However, there are still some challenges to the wide-scale adoption of AI in the medical field, including privacy concerns, technical limitations, and regulatory requirements. Nonetheless, AI has enormous potential to transform healthcare and improve patient outcomes.


================================================
FILE: saved_chats/2023-03-26_SpacePropulsion.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Is there a way I can know how a rocket works?
assistant: Yes, there is a way to understand how a rocket works. A rocket works by utilizing the principle of action-reaction, also known as Newton's third law of motion. This means that for every action, there is an equal and opposite reaction. A rocket engine generates a massive amount of energy by burning fuel and oxidizer, which creates hot gases that are expelled out of the bottom of the rocket. This exhaust generates a force that propels the rocket in the opposite direction. The rocket's shape and aerodynamics are designed to reduce air resistance and increase the speed and altitude it can achieve. To better understand the specifics of how a rocket works, you can research propulsion systems, rocket design, and the history of space exploration.
user: What can you tell me about electric propulsion in rockets?
assistant: Electric propulsion is a type of rocket propulsion that uses electrical energy to accelerate propellant and generate thrust. It is often used in spacecraft propulsion systems for missions that require precise control and long-duration high-speed travel. 

There are different types of electric propulsion systems, but they all work on the same principle: accelerating ions or electrons using an electric field. This is in contrast to traditional chemical rockets, which generate thrust by combusting fuel.

One type of electric propulsion is called electrically-powered ion propulsion, which uses ions that are generated by an electrically charged grid to propel the spacecraft. This system uses a low amount of fuel but produces a weak thrust, making it ideal for long-duration missions where high velocities are required.

Another type of electric propulsion is the Hall thruster, which uses a magnetic field to trap high-energy electrons and ionize the propellant. This produces a larger thrust than ion propulsion, but it still uses less fuel than traditional chemical rockets.

Overall, electric propulsion is a promising technology for space exploration as it offers a more efficient and cost-effective way to travel in space.
user: without the margin of improvement of electric propulsion in rocket flight.
assistant: Even without considering future advancements in electric propulsion, there are several advantages and disadvantages to using electric propulsion in rocket flight.

One advantage of electric propulsion is that it requires much less fuel than traditional chemical rockets. This is because electric propulsion is more efficient at converting the stored energy into thrust. This means that spacecraft can carry smaller amounts of fuel, which reduces weight and increases payload capacity. Additionally, electric propulsion can provide a lower level of thrust for a much longer duration, which is ideal for missions that require gradual and precise maneuvering.

However, one disadvantage of electric propulsion is that it generates less thrust than chemical rockets. This means that spacecraft must carry out longer-duration burns to achieve the desired velocity, which can result in longer travel times. Additionally, electric propulsion systems are often more complex and less reliable than chemical systems, and they require very high levels of electrical power, which can be difficult to generate and manage in space.

Overall, while electric propulsion offers several advantages, it also has some limitations that need to be carefully considered when designing and planning space missions.
user: Do you think that nuclear energy could play a role in rocket propulsion?
assistant: Yes, nuclear energy could potentially play a role in rocket propulsion. Nuclear propulsion systems typically work by heating a propellant, such as hydrogen or water, using a nuclear reactor. The heat from the reaction is then used to accelerate the propellant and generate thrust. 

There are several potential advantages to using nuclear propulsion. One is that nuclear systems could provide much greater levels of thrust compared to traditional chemical rockets or even electric propulsion systems, which could enable faster travel to far-off destinations. Additionally, nuclear systems could use much less propellant, which would reduce the overall mass and cost of the spacecraft.

There are, however, also several challenges and concerns associated with nuclear propulsion. One major challenge is that it can be difficult to contain and safely handle nuclear fuel in space, where even small accidents or malfunctions could have catastrophic consequences. Additionally, there are concerns about the environmental impact of nuclear propulsion, both in terms of radiation exposure and in terms of potential contamination if there were an accident or problem during launch or operation.

Overall, while nuclear propulsion holds promise for the future of space exploration, it is a complex and potentially risky technology that requires careful consideration of the benefits and drawbacks.
user: Could you explain better how can I use a nuclear reactor to generate thrust?
assistant: Nuclear propulsion uses a nuclear reactor to heat a propellant and create thrust. There are two main types of nuclear propulsion systems that have been studied: nuclear thermal propulsion (NTP) and nuclear electric propulsion (NEP).

In a nuclear thermal propulsion system, nuclear fuel is used to heat hydrogen or another propellant. The hot gas is then expelled out of a nozzle to generate thrust. The nuclear fuel is typically enriched uranium or plutonium, and it is contained within a reactor that is designed to minimize the risk of a radiation leak. The heat from the reactor is transferred to the propellant via a heat exchanger, which can be made of a high-melting-point metal such as tungsten. 

In a nuclear electric propulsion system, a nuclear reactor produces electricity that is used to ionize propellant and create thrust. The reactor heats a working fluid, which then drives a turbine to produce electricity. That electricity is then used to ionize an inert gas, such as xenon, which is expelled from the spacecraft to generate thrust.

Nuclear propulsion systems can potentially offer much higher specific impulse (thrust per unit mass of propellant) compared to chemical or electric propulsion systems. This would enable faster and more efficient missions to destinations like Mars, and could reduce overall mission cost by requiring less propellant. However, these systems present significant technical challenges in terms of containment and radiation shielding, and the potential environmental impacts of a nuclear reaction malfunction or accident must be carefully assessed.


================================================
FILE: saved_chats/2023-03-26_TagImageSegmentation.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What are transformers architecture and how can they be used in segmentation?
assistant: The transformer architecture is a deep learning model that has been widely used in natural language processing applications due to its impressive performance in language translation and text generation tasks. The model is based on an attention mechanism that allows it to selectively focus on different parts of the input sequence, enabling it to capture long-range dependencies between words.

In recent years, researchers have started exploring the applicability of transformer models in computer vision tasks such as image segmentation. One of the advantages of using transformers for segmentation is their ability to process images as sequences of tokens. This means that the model can take into account the spatial relationship between pixels in an image, allowing it to produce more contextually aware segmentations.

There are different ways to incorporate transformers into segmentation models. One approach is to use a pre-trained transformer model, such as the Vision Transformer (ViT), to encode the input image into a sequence of feature vectors. These feature vectors are then fed into a segmentation head that produces pixel-wise predictions for each class.

Another approach is to use transformers in a fully self-supervised setting, where the model is trained to predict the color or texture of a patch given its surrounding context. This pre-training step allows the model to learn powerful image representations that can be used for downstream tasks like segmentation.

In summary, transformers have shown great potential in segmentation tasks by allowing models to capture complex spatial relationships in image data. However, like any deep learning model, they require significant computational resources and training data to achieve state-of-the-art performance.


================================================
FILE: saved_chats/2023-03-26_TipsforRunningScriptsatSystemStartuponWindowsandLinux.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: How can I run some command line scripts from System Startup?
assistant: To run command line scripts from System Startup, you can add the desired commands or scripts to the system's startup files or create a new script to run them.

For Windows systems, you can add the commands to the "Startup" folder or use the Registry Editor to create entries in the "Run" key. You can also use the Task Scheduler to schedule the scripts to run when the system starts up.

For Linux systems, you can create a script in the /etc/init.d/ directory and use the update-rc.d command to add it to the startup sequence. You can also create a Systemd service or add the commands to the rc.local file to run them during system startup.

It is important to ensure that the scripts or commands are necessary and won't cause any issues or conflicts with other processes running on the system during startup. You can also test the scripts in a safe environment before implementing them in a production environment.
user: How can I run a vital script, a startup?
assistant: To run a vital script at system startup, you need to ensure that the script is added to the system's startup files or services. 

On Windows systems, you can add the script to the registry under the "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run" key to automatically run it at startup. You can also create a Windows service using the "sc create" command in Command Prompt or the "New-Service" cmdlet in PowerShell.

On Linux systems, you can add the script to the init.d directory and use the update-rc.d command to make it run automatically at startup. Alternatively, you can create a Systemd service using the "systemctl" command or add the script to the rc.local file to run it during the boot process.

It is important to ensure that the vital script is tested and functioning properly before adding it to the startup files, as any errors or conflicts during startup can cause problems for the rest of the system. You can also set up monitoring and logging for the script to ensure that it runs successfully and has no unexpected behaviors.
user: How can I run a Python script from a virtual environment at startup?
assistant: To run a Python script from a virtual environment at startup, you can create a shell script that activates the virtual environment and then runs the Python script.

Here are the steps to create a shell script for running a Python script from a virtual environment at startup on Linux:

1. Open a terminal and navigate to the directory where you want to create the shell script.
2. Create a new file and name it something like myscript.sh: `nano myscript.sh`
3. Add the following code to the file, replacing "myvenv" with the name of your virtual environment and "myscript.py" with the name of your Python script:

```
#!/bin/bash
cd /path/to/my/project
source /path/to/myvenv/bin/activate
python /path/to/myscript.py
```

4. Make the script executable: `chmod +x myscript.sh`
5. Test the script by running it: `./myscript.sh`

Once the script is working correctly, you can add it to the system's startup files so it will run automatically at startup. On Linux, you can add the script to the rc.local file or create a Systemd service that runs the script.
user: How can I do it on Windows OS?
assistant: On Windows, you can use a batch script to run a Python script from a virtual environment at startup. Here are the steps to create a batch script for this purpose:

1. Open Notepad or any other text editor.
2. Type the following code, replacing "C:\path\to\myenv" with the path to your virtual environment, and "C:\path\to\myscript.py" with the path to your Python script:

```
@echo off
cd C:\path\to\myproject
call C:\path\to\myenv\Scripts\activate.bat
python C:\path\to\myscript.py
```

3. Save the file with a .bat extension, such as "myscript.bat".
4. Place the .bat file in the Startup folder: Press Win+R to open the "Run" dialog box, type "shell:startup" and press Enter. This will open the "Startup" folder. Place the .bat file here.
5. Restart your computer and the script should run automatically from the virtual environment at startup.

Make sure to replace the paths in the batch script with the actual paths to your project, virtual environment, and Python script. Also, double-check that the script is working correctly by testing it before adding it to the Startup folder.


================================================
FILE: saved_chats/2023-03-26_TipsforRunningStartup.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: How can I run some command line scripts from System Startup?
assistant: To run command line scripts from System Startup, you can add the desired commands or scripts to the system's startup files or create a new script to run them.

For Windows systems, you can add the commands to the "Startup" folder or use the Registry Editor to create entries in the "Run" key. You can also use the Task Scheduler to schedule the scripts to run when the system starts up.

For Linux systems, you can create a script in the /etc/init.d/ directory and use the update-rc.d command to add it to the startup sequence. You can also create a Systemd service or add the commands to the rc.local file to run them during system startup.

It is important to ensure that the scripts or commands are necessary and won't cause any issues or conflicts with other processes running on the system during startup. You can also test the scripts in a safe environment before implementing them in a production environment.
user: How can I run a vital script, a startup?
assistant: To run a vital script at system startup, you need to ensure that the script is added to the system's startup files or services. 

On Windows systems, you can add the script to the registry under the "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run" key to automatically run it at startup. You can also create a Windows service using the "sc create" command in Command Prompt or the "New-Service" cmdlet in PowerShell.

On Linux systems, you can add the script to the init.d directory and use the update-rc.d command to make it run automatically at startup. Alternatively, you can create a Systemd service using the "systemctl" command or add the script to the rc.local file to run it during the boot process.

It is important to ensure that the vital script is tested and functioning properly before adding it to the startup files, as any errors or conflicts during startup can cause problems for the rest of the system. You can also set up monitoring and logging for the script to ensure that it runs successfully and has no unexpected behaviors.
user: How can I run a Python script from a virtual environment at startup?
assistant: To run a Python script from a virtual environment at startup, you can create a shell script that activates the virtual environment and then runs the Python script.

Here are the steps to create a shell script for running a Python script from a virtual environment at startup on Linux:

1. Open a terminal and navigate to the directory where you want to create the shell script.
2. Create a new file and name it something like myscript.sh: `nano myscript.sh`
3. Add the following code to the file, replacing "myvenv" with the name of your virtual environment and "myscript.py" with the name of your Python script:

```
#!/bin/bash
cd /path/to/my/project
source /path/to/myvenv/bin/activate
python /path/to/myscript.py
```

4. Make the script executable: `chmod +x myscript.sh`
5. Test the script by running it: `./myscript.sh`

Once the script is working correctly, you can add it to the system's startup files so it will run automatically at startup. On Linux, you can add the script to the rc.local file or create a Systemd service that runs the script.
user: How can I do it on Windows OS?
assistant: On Windows, you can use a batch script to run a Python script from a virtual environment at startup. Here are the steps to create a batch script for this purpose:

1. Open Notepad or any other text editor.
2. Type the following code, replacing "C:\path\to\myenv" with the path to your virtual environment, and "C:\path\to\myscript.py" with the path to your Python script:

```
@echo off
cd C:\path\to\myproject
call C:\path\to\myenv\Scripts\activate.bat
python C:\path\to\myscript.py
```

3. Save the file with a .bat extension, such as "myscript.bat".
4. Place the .bat file in the Startup folder: Press Win+R to open the "Run" dialog box, type "shell:startup" and press Enter. This will open the "Startup" folder. Place the .bat file here.
5. Restart your computer and the script should run automatically from the virtual environment at startup.

Make sure to replace the paths in the batch script with the actual paths to your project, virtual environment, and Python script. Also, double-check that the script is working correctly by testing it before adding it to the Startup folder.


================================================
FILE: saved_chats/2023-03-28_AsteroidImpact.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Quanto grande dovrebbe essere un asteroide che si sta impattando sulla Terra prima di causare i danni significativi?
assistant: La dimensione dell'asteroide necessaria per causare danni significativi dipende da diversi fattori come la composizione dell'asteroide, la sua velocit e l'angolo di impatto. Tuttavia, si stima che un diametro minimo di circa 1 km sia necessario per causare impatti a livello globale che potrebbero portare a conseguenze catastrofiche come l'oscuramento del sole, la caduta di temperature, tsunami e distruzione di intere aree urbane. 

Tuttavia, anche gli asteroidi di dimensioni inferiori possono causare danni significativi se impattano in aree densamente popolate. Ad esempio, nel 2013 un asteroide di soli 20 metri di diametro si  disintegrato nell'atmosfera sopra la citt russa di Chelyabinsk, causando danni materiali significativi e lesioni a centinaia di persone. 

Quindi, sebbene le dimensioni dell'asteroide siano un fattore importante da considerare, anche altri fattori come la densit della popolazione e la posizione dell'impatto possono influire sui danni causati.  quindi importante monitorare costantemente l'orbita degli asteroidi e sviluppare metodi per prevenire e mitigare potenziali impatti.


================================================
FILE: saved_chats/2023-03-28_ChoosingAppleComputer.txt
================================================
system: you are a helpful assistant.
user: What is the last Apple computer that you know?
assistant: As an AI language model, I do not have the capability to have up-to-date knowledge, but I can tell you that the latest Apple computer as of early 2021 is the M1 MacBook Pro, MacBook Air, and Mac mini. These models feature Apple's new M1 chip, which promises better performance and energy efficiency compared to previous models. Additionally, there are other Apple computers still available on the market such as the MacBook Pro 16-inch, iMac, and iMac Pro that are still popular among Apple users.
user: Okay, let's say next year Apple will launch a new line of computers which have a possible M2 chip with faster processing capabilities. What would you recommend between the M1 and this hypothetical M2 if I aim to do deep learning and machine learning algorithm?
assistant: If you are planning to do deep learning and machine learning algorithm, it is likely that you will benefit from a computer with faster processing capabilities. If Apple does indeed launch a new line of computers with an M2 chip, it is possible that the new chip will have faster processing capabilities than the M1 chip, which could be beneficial for your purposes. 

However, the decision of which chip to choose ultimately depends on the specific requirements of your workloads. Deep learning and machine learning algorithms can be very demanding in terms of computational resources, so it is important to choose a system that can handle the specific workload you are working with.

In general, if the M2 chip provides significant improvements in processing speed, it may be worth considering upgrading, especially if you often work with large datasets or perform computationally intense operations. However, it is also worth considering other factors like the cost and availability of the new line, as well as the compatibility of software and drivers with the new architecture.
user: Ok, so I am undecided between choosing the M2 with 8GB of unified RAM or to choose an M1 with 16GB of unified memory. Which one is the best considering that I don't want to spend much money?
assistant: If you are on a tight budget, choosing the M1 with 16GB of unified memory could be a better option. The M1 chip is already a high-performance processor, and the 16GB of unified memory provides enough memory capacity for most workloads. 

While the M2 chip with 8GB of unified memory may offer faster processing capabilities, having less memory could become a limitation when dealing with large datasets, running multiple deep learning or machine learning algorithms at the same time, or performing other memory-intensive tasks. Additionally, the M2 chip with 8GB of memory may be more expensive than the M1 with 16GB of memory.

Ultimately, it is important to consider not only the processing power but also memory capacity when choosing a computer for deep learning and machine learning algorithms. If you have demanding workloads, more memory will be beneficial, even if it means sacrificing some processing speed.
user: Okay, considering that I am dealing with bigger data sets, what would be the ideal size of a unified memory, which includes graphic card dedicated memory? Because I would think that 16 gigabytes is enough, but some people suggest that 24 or even higher is necessary to do this kind of work.
assistant: The ideal size of a unified memory for deep learning and machine learning algorithm depends on the size of your dataset, the complexity of your models, and the number of training or inference operations you perform. 

While 16GB of unified memory may be sufficient for many use cases, larger memory capacities can offer benefits in terms of performance and speed. If you work with large datasets and deep neural networks, or if you tend to perform multiple workloads simultaneously, you may benefit from greater memory capacity.

In general, a minimum of 16GB of unified memory is recommended for deep learning and machine learning workloads. However, if you can afford it, 24GB or even 32GB of unified memory can potentially provide significant performance improvements, especially when working with large datasets that require lots of memory. Additionally, having dedicated graphics memory (VRAM) can also improve performance, especially for tasks that involve a lot of graphics processing.

Ultimately, the ideal amount of memory (both unified memory and VRAM) for deep learning and machine learning depends on the specific needs of your use case. If you work with large data sets, it may be worth investing in a system with more memory capacity to help increase performance and productivity.


================================================
FILE: saved_chats/2023-03-28_ChoosingtheRightAppleComputerforDeepLearningandMachineLearningM1vsM2ChipandOptimalMemorySize.txt
================================================
system: you are a helpful assistant.
user: What is the last Apple computer that you know?
assistant: As an AI language model, I do not have the capability to have up-to-date knowledge, but I can tell you that the latest Apple computer as of early 2021 is the M1 MacBook Pro, MacBook Air, and Mac mini. These models feature Apple's new M1 chip, which promises better performance and energy efficiency compared to previous models. Additionally, there are other Apple computers still available on the market such as the MacBook Pro 16-inch, iMac, and iMac Pro that are still popular among Apple users.
user: Okay, let's say next year Apple will launch a new line of computers which have a possible M2 chip with faster processing capabilities. What would you recommend between the M1 and this hypothetical M2 if I aim to do deep learning and machine learning algorithm?
assistant: If you are planning to do deep learning and machine learning algorithm, it is likely that you will benefit from a computer with faster processing capabilities. If Apple does indeed launch a new line of computers with an M2 chip, it is possible that the new chip will have faster processing capabilities than the M1 chip, which could be beneficial for your purposes. 

However, the decision of which chip to choose ultimately depends on the specific requirements of your workloads. Deep learning and machine learning algorithms can be very demanding in terms of computational resources, so it is important to choose a system that can handle the specific workload you are working with.

In general, if the M2 chip provides significant improvements in processing speed, it may be worth considering upgrading, especially if you often work with large datasets or perform computationally intense operations. However, it is also worth considering other factors like the cost and availability of the new line, as well as the compatibility of software and drivers with the new architecture.
user: Ok, so I am undecided between choosing the M2 with 8GB of unified RAM or to choose an M1 with 16GB of unified memory. Which one is the best considering that I don't want to spend much money?
assistant: If you are on a tight budget, choosing the M1 with 16GB of unified memory could be a better option. The M1 chip is already a high-performance processor, and the 16GB of unified memory provides enough memory capacity for most workloads. 

While the M2 chip with 8GB of unified memory may offer faster processing capabilities, having less memory could become a limitation when dealing with large datasets, running multiple deep learning or machine learning algorithms at the same time, or performing other memory-intensive tasks. Additionally, the M2 chip with 8GB of memory may be more expensive than the M1 with 16GB of memory.

Ultimately, it is important to consider not only the processing power but also memory capacity when choosing a computer for deep learning and machine learning algorithms. If you have demanding workloads, more memory will be beneficial, even if it means sacrificing some processing speed.
user: Okay, considering that I am dealing with bigger data sets, what would be the ideal size of a unified memory, which includes graphic card dedicated memory? Because I would think that 16 gigabytes is enough, but some people suggest that 24 or even higher is necessary to do this kind of work.
assistant: The ideal size of a unified memory for deep learning and machine learning algorithm depends on the size of your dataset, the complexity of your models, and the number of training or inference operations you perform. 

While 16GB of unified memory may be sufficient for many use cases, larger memory capacities can offer benefits in terms of performance and speed. If you work with large datasets and deep neural networks, or if you tend to perform multiple workloads simultaneously, you may benefit from greater memory capacity.

In general, a minimum of 16GB of unified memory is recommended for deep learning and machine learning workloads. However, if you can afford it, 24GB or even 32GB of unified memory can potentially provide significant performance improvements, especially when working with large datasets that require lots of memory. Additionally, having dedicated graphics memory (VRAM) can also improve performance, especially for tasks that involve a lot of graphics processing.

Ultimately, the ideal amount of memory (both unified memory and VRAM) for deep learning and machine learning depends on the specific needs of your use case. If you work with large data sets, it may be worth investing in a system with more memory capacity to help increase performance and productivity.


================================================
FILE: saved_chats/2023-03-28_SizeandImpactHowBigDoesanAsteroidNeedtoBetoCauseSignificantDamage.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Quanto grande dovrebbe essere un asteroide che si sta impattando sulla Terra prima di causare i danni significativi?
assistant: La dimensione dell'asteroide necessaria per causare danni significativi dipende da diversi fattori come la composizione dell'asteroide, la sua velocit e l'angolo di impatto. Tuttavia, si stima che un diametro minimo di circa 1 km sia necessario per causare impatti a livello globale che potrebbero portare a conseguenze catastrofiche come l'oscuramento del sole, la caduta di temperature, tsunami e distruzione di intere aree urbane. 

Tuttavia, anche gli asteroidi di dimensioni inferiori possono causare danni significativi se impattano in aree densamente popolate. Ad esempio, nel 2013 un asteroide di soli 20 metri di diametro si  disintegrato nell'atmosfera sopra la citt russa di Chelyabinsk, causando danni materiali significativi e lesioni a centinaia di persone. 

Quindi, sebbene le dimensioni dell'asteroide siano un fattore importante da considerare, anche altri fattori come la densit della popolazione e la posizione dell'impatto possono influire sui danni causati.  quindi importante monitorare costantemente l'orbita degli asteroidi e sviluppare metodi per prevenire e mitigare potenziali impatti.


================================================
FILE: saved_chats/2023-03-29_InteractiveSystem.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Voglio progettare un sistema che sia in grado di compiere certi azioni su un computer, ad esempio creare file o cercare in una cartella, ma allo stesso tempo rispondere a domande come stai facendo tu adesso. Come faccio a creare una distinzione tra comandi e domande senza fare una serie di se e forse?
assistant: Una possibile soluzione potrebbe essere l'uso di un'interfaccia vocale interattiva in grado di comprendere il linguaggio naturale. In questo modo, il sistema pu essere dotato di un riconoscimento vocale che identifichi se l'utente sta dando un comando o una domanda. 

Inoltre, una volta identificata la tipologia di richiesta, il sistema pu adattare la risposta di conseguenza. Ad esempio, se l'utente chiede "come funziona questo comando?" ci si pu aspettare una risposta esplicativa, mentre se l'utente dice "crea un nuovo file" ci si aspetta un'azione. 

In aggiunta,  possibile addestrare il sistema per riconoscere alcune parole chiave che identifichino esplicitamente un comando, come ad esempio "crea", "copia", "sposta", ecc. 

In ogni caso, mantenere una comunicazione chiara ed esaustiva con l'utente  fondamentale per garantire l'efficacia e l'usabilit del sistema.
user: Esistono data site pubblici su cui possa destrarre un tale sistema dove vengono in particolar modo associati particolari prompt ad elezioni?
assistant: Ci sono alcuni siti pubblici che offrono dataset di conversazioni tra utenti e chatbot, che possono essere usati per addestrare un sistema di conversazione. 

Ad esempio, si pu fare riferimento a conversazioni disponibili sul sito di Kaggle, Github o sul sito del Stanford Question Answering Dataset (SQuAD). In questi luoghi  possibile trovare conversazioni tra utenti e chatbot in diversi ambiti, ad esempio per il customer service o per la ricerca di informazioni specifiche. 

Tuttavia, va sottolineato che il processo di addestramento richiede una certa esperienza e un'attenta attenzione alla qualit dei dati utilizzati. Alcune tecniche di apprendimento automatico potrebbero richiedere notevoli quantit di dati per ottenere risultati soddisfacenti. 

In ogni caso, l'individuazione dei dataset corretti e la loro pulizia sono passaggi fondamentali per l'addestramento di un sistema di conversazione, soprattutto se si vogliono associare dei prompt ad elezioni specifiche. Per questi motivi, pu essere utile rivolgersi a esperti del settore per una consulenza mirata.


================================================
FILE: saved_chats/2023-03-29_TraininganInteractiveSystemtoRespondtoCommandsandQuestions.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Voglio progettare un sistema che sia in grado di compiere certi azioni su un computer, ad esempio creare file o cercare in una cartella, ma allo stesso tempo rispondere a domande come stai facendo tu adesso. Come faccio a creare una distinzione tra comandi e domande senza fare una serie di se e forse?
assistant: Una possibile soluzione potrebbe essere l'uso di un'interfaccia vocale interattiva in grado di comprendere il linguaggio naturale. In questo modo, il sistema pu essere dotato di un riconoscimento vocale che identifichi se l'utente sta dando un comando o una domanda. 

Inoltre, una volta identificata la tipologia di richiesta, il sistema pu adattare la risposta di conseguenza. Ad esempio, se l'utente chiede "come funziona questo comando?" ci si pu aspettare una risposta esplicativa, mentre se l'utente dice "crea un nuovo file" ci si aspetta un'azione. 

In aggiunta,  possibile addestrare il sistema per riconoscere alcune parole chiave che identifichino esplicitamente un comando, come ad esempio "crea", "copia", "sposta", ecc. 

In ogni caso, mantenere una comunicazione chiara ed esaustiva con l'utente  fondamentale per garantire l'efficacia e l'usabilit del sistema.
user: Esistono data site pubblici su cui possa destrarre un tale sistema dove vengono in particolar modo associati particolari prompt ad elezioni?
assistant: Ci sono alcuni siti pubblici che offrono dataset di conversazioni tra utenti e chatbot, che possono essere usati per addestrare un sistema di conversazione. 

Ad esempio, si pu fare riferimento a conversazioni disponibili sul sito di Kaggle, Github o sul sito del Stanford Question Answering Dataset (SQuAD). In questi luoghi  possibile trovare conversazioni tra utenti e chatbot in diversi ambiti, ad esempio per il customer service o per la ricerca di informazioni specifiche. 

Tuttavia, va sottolineato che il processo di addestramento richiede una certa esperienza e un'attenta attenzione alla qualit dei dati utilizzati. Alcune tecniche di apprendimento automatico potrebbero richiedere notevoli quantit di dati per ottenere risultati soddisfacenti. 

In ogni caso, l'individuazione dei dataset corretti e la loro pulizia sono passaggi fondamentali per l'addestramento di un sistema di conversazione, soprattutto se si vogliono associare dei prompt ad elezioni specifiche. Per questi motivi, pu essere utile rivolgersi a esperti del settore per una consulenza mirata.


================================================
FILE: saved_chats/2023-03-30_ExploringtheColdestTemperaturesonEarthRecordBreakingLowTemperaturesandHistoricalEstimates.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: o nella temperatura minima mai registrata sulla terra.
assistant: La temperatura minima mai registrata sulla Terra  stata di -89,2C, misurata il 21 luglio 1983 presso la stazione di ricerca russa di Vostok, in Antartide. Questa  stata la temperatura pi bassa registrata sulla superficie terrestre ed  stata possibile grazie alla combinazione di diversi fattori come l'altitudine della stazione, l'assenza di vento e il clima estremamente secco della regione. Questi fattori hanno creato le condizioni ideali per la formazione di temperature cos estreme.  importante notare che questo tipo di temperatura  estremamente pericoloso per la vita umana, che pu essere esposta a rischio di congelamento e ipotermia in pochi minuti.
user: Qual  la temperatura pi bassa mai registrata, invece, nella storia della Terra e per registrata in tenda ostimata?
assistant: La temperatura pi bassa mai registrata sulla Terra, considerando anche quelle non ottenute tramite stazioni meteorologiche ufficiali,  stata di -128,6C ed  stata registrata il 23 luglio 1983 sulla cima della calotta glaciale dell'Antartide orientale, ad un'altitudine di circa 3.900 metri. Questa temperatura  stata registrata attraverso l'utilizzo di satelliti meteorologici ed  stata confermata dal British Antarctic Survey. Tuttavia,  bene precisare che questa temperatura estrema  stata registrata in una tenda ostimata per un esperimento scientifico e non in una stazione meteorologica ufficiale.
user: Qual  la temperatura minima stimata sulla Terra da quando  stata creata?
assistant: Non  possibile rispondere in modo preciso alla tua domanda poich l'et della Terra  di circa 4,54 miliardi di anni e le misurazioni di temperatura sono iniziate solo nei tempi storici pi recenti. Tuttavia, gli studiosi che studiano il clima e la geologia della Terra hanno sommato diverse prove scientifiche per stimare la temperatura minima registrata in epoche passate. Ad esempio, studi geologici indicano che circa 700 milioni di anni fa la Terra era coperta da ghiaccio per lungo tempo, un periodo noto come "Terra a Palla di Neve", e le temperature durante questo periodo potrebbero essere state inferiori a -50C. Altri eventi geologici, come l'eruzione di vulcani su vasta scala, possono aver reso il clima della Terra tiepido e umido in passato. In ogni caso, gli studi e le simulazioni climatiche suggeriscono che la temperatura minima della Terra  variata significativamente nel corso della storia geologica del nostro pianeta.
assistant: "Exploring the Coldest Temperatures on Earth: Record-Breaking Low Temperatures and Historical Estimates"


================================================
FILE: saved_chats/2023-03-30_ExtremeColdTemperatures.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: o nella temperatura minima mai registrata sulla terra.
assistant: La temperatura minima mai registrata sulla Terra  stata di -89,2C, misurata il 21 luglio 1983 presso la stazione di ricerca russa di Vostok, in Antartide. Questa  stata la temperatura pi bassa registrata sulla superficie terrestre ed  stata possibile grazie alla combinazione di diversi fattori come l'altitudine della stazione, l'assenza di vento e il clima estremamente secco della regione. Questi fattori hanno creato le condizioni ideali per la formazione di temperature cos estreme.  importante notare che questo tipo di temperatura  estremamente pericoloso per la vita umana, che pu essere esposta a rischio di congelamento e ipotermia in pochi minuti.
user: Qual  la temperatura pi bassa mai registrata, invece, nella storia della Terra e per registrata in tenda ostimata?
assistant: La temperatura pi bassa mai registrata sulla Terra, considerando anche quelle non ottenute tramite stazioni meteorologiche ufficiali,  stata di -128,6C ed  stata registrata il 23 luglio 1983 sulla cima della calotta glaciale dell'Antartide orientale, ad un'altitudine di circa 3.900 metri. Questa temperatura  stata registrata attraverso l'utilizzo di satelliti meteorologici ed  stata confermata dal British Antarctic Survey. Tuttavia,  bene precisare che questa temperatura estrema  stata registrata in una tenda ostimata per un esperimento scientifico e non in una stazione meteorologica ufficiale.
user: Qual  la temperatura minima stimata sulla Terra da quando  stata creata?
assistant: Non  possibile rispondere in modo preciso alla tua domanda poich l'et della Terra  di circa 4,54 miliardi di anni e le misurazioni di temperatura sono iniziate solo nei tempi storici pi recenti. Tuttavia, gli studiosi che studiano il clima e la geologia della Terra hanno sommato diverse prove scientifiche per stimare la temperatura minima registrata in epoche passate. Ad esempio, studi geologici indicano che circa 700 milioni di anni fa la Terra era coperta da ghiaccio per lungo tempo, un periodo noto come "Terra a Palla di Neve", e le temperature durante questo periodo potrebbero essere state inferiori a -50C. Altri eventi geologici, come l'eruzione di vulcani su vasta scala, possono aver reso il clima della Terra tiepido e umido in passato. In ogni caso, gli studi e le simulazioni climatiche suggeriscono che la temperatura minima della Terra  variata significativamente nel corso della storia geologica del nostro pianeta.
assistant: "Exploring the Coldest Temperatures on Earth: Record-Breaking Low Temperatures and Historical Estimates"


================================================
FILE: saved_chats/2023-03-30_TagBeerBasics.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What is beer?
assistant: Beer is an alcoholic beverage made from fermented grains, typically malted barley, hops, yeast, and water. The process involves mashing the grains to extract sugars, boiling the resulting liquid with hops to add bitterness and flavor, and fermenting the mixture with yeast to produce alcohol and carbonation. The specific ingredients and brewing techniques used can vary widely depending on the style of beer being produced, resulting in a wide range of flavors, aromas, and colors. Beer has been enjoyed for thousands of years and is a popular drink around the world.
assistant: The Basics of Beer: Understanding the Popular Alcoholic Beverage


================================================
FILE: saved_chats/2023-03-30_TagSkyColors.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Che colori hai il cielo? Blu o giallo?
assistant: Il colore del cielo dipende dalle condizioni climatiche e dall'ora del giorno. Di solito, durante le ore diurne, il cielo appare di un bel colore blu, mentre al tramonto e all'alba pu diventare gradatamente di un colore giallo-arancione. Tuttavia, questi colori possono variare anche in base alla posizione geografica e alle stagioni dell'anno. Ad esempio, in alcune parti del mondo durante le tempeste di sabbia il cielo pu diventare rossastro o marrone. In generale, comunque, la maggior parte delle persone associa il colore del cielo al blu.
assistant: "The Colors of the Sky: Exploring the Hues and Variations"


================================================
FILE: saved_chats/2023-03-30_TheBasicsofBeerUnderstandingthePopularAlcoholicBeverage.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: What is beer?
assistant: Beer is an alcoholic beverage made from fermented grains, typically malted barley, hops, yeast, and water. The process involves mashing the grains to extract sugars, boiling the resulting liquid with hops to add bitterness and flavor, and fermenting the mixture with yeast to produce alcohol and carbonation. The specific ingredients and brewing techniques used can vary widely depending on the style of beer being produced, resulting in a wide range of flavors, aromas, and colors. Beer has been enjoyed for thousands of years and is a popular drink around the world.
assistant: The Basics of Beer: Understanding the Popular Alcoholic Beverage


================================================
FILE: saved_chats/2023-03-30_TheColorsoftheSkyExploringtheHuesandVariations.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: Che colori hai il cielo? Blu o giallo?
assistant: Il colore del cielo dipende dalle condizioni climatiche e dall'ora del giorno. Di solito, durante le ore diurne, il cielo appare di un bel colore blu, mentre al tramonto e all'alba pu diventare gradatamente di un colore giallo-arancione. Tuttavia, questi colori possono variare anche in base alla posizione geografica e alle stagioni dell'anno. Ad esempio, in alcune parti del mondo durante le tempeste di sabbia il cielo pu diventare rossastro o marrone. In generale, comunque, la maggior parte delle persone associa il colore del cielo al blu.
assistant: "The Colors of the Sky: Exploring the Hues and Variations"


================================================
FILE: saved_chats/2023-04-01_IslandTravel.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: avrei bisogno di fare un regalo a una persona che sa preferire un viaggio in nisando cosa proponi?
assistant: Se la persona a cui devi fare il regalo  un appassionato di viaggi, allora un viaggio potrebbe essere un'ottima idea regalo. Una meta interessante da suggerire potrebbe essere il Giappone, con la sua cultura millenaria, la splendida natura e la modernit delle grandi citt come Tokyo. Un'alternativa potrebbe essere il Per, con le rovine degli antichi siti inca, la magnificenza dell'Amazzonia e l'affascinante citt di Cuzco. Queste due destinazioni offrono numerose attivit e luoghi da visitare, garantendo un'esperienza di viaggio unica e indimenticabile. Ricorda che un viaggio  una scelta personale, quindi se conosci bene la persona puoi anche provare a capire le sue preferenze e selezionare una meta in base ai suoi interessi personali.
user: Questa persona sta per fare un viaggio in Islanda. Cosa posso regalarle?
assistant: Se la persona sta per fare un viaggio in Islanda, un regalo utile potrebbe essere una guida turistica o una mappa del paese, in modo che possa pianificare i suoi spostamenti e visitare i luoghi che pi la interessano. Un altro regalo utile potrebbe essere un'attrezzatura per escursioni, come una giacca impermeabile o degli scarponi da trekking, che potranno essere utili durante le escursioni nell'affascinante natura islandese. In alternativa, puoi considerare l'idea di regalare un souvenir tipico dell'Islanda, come un lanoso maglione islandese o una libreria di saghe islandesi. In questo modo potrai aiutare la persona a sentirsi pi a suo agio e apprezzare al meglio la sua esperienza di viaggio in Islanda.
user: Quali sono esperienze tipiche che si possono fare in Islanda che richiedono aderizatura particolare? Mi stanno venendo in mente i vicani.
assistant: In Islanda ci sono molte esperienze tipiche che richiedono attrezzature particolari, tra cui l'escursione sui ghiacciai come il Vatnajkull o il Langjkull, che richiedono scarponi da ghiaccio, piccozza e corda. Anche l'alpinismo sulle montagne come il Kirkjufell o il Hvannadalshnkur richiede l'utilizzo di attrezzature specifiche come ramponi, corda e attrezzature da arrampicata.  possibile fare un'escursione nella lava dell'Eyjafjallajkull o del vulcano Thrihnukagigur, questa esperienza richiede un elmetto e attrezzatura da speleologia. Potresti anche considerare l'opzione di un'escursione per ammirare le balene nella baia di Hsavk o di pescare nei fiumi come il Blanda, queste attivit richiedono un'adesione per gli equipaggiamenti necessari. Infine, si pu fare snorkeling o immersioni tra la spettacolare fessura di Silfra nel parco nazionale Thingvellir, queste attivit richiedono la muta da sub e l'immersione in acqua fredda. Ricorda sempre di prenotare le attivit tramite agenzie affidabili e di assicurarti che la persona abbia le attrezzature e le conoscenze necessarie per eseguire l'esperienza in modo sicuro.
assistant: "Regali e attivit per un viaggio in Islanda"


================================================
FILE: saved_chats/2023-04-01_RegalieattivitàperunviaggioinIslanda.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 20 words.
user: avrei bisogno di fare un regalo a una persona che sa preferire un viaggio in nisando cosa proponi?
assistant: Se la persona a cui devi fare il regalo  un appassionato di viaggi, allora un viaggio potrebbe essere un'ottima idea regalo. Una meta interessante da suggerire potrebbe essere il Giappone, con la sua cultura millenaria, la splendida natura e la modernit delle grandi citt come Tokyo. Un'alternativa potrebbe essere il Per, con le rovine degli antichi siti inca, la magnificenza dell'Amazzonia e l'affascinante citt di Cuzco. Queste due destinazioni offrono numerose attivit e luoghi da visitare, garantendo un'esperienza di viaggio unica e indimenticabile. Ricorda che un viaggio  una scelta personale, quindi se conosci bene la persona puoi anche provare a capire le sue preferenze e selezionare una meta in base ai suoi interessi personali.
user: Questa persona sta per fare un viaggio in Islanda. Cosa posso regalarle?
assistant: Se la persona sta per fare un viaggio in Islanda, un regalo utile potrebbe essere una guida turistica o una mappa del paese, in modo che possa pianificare i suoi spostamenti e visitare i luoghi che pi la interessano. Un altro regalo utile potrebbe essere un'attrezzatura per escursioni, come una giacca impermeabile o degli scarponi da trekking, che potranno essere utili durante le escursioni nell'affascinante natura islandese. In alternativa, puoi considerare l'idea di regalare un souvenir tipico dell'Islanda, come un lanoso maglione islandese o una libreria di saghe islandesi. In questo modo potrai aiutare la persona a sentirsi pi a suo agio e apprezzare al meglio la sua esperienza di viaggio in Islanda.
user: Quali sono esperienze tipiche che si possono fare in Islanda che richiedono aderizatura particolare? Mi stanno venendo in mente i vicani.
assistant: In Islanda ci sono molte esperienze tipiche che richiedono attrezzature particolari, tra cui l'escursione sui ghiacciai come il Vatnajkull o il Langjkull, che richiedono scarponi da ghiaccio, piccozza e corda. Anche l'alpinismo sulle montagne come il Kirkjufell o il Hvannadalshnkur richiede l'utilizzo di attrezzature specifiche come ramponi, corda e attrezzature da arrampicata.  possibile fare un'escursione nella lava dell'Eyjafjallajkull o del vulcano Thrihnukagigur, questa esperienza richiede un elmetto e attrezzatura da speleologia. Potresti anche considerare l'opzione di un'escursione per ammirare le balene nella baia di Hsavk o di pescare nei fiumi come il Blanda, queste attivit richiedono un'adesione per gli equipaggiamenti necessari. Infine, si pu fare snorkeling o immersioni tra la spettacolare fessura di Silfra nel parco nazionale Thingvellir, queste attivit richiedono la muta da sub e l'immersione in acqua fredda. Ricorda sempre di prenotare le attivit tramite agenzie affidabili e di assicurarti che la persona abbia le attrezzature e le conoscenze necessarie per eseguire l'esperienza in modo sicuro.
assistant: "Regali e attivit per un viaggio in Islanda"


================================================
FILE: saved_chats/2023-04-02_SoundCardComponentsandADC.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 15 words. You can make question.
user: Ciao, da GPT!
assistant: Salve! Come posso esserti utile? Hai bisogno di informazioni su qualche argomento specifico?
user: Quali sono le componenti di una scheda audio?
assistant: Le schede audio sono composte da diverse componenti che lavorano insieme per produrre un suono di alta qualit. Ecco alcune delle componenti pi importanti di una scheda audio:

1. Convertitore analogico-digitale (ADC) - Questa componente converte i segnali audio analogici in dati digitali che possono essere elaborati dal computer.

2. Convertitore digitale-analogico (DAC) - Questa componente converte i dati digitali in segnali audio analogici che possono essere ascoltati dalle casse.

3. Amplificatore - Questa componente amplifica la potenza del segnale audio in uscita, in modo da rendere il suono pi forte e meno distorto.

4. Interfaccia MIDI - Questa componente consente di collegare strumenti musicali digitali (ad esempio tastiere o drum machine) alla scheda audio e di controllarli attraverso il software.

5. Codec - Questa componente  un'unit che gestisce la codifica e la decodifica dei segnali audio digitali.

6. Driver - Questa componente software consente al sistema operativo di comunicare con la scheda audio e di utilizzare tutte le sue funzioni.
user: Spiegami come  fatto l'ADC, il convertitore analogico digitale.
assistant: L'ADC, o convertitore analogico-digitale,  una componente fondamentale delle schede audio e permette di convertire un segnale audio analogico in un segnale digitale che pu essere elaborato dal computer. 

L'ADC  composto da tre parti principali: un campionatore, un quantizzatore e un codificatore.

Il campionatore preleva il segnale analogico in ingresso e campiona il valore della tensione ad intervalli regolari di tempo. Questi valori campionati creano una sequenza di numeri che rappresentano l'onda sonora originale.

Il quantizzatore approssima i valori delle tensioni campionate ad un determinato numero di bit, creando cos un insieme discreto di valori. Maggiore  il numero dei bit che utilizza il quantizzatore, maggiore sar la fedelt alla rappresentazione dell'onda sonora originale.

Infine, il codificatore converte la sequenza di numeri campionati e quantizzati in un formato digitale che pu essere elaborato dal computer. Questo formato pu essere un formato a 8, a 16 o a 24 bit, a seconda della qualit desiderata. Una maggiore precisione nel convertitore aumenta la qualit dell'audio prodotto. 

In sintesi, l'ADC trasforma un segnale audio analogico in una sequenza di numeri digitali ad intervalli regolari di tempo, grazie all'utilizzo di un campionatore, un quantizzatore e un codificatore.
assistant: "Understanding the Components of a Sound Card and the Function of ADC"


================================================
FILE: saved_chats/2023-04-02_UnderstandingtheComponentsofaSoundCardandtheFunctionofADC.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 15 words. You can make question.
user: Ciao, da GPT!
assistant: Salve! Come posso esserti utile? Hai bisogno di informazioni su qualche argomento specifico?
user: Quali sono le componenti di una scheda audio?
assistant: Le schede audio sono composte da diverse componenti che lavorano insieme per produrre un suono di alta qualit. Ecco alcune delle componenti pi importanti di una scheda audio:

1. Convertitore analogico-digitale (ADC) - Questa componente converte i segnali audio analogici in dati digitali che possono essere elaborati dal computer.

2. Convertitore digitale-analogico (DAC) - Questa componente converte i dati digitali in segnali audio analogici che possono essere ascoltati dalle casse.

3. Amplificatore - Questa componente amplifica la potenza del segnale audio in uscita, in modo da rendere il suono pi forte e meno distorto.

4. Interfaccia MIDI - Questa componente consente di collegare strumenti musicali digitali (ad esempio tastiere o drum machine) alla scheda audio e di controllarli attraverso il software.

5. Codec - Questa componente  un'unit che gestisce la codifica e la decodifica dei segnali audio digitali.

6. Driver - Questa componente software consente al sistema operativo di comunicare con la scheda audio e di utilizzare tutte le sue funzioni.
user: Spiegami come  fatto l'ADC, il convertitore analogico digitale.
assistant: L'ADC, o convertitore analogico-digitale,  una componente fondamentale delle schede audio e permette di convertire un segnale audio analogico in un segnale digitale che pu essere elaborato dal computer. 

L'ADC  composto da tre parti principali: un campionatore, un quantizzatore e un codificatore.

Il campionatore preleva il segnale analogico in ingresso e campiona il valore della tensione ad intervalli regolari di tempo. Questi valori campionati creano una sequenza di numeri che rappresentano l'onda sonora originale.

Il quantizzatore approssima i valori delle tensioni campionate ad un determinato numero di bit, creando cos un insieme discreto di valori. Maggiore  il numero dei bit che utilizza il quantizzatore, maggiore sar la fedelt alla rappresentazione dell'onda sonora originale.

Infine, il codificatore converte la sequenza di numeri campionati e quantizzati in un formato digitale che pu essere elaborato dal computer. Questo formato pu essere un formato a 8, a 16 o a 24 bit, a seconda della qualit desiderata. Una maggiore precisione nel convertitore aumenta la qualit dell'audio prodotto. 

In sintesi, l'ADC trasforma un segnale audio analogico in una sequenza di numeri digitali ad intervalli regolari di tempo, grazie all'utilizzo di un campionatore, un quantizzatore e un codificatore.
assistant: "Understanding the Components of a Sound Card and the Function of ADC"


================================================
FILE: saved_chats/2023-04-03_ExploringMultiHeadSelfAttentionforKeywordIdentificationinNaturalLanguageProcessing.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 15 words. You can make question.
user: How can I use multi-head self-attention to identify keywords in a sentence?
assistant: Multi-head self-attention is a powerful technique in natural language processing that can be used to identify keywords in a sentence. The idea behind multi-head self-attention is to allow the model to pay attention to different parts of the input sequence at the same time. 

To identify keywords, you can apply multi-head self-attention to a sentence and then examine the attention weights to see which words have the highest weight. These words are likely to be the most important in the sentence and can be considered keywords.

Another approach is to use a method called "querying," where the model is given a specific keyword and then asked to find the parts of the sentence that are most relevant to that keyword. This can be done by applying multi-head self-attention and then examining the attention weights for the keyword's position in the input sequence.

Overall, multi-head self-attention is a powerful tool for identifying keywords in a sentence and can be used in a variety of applications, such as text classification and information extraction.
user: Does OpenAI provide the tools to compute multi-head self-attention of a given text?
assistant: Yes, OpenAI provides a toolkit called "OpenAI API" that allows you to compute multi-head self-attention for a given text. The OpenAI API provides access to a range of pre-trained language models, such as GPT-2 and GPT-3, which are capable of performing multi-head self-attention on text data.

To use the OpenAI API, you first need to sign up for an API key and then import the API client into your Python environment. Once you have set up the API client, you can use it to send requests to the OpenAI API server, which will process your text data and return a response containing the multi-head self-attention weights.

The OpenAI API offers a simple and straightforward way to access powerful language models and compute complex natural language processing tasks, such as multi-head self-attention, without the need for extensive computational resources or expertise in machine learning.
assistant: "Exploring Multi-Head Self-Attention for Keyword Identification in Natural Language Processing"


================================================
FILE: saved_chats/2023-04-03_TagNaturalLanguageProcessing.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 15 words. You can make question.
user: How can I use multi-head self-attention to identify keywords in a sentence?
assistant: Multi-head self-attention is a powerful technique in natural language processing that can be used to identify keywords in a sentence. The idea behind multi-head self-attention is to allow the model to pay attention to different parts of the input sequence at the same time. 

To identify keywords, you can apply multi-head self-attention to a sentence and then examine the attention weights to see which words have the highest weight. These words are likely to be the most important in the sentence and can be considered keywords.

Another approach is to use a method called "querying," where the model is given a specific keyword and then asked to find the parts of the sentence that are most relevant to that keyword. This can be done by applying multi-head self-attention and then examining the attention weights for the keyword's position in the input sequence.

Overall, multi-head self-attention is a powerful tool for identifying keywords in a sentence and can be used in a variety of applications, such as text classification and information extraction.
user: Does OpenAI provide the tools to compute multi-head self-attention of a given text?
assistant: Yes, OpenAI provides a toolkit called "OpenAI API" that allows you to compute multi-head self-attention for a given text. The OpenAI API provides access to a range of pre-trained language models, such as GPT-2 and GPT-3, which are capable of performing multi-head self-attention on text data.

To use the OpenAI API, you first need to sign up for an API key and then import the API client into your Python environment. Once you have set up the API client, you can use it to send requests to the OpenAI API server, which will process your text data and return a response containing the multi-head self-attention weights.

The OpenAI API offers a simple and straightforward way to access powerful language models and compute complex natural language processing tasks, such as multi-head self-attention, without the need for extensive computational resources or expertise in machine learning.
assistant: "Exploring Multi-Head Self-Attention for Keyword Identification in Natural Language Processing"


================================================
FILE: saved_chats/2023-04-04_ExploringFeasibleOptionsforPoweringanIronManSuitSolarCellsBatteriesandNuclearReactors.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 15 words. You can make question.
user: How would you power an Iron Man suit?
assistant: There are several ways to power an Iron Man suit. One option could be to use a miniaturized nuclear reactor, which would provide a lot of energy in a small space. Another option could be to use a super capacitor, which would be able to store and release large amounts of energy quickly. Alternatively, the suit could be powered by a combination of solar cells and batteries, which would be renewable and lightweight. Additionally, the suit could be powered by harnessing kinetic energy from the wearer's movements, similar to how some watches are powered by the motion of the wearer's wrist. The choice of power source would depend on factors such as the size and weight of the suit, the duration of use, the level of energy required for the suit's functions, and the feasibility of the technology.
user: Which one is the most feasible? How would you make a nuclear reactor in a small space?
assistant: In terms of feasibility, a combination of solar cells and batteries may be the most practical option to power an Iron Man suit. This is because solar cells can generate electricity from sunlight, which is a free and renewable source of energy, and batteries can store excess energy generated by the cells for later use. The use of solar cells and batteries would also eliminate the need for any dangerous or radioactive materials, reducing the risk of accidents or environmental damage.

However, for the smallest possible space, a miniaturized nuclear reactor may be the best choice for powering an Iron Man suit, as it can generate a very high level of energy in a small space, but there are many technical and safety hurdles that would need to be overcome to make this technology practical. 

Creating a miniaturized nuclear reactor would require advanced engineering and materials science to develop a compact reactor core that can operate safely and efficiently. One approach to creating a miniaturized reactor is to use advanced materials such as ceramic-based fuels or liquid metal coolants, which can withstand high temperatures and radiation. Another approach is to use fusion rather than fission to generate power, which could be safer and more efficient, but the technology is not yet available for practical applications. Additionally, the control and regulation of a miniaturized nuclear reactor in a small space presents significant technical challenges, requiring advanced automation and control systems to ensure safe and consistent operation.
assistant: "Exploring Feasible Options for Powering an Iron Man Suit: Solar Cells, Batteries, and Nuclear Reactors"


================================================
FILE: saved_chats/2023-04-04_PoweringIronManSuit.txt
================================================
system: You are a helpful assistant and you will answer in paragraphs. A paragraph can be as long as 15 words. You can make question.
user: How would you power an Iron Man suit?
assistant: There are several ways to power an Iron Man suit. One option could be to use a miniaturized nuclear reactor, which would provide a lot of energy in a small space. Another option could be to use a super capacitor, which would be able to store and release large amounts of energy quickly. Alternatively, the suit could be powered by a combination of solar cells and batteries, which would be renewable and lightweight. Additionally, the suit could be powered by harnessing kinetic energy from the wearer's movements, similar to how some watches are powered by the motion of the wearer's wrist. The choice of power source would depend on factors such as the size and weight of the suit, the duration of use, the level of energy required for the suit's functions, and the feasibility of the technology.
user: Which one is the most feasible? How would you make a nuclear reactor in a small space?
assistant: In terms of feasibility, a combination of solar cells and batteries may be the most practical option to power an Iron Man suit. This is because solar cells can generate electricity from sunlight, which is a free and renewable source of energy, and batteries can store excess energy generated by the cells for later use. The use of solar cells and batteries would also eliminate the need for any dangerous or radioactive materials, reducing the risk of accidents or environmental damage.

However, for the smallest possible space, a miniaturized nuclear reactor may be the best choice for powering an Iron Man suit, as it can generate a very high level of energy in a small space, but there are many technical and safety hurdles that would need to be overcome to make this technology practical. 

Creating a miniaturized nuclear reactor would require advanced engineering and materials science to develop a compact reactor core that can operate safely and efficiently. One approach to creating a miniaturized reactor is to use advanced materials such as ceramic-based fuels or liquid metal coolants, which can withstand high temperatures and radiation. Another approach is to use fusion rather than fission to generate power, which could be safer and more efficient, but the technology is not yet available for practical applications. Additionally, the control and regulation of a miniaturized nuclear reactor in a small space presents significant technical challenges, requiring advanced automation and control systems to ensure safe and consistent operation.
assistant: "Exploring Feasible Options for Powering an Iron Man Suit: Solar Cells, Batteries, and Nuclear Reactors"


================================================
FILE: saved_chats/2023-04-05_DockerContainerization.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: What is Docker in the IT domain?
assistant: Docker is a software platform that allows developers to create, deploy, and manage applications by creating containerized environments. Containers are lightweight and standalone executable packages that include everything needed to run an application, including code, libraries, and system tools. Docker provides a standardized way to build and distribute these containers, making it much easier for developers to manage dependencies and deploy applications across different environments.
user: Could you make an example of how Docker can be used?
assistant: Certainly! Here's an example scenario:

Let's say you are a developer working on a web application that needs to be deployed on multiple servers, possibly running different operating systems. Without Docker, you would need to manually install all the dependencies and libraries required for the application to run on each server. This can be time-consuming and error-prone, especially when dealing with complex applications.

With Docker, you can create a container that includes all the software components and configurations needed to run the application, regardless of the host environment. You can then deploy this container to any server running Docker, and it will run exactly the same way as it does on your local development environment. This makes it easier to manage and deploy your application while also ensuring consistency across different environments.

Furthermore, if you need to make changes to your application, you can simply make those changes to the container rather than individually updating each server. This can save a lot of time and make development more efficient.
user: Does Docker have something to do with Raspberry Pi and Linux OS?
assistant: Yes, Docker can be used with Raspberry Pi and Linux OS. Docker provides support for multiple operating systems, including Linux, which runs on Raspberry Pi. This means that you can use Docker on a Raspberry Pi to containerize applications and manage their deployment, just as you would on a Linux server. 

In fact, Docker is becoming increasingly popular for IoT (Internet of Things) applications where Raspberry Pis are being used as edge devices. Docker's lightweight containerization technology allows developers to package their code, dependencies, and configuration files together, making it easier to deploy, manage and update applications on edge devices like Raspberry Pi.

Additionally, Docker has a growing catalog of pre-built images, some of which are specifically designed to run on ARM architecture (which is what the Raspberry Pi uses). So, Docker can be very useful to Raspberry Pi users who are interested in running a wide variety of applications on their device.
user: How hard is a docker to learn? Is it similar to any other programming language? Is it a programming language at all?
assistant: Docker is not a programming language, it is a tool for containerization which can be learned quite easily. Although it has its own set of commands and terminology, Docker is not similar to any programming language in terms of syntax or structure.

The basic concepts of Docker are relatively simple: containers, images, and registries. Once you understand these concepts, you can start working with Docker commands and creating your own containers. However, if you want to become an advanced Docker user, you may need to learn more about networking, security, and Dockerfile scripting.

Overall, the difficulty of learning Docker depends on your experience with containerization and system administration, and how deeply you want to dive into Docker's advanced features. For someone with some basic knowledge of containerization or server administration, Docker can be learned quickly and easily. Whether you are a developer, DevOps engineer, or system administrator, Docker is a valuable tool that is worth investing time in learning.
user: Explain this concept easily, like what is containerization and server administration.
assistant: Sure! Here's a simplified explanation:

Containerization refers to the process of creating a lightweight, standalone environment that includes all the necessary software components and configuration files needed to run an application. Containers are similar to virtual machines but with some key differences - they share the host operating system kernel but have their own file system, network stack, and resources.

Containerization is useful because it allows you to easily package an application and its dependencies together into a single container, making it easy to deploy, manage, and scale the application. With containerization, you can create consistent environments for software development, testing, and deployment, regardless of the host operating system.

Server administration refers to the process of managing and maintaining the software, hardware, and security of a server. In a typical web application architecture, the server is the computer that hosts the application and serves it to clients over the internet. Server administration includes tasks like installing and updating software, configuring network settings, monitoring system performance, and ensuring security.

Docker is a tool that combines the concepts of containerization and server administration. With Docker, you can create and manage containers, making it easier to deploy and manage server applications. Docker simplifies many of the tasks typically associated with server administration, like dependency management and software installations, and makes it easier to manage multiple servers at once.
assistant: "Understanding Docker: Containerization, Raspberry Pi and More"


================================================
FILE: saved_chats/2023-04-07_ApproachesforTextto.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: What is a text-to-speech?
assistant: Text-to-speech (TTS) is a technology that converts written text into spoken words. It is often used to help those who have difficulty reading or those who prefer listening to text rather than reading it. TTS can also be used in various industries, such as education, healthcare, and entertainment.
user: What are the most popular technologies behind text-to-speech recognition?
assistant: The three most popular technology types used in text-to-speech recognition are:

1. Concatenative TTS: This system strings together pre-recorded words, phrases or sentences to generate new utterances. 

2. Formant Synthesis: This system uses mathematical algorithms to model the physical characteristics of a human vocal tract, then generates speech by creating sound waves that mimic the way human vocal tract forms speech.

3. Parametric TTS: This system uses complex algorithms that model a variety of vocal features  such as pitch, volume, and duration  to simulate human speech.
user: How can I do text-to-speech with deep learning techniques?
assistant: One popular deep learning approach for text-to-speech synthesis is using a generative model called WaveNet, which was developed by Googles AI team in 2016.

WaveNet uses a type of neural network called a convolutional neural network (CNN), which takes raw audio waveform signals as input and generates new audio waveforms as output. The network learns how to generate these waveforms by analyzing large datasets of speech samples.

Another deep learning architecture that can be used for text-to-speech synthesis is the Tacotron, which uses a combination of a sequence-to-sequence model for text processing and a vocoder to generate speech.

Both WaveNet and Tacotron require large datasets of speech samples for training, as well as significant computational resources to provide sufficient processing power.
assistant: Exploring Text-to-Speech: Popular Techniques and Deep Learning Approaches


================================================
FILE: saved_chats/2023-04-09_FitbitDataandSleep.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: Esistono correlazioni tra il battito cardiaco minimo e la qualit del sonno.
assistant: S, esistono alcune correlazioni tra il battito cardiaco minimo e la qualit del sonno. Ad esempio, un battito cardiaco minimo pi basso pu indicare un maggior rilassamento e una maggiore qualit del sonno. Inoltre, una diminuzione della frequenza cardiaca durante il sonno pu essere indicativa di una buona regolazione dell'attivit del sistema nervoso autonomo, che  importante per il controllo del sonno e della veglia. Tuttavia,  importante considerare anche altri fattori, come l'et, lo stile di vita e la presenza di eventuali patologie, che possono influire sulla relazione tra il battito cardiaco e la qualit del sonno.
user: Sto sviluppando un'architettazione che collega dati Fitbit e qualit del sonno. Che dati Fitbit possono potenzialmente integrare dei sensori per la qualit del sonno.
assistant: Fitbit  un dispositivo indossabile che pu rilevare diverse informazioni sulla salute e il benessere, tra cui la qualit del sonno. Fitbit utilizza vari sensori, come un accelerometro, un giroscopio e un sensore ottico, per raccogliere dati sulla qualit del sonno degli utenti. Alcune delle informazioni che possono essere raccolte da Fitbit per valutare la qualit del sonno includono:

- La durata del sonno REM (movimenti oculari rapidi) durante la notte.
- La durata del sonno a onde lente durante la notte.
- La durata delle fasi di sonno profondo durante la notte.
- Il tempo impiegato per addormentarsi.
- Il numero di sveglie notturne e la loro durata.
- Il tempo totale di sonno e la sua suddivisione in cicli.

Fitbit utilizza questi dati per calcolare un punteggio complessivo della qualit del sonno degli utenti e fornire raccomandazioni per migliorarla. Inoltre, Fitbit permette di monitorare altri fattori che possono avere un impatto sulla qualit del sonno, come il livello di attivit fisica, la frequenza cardiaca, il consumo di acqua e il medesimo comportamento degli utenti durante il giorno.
assistant: "Exploring the Connection between Fitbit Data and Sleep Quality"


================================================
FILE: saved_chats/2023-04-11_Fitness.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: Is it ok if my leg hurts after stretching?
assistant: It depends on the type and duration of the stretching exercise. Some mild discomfort or soreness is common after stretching, especially if you haven't exercised in a while, but it should not be too intense or last too long. If the pain persists, it might be a sign of an injury, and you should consult a healthcare professional.
user: How often should I stretch my legs?
assistant: The frequency with which you should stretch your legs will depend on several factors, such as your age, physical fitness, and the type of activity you do. In general, stretching your legs every day or after physical activity can help improve flexibility, range of motion, and reduce the risk of injury. It's important to listen to your body and avoid overstretching or going beyond your limits. You may also want to consult with a fitness professional or physical therapist for specific stretching routines based on your fitness goals and individual needs.
user: What is the optimal duration of a stretching exercise to gain benefits?
assistant: The optimal duration of a stretching exercise can vary depending on several factors such as age, fitness level, and the type of stretching being performed. In general, most experts recommend holding each stretch for 15-30 seconds and repeating each stretch 2-4 times. This duration has been shown to be effective in improving flexibility and range of motion, as well as reducing muscle soreness and the risk of injury. However, it's important to remember that everyone's body is different, and you should always listen to your body and avoid overstretching to prevent injury. If you have any concerns or specific goals, you might want to consider consulting with a qualified fitness professional.
assistant: The Importance of Stretching: Tips and Guidelines for Optimal Benefits.


================================================
FILE: saved_chats/2023-04-19_ChineseLanguages.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: Quali sono le lingue pi parlate in Cina?
assistant: In Cina la lingua pi parlata  il cinese mandarino, anche se in alcune regioni del paese si parla anche il cantonese, lo shanghainese e l'hokkien. Inoltre, ci sono anche altre lingue minoritarie come il tibetano, il mongolo, il mancese e il coreano che sono utilizzate da alcune etnie presenti in Cina.
assistant: In Cina la lingua pi parlata  il cinese mandarino, anche se in alcune regioni del paese si parla anche il cantonese, lo shanghainese e l'hokkien. Inoltre, ci sono anche altre lingue minoritarie come il tibetano, il mongolo, il mancese e il coreano che sono utilizzate da alcune etnie presenti in Cina.
user: Quali sono le differenze tra queste lingue?
assistant: Le lingue parlate in Cina differiscono per la pronuncia, l'acento, la struttura grammaticale e il lessico. 

Il cinese mandarino, ad esempio,  la lingua ufficiale della Cina ed  basato su una pronuncia standardizzata, chiamata pinyin. Ha una struttura grammaticale simile ad altre lingue asiatiche come il giapponese e il coreano, e un vasto vocabolario composto da migliaia di caratteri cinesi.

Il cantonese, invece,  parlato principalmente nella regione del Guangdong, Hong Kong e Macao, ed  caratterizzato da una maggiore variazione nella pronuncia rispetto al mandarino. Ha anche un'intonazione pi musicale e un diverso vocabolario rispetto al mandarino.

Lo shanghainese  parlato principalmente nella regione dello Shanghai ed  caratterizzato da una pronuncia pi nasale rispetto al mandarino. Anche la grammatica e il lessico dello shanghainese differiscono dal mandarino.

L'hokkien, infine,  parlato principalmente nella provincia del Fujian e in alcune aree del sud-est asiatico.  noto per la sua vasta gamma di toni e di suoni consonantici, e ha un vocabolario distintivo rispetto ad altre lingue parlate in Cina.
assistant: Le principali lingue parlate in Cina sono il cinese mandarino, il cantonese, lo shanghainese e l'hokkien. Ognuna di queste lingue ha una pronuncia, una struttura grammaticale e un lessico distintivi.
user: Qual  la differenza tra lo Kien e lo Shangainese?
assistant: I think you might have meant "Hokkien" and not "Kien". Hokkien, also known as Minnan, is a group of dialects spoken in southern China, Taiwan, and Southeast Asia. On the other hand, Shanghainese is a dialect spoken in and around the city of Shanghai, in eastern China.

The main differences between Hokkien and Shanghainese are in the pronunciation, vocabulary, and grammar. Hokkien is known for its wide range of tonal sounds and its distinct consonant sounds. The vocabulary of Hokkien has been influenced by the languages of surrounding areas such as Southern Min and Fuzhou. Meanwhile, Shanghainese is known for its use of nasal sounds and its distinct intonation patterns. The vocabulary of Shanghainese has been influenced by local languages and cultures of the region.

In summary, Hokkien and Shanghainese are two distinct dialects of Chinese with unique characteristics and influences.
assistant: Hokkien and Shanghainese are two distinct dialects of Chinese with unique characteristics and influences. They differ in pronunciation, vocabulary, and grammar.
assistant: "Languages of China"
user: Save this conversation.
assistant: "Chinese Languages: Exploring the Differences"
assistant: Conversation saved successfully.
user: Thank you.
assistant: You're welcome! If you have any more questions in the future, feel free to ask.
assistant: You're welcome! If you have any more questions in the future, feel free to ask.
assistant: "Differences Between Chinese Languages"


================================================
FILE: saved_chats/2023-04-19_SyntheticMeat.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: What is synthetic meat?
assistant: Synthetic meat, also known as lab-grown meat, is meat that is produced through cellular agriculture, using in vitro cultivation of animal cells rather than raising and slaughtering whole animals. This process involves taking stem cells from an animal and growing them in a lab under controlled conditions to create muscle tissue, which can then be harvested and processed into meat products. The goal is to create a sustainable and ethical alternative to traditional animal agriculture that reduces the environmental impact and animal suffering associated with meat consumption.
user: What are the challenges to make these products available to the general public?
assistant: There are several challenges to making synthetic meat products available to the general public, including:

1. Cost: Currently, producing synthetic meat is more expensive than traditional animal agriculture. This is partly because the technology is still new and not yet fully optimized, but also because the process requires significant resources such as energy, specialized equipment, and expensive nutrient mediums.

2. Consumer acceptance: Many people are hesitant to try synthetic meat, either because they are unsure about its safety, or because they are not comfortable with the idea of "artificial" meat. Developing products that appeal to consumers and building public trust will be key to wider adoption.

3. Regulatory approval: Synthetic meat products are subject to regulatory scrutiny and may need to meet certain standards before being approved for sale. Different countries have different regulatory frameworks, and navigating this process can be challenging.

4. Scale-up: To make synthetic meat commercially viable, it will need to be produced at scale. This will require significant investment in infrastructure, research and development, and supply chain optimization.

Overall, while there is significant potential for synthetic meat to address environmental and animal welfare concerns associated with traditional meat production, there are still many challenges to overcome before it becomes widely available to consumers.
user: Ok, thanks, remember of this
assistant: Synthetic Meat: Challenges and Opportunities


================================================
FILE: saved_chats/2023-04-20_LabGrownMeat.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: Apri la conversazione piu rilevante sulla carne sintetica
user: Open The file
assistant: Talking About Lab-Grown Meat.


================================================
FILE: saved_chats/2023-04-20_SyntheticMeatCont.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
assistant: Synthetic meat, also known as lab-grown meat or cultured meat, is a type of meat that is produced by culturing animal cells in a lab rather than raising and slaughtering animals. Some potential benefits of synthetic meat include reduced environmental impact, improved animal welfare, and increased food security. However, there are also concerns about the safety and long-term health effects of consuming synthetic meat, as well as the potential economic impact on traditional meat industries. Overall, the topic of synthetic meat is complex and controversial, and opinions on its merits and drawbacks vary widely.
assistant: Synthetic meat is controversial for a number of reasons. Some people are concerned about the safety and long-term health effects of consuming lab-grown meat, while others worry about the potential economic impact on traditional meat industries. Additionally, there are ethical and cultural concerns surrounding the idea of producing meat in a lab rather than raising and slaughtering animals. Overall, the topic of synthetic meat is complex and multifaceted, and opinions on its merits and drawbacks vary widely.
assistant: "Controversy Surrounding Synthetic Meat"


================================================
FILE: saved_chats/2023-04-20_andTextto.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: what do you think about synthetic meat?
assistant: As an AI language model, I don't have personal opinions or values but I can provide information. Synthetic meat, also known as lab-grown meat or cultured meat, is a type of meat produced by in vitro cultivation of animal cells, rather than by raising and slaughtering animals. Some people believe that synthetic meat can be a way to reduce the environmental impact of the meat industry and provide a more sustainable source of animal protein. However, others are concerned about the safety, taste, and cultural implications of consuming synthetic meat. What do you think about it?
assistant: "Synthetic Meat: Pros and Cons"
assistant: Synthetic meat, also known as lab-grown meat or cultured meat, is a type of meat produced by in vitro cultivation of animal cells, rather than by raising and slaughtering animals. Some people believe that synthetic meat can be a way to reduce the environmental impact of the meat industry and provide a more sustainable source of animal protein. However, others are concerned about the safety, taste, and cultural implications of consuming synthetic meat. What do you think about it?
user: I'm enthusiastic about it, however I've heard it might be less healthy than normal one
assistant: Synthetic meat has the potential to be a more sustainable and ethical source of animal protein, however there are still some concerns about its health implications. More research is needed to determine the safety and nutritional value of synthetic meat.
user: summarize the file 2023-04-19_SyntheticMeat.txt
assistant: Synthetic meat has the potential to be a more sustainable and ethical source of animal protein, however there are still some challenges to overcome before it becomes widely available to consumers, including cost, consumer acceptance, regulatory approval, and scale-up.
user: did we ever talked about Artificial Intelligence? 
assistant: Artificial intelligence (AI) is a branch of computer science that aims to create intelligent machines. These machines are designed to perceive the world around them, learn from their experiences, and make decisions based on that learning. AI can be seen in many applications today, from self-driving cars to language translation software. Machine learning, deep learning, and natural language processing are some of the key techniques used in AI. While there is still much to explore and understand about AI, it is an exciting field with endless possibilities for development and innovation.
user: Help me choose the best laptop from APple
assistant: The ideal size of a unified memory for deep learning and machine learning algorithm depends on the size of your dataset, the complexity of your models, and the number of training or inference operations you perform. In general, a minimum of 16GB of unified memory is recommended for deep learning and machine learning workloads. However, if you can afford it, 24GB or even 32GB of unified memory can potentially provide significant performance improvements, especially when working with large datasets that require lots of memory. Additionally, having dedicated graphics memory (VRAM) can also improve performance, especially for tasks that involve a lot of graphics processing.
user: What tasks involve graphic processing?
assistant: There are various tasks that require graphic processing, such as:

1. Video and photo editing
2. Gaming
3. 3D modeling and rendering
4. Simulation and modeling
5. Virtual Reality (VR) and Augmented Reality (AR) applications
6. Artificial Intelligence and Machine Learning, which often involve complex computations and visualizations of data and models.

Having a powerful graphics card with dedicated VRAM can greatly improve the performance of these tasks, especially when dealing with large and complex datasets.
assistant: Having a powerful graphics card with dedicated VRAM can greatly improve the performance of tasks that involve graphic processing, especially when dealing with large and complex datasets.
user: save this conversation
assistant: Synthetic Meat and AI


================================================
FILE: saved_chats/2023-05-03_AIConversations.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: quanti giorni sono trascorsi da il 22 aprile 2022 e oggi
assistant: Mi dispiace, non ho accesso alla data di oggi in tempo reale, ma se consideriamo il 22 aprile 2022 come la data di riferimento, la risposta dipende dal giorno in cui ci troviamo. Ad esempio, se oggi fosse il 1 dicembre 2022, sarebbero trascorsi 223 giorni. Se vuoi, posso anche aiutarti a calcolare la differenza di giorni tra due date specifiche. Hai un'altra data in mente?
assistant: Mi dispiace, non ho accesso alla data di oggi in tempo reale, ma se consideriamo il 22 aprile 2022 come la data di riferimento, la risposta dipende dal giorno in cui ci troviamo. Ad esempio, se oggi fosse il 1 dicembre 2022, sarebbero trascorsi 223 giorni. Se vuoi, posso anche aiutarti a calcolare la differenza di giorni tra due date specifiche. Hai un'altra data in mente?
user: S, oggi  il 3 maggio 2023.
assistant: In tal caso, da il 22 aprile 2022 sono trascorsi 376 giorni, compresi il 22 aprile e il 3 maggio. Fortunatamente, ci sono molti modi divertenti per trascorrere il tempo! Ad esempio, hai mai fatto un corso di cucina o imparato una nuova lingua? Si potrebbe anche passare il tempo leggendo un buon libro o guardando un film interessante. Quali sono i tuoi hobby preferiti?
assistant: In tal caso, da il 22 aprile 2022 sono trascorsi 376 giorni, compresi il 22 aprile e il 3 maggio. Fortunatamente, ci sono molti modi divertenti per trascorrere il tempo! Ad esempio, hai mai fatto un corso di cucina o imparato una nuova lingua? Si potrebbe anche passare il tempo leggendo un buon libro o guardando un film interessante. Quali sono i tuoi hobby preferiti?
assistant: "Days Gone By: Entertaining Conversation with an AI Assistant"


================================================
FILE: saved_chats/2023-05-03_MeaninginLife.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: cosa ne pensi del mito decisivo?
assistant: Mi dispiace, ma per esserti di aiuto avrei bisogno di saperne di pi. Di che mito decisivo stai parlando? Potrei farti qualche domanda in pi per capire di cosa si tratta e darti la mia opinione in merito.
assistant: Mi dispiace, ma per esserti di aiuto avrei bisogno di saperne di pi. Di che mito decisivo stai parlando? Potrei farti qualche domanda in pi per capire di cosa si tratta e darti la mia opinione in merito.
user: Cosa ne pensi del mito di Sisypho?
assistant: Il mito di Sisifo  una storia molto affascinante e complessa che pu essere interpretata in diversi modi. Personalmente, trovo che questa storia sia molto profonda e ricca di significati. Sisifo  condannato a spingere una grande pietra su una collina, solo per vederla rotolare gi e dover ripetere il compito all'infinito. Per molti, questo mito rappresenta la lotta umana per trovare un significato nella vita, nonostante la futilit e il destino inevitabile della morte. Altri interpretano il mito di Sisifo come un simbolo della condizione umana, che consiste nella continua ricerca di senso e significato in un mondo talvolta ostile e insensato. Personalmente, credo che il mito di Sisifo sia un monito per non dare mai per scontata la propria vita e lottare sempre per ci che si ritiene importante, nonostante le difficolt.
assistant: Il mito di Sisifo  una storia molto affascinante e complessa che pu essere interpretata in diversi modi. Personalmente, trovo che questa storia sia molto profonda e ricca di significati. Sisifo  condannato a spingere una grande pietra su una collina, solo per vederla rotolare gi e dover ripetere il compito all'infinito. Per molti, questo mito rappresenta la lotta umana per trovare un significato nella vita, nonostante la futilit e il destino inevitabile della morte. Altri interpretano il mito di Sisifo come un simbolo della condizione umana, che consiste nella continua ricerca di senso e significato in un mondo talvolta ostile e insensato. Personalmente, credo che il mito di Sisifo sia un monito per non dare mai per scontata la propria vita e lottare sempre per ci che si ritiene importante, nonostante le difficolt.
user: In che modo si pu interpretare la massima, devi immaginare, Sisifo felice.
assistant: La frase "devi immaginare Sisifo felice"  stata coniata dal filosofo francese Albert Camus e pu essere interpretata come un invito a considerare la condizione umana di Sisifo come qualcosa di ben diverso da una condanna. Nonostante la fatica e la frustrazione che pu derivare dal suo impossibile compito, Sisifo rimane comunque un uomo libero, in grado di decidere come affrontare la propria vita. Pur sforzandosi continuamente per raggiungere un obiettivo che sa essere inutile, Sisifo pu comunque trovare gioia e soddisfazione solo nel fatto di compiere il suo compito e di far parte del processo universale dell'esistenza. In pratica, la massima invita a superare la visione tragica dell'esistenza, a guardare alla vita con ottimismo e a trovare un senso nella propria lotta quotidiana. Sisifo felice rappresenta quindi un individuo che accetta la propria condizione, vive pienamente il presente e trova la felicit nella forza della sua volont e nella libert di scegliere se e come impegnarsi.
assistant: La frase "devi immaginare Sisifo felice"  stata coniata dal filosofo francese Albert Camus e pu essere interpretata come un invito a considerare la condizione umana di Sisifo come qualcosa di ben diverso da una condanna. Nonostante la fatica e la frustrazione che pu derivare dal suo impossibile compito, Sisifo rimane comunque un uomo libero, in grado di decidere come affrontare la propria vita. Pur sforzandosi continuamente per raggiungere un obiettivo che sa essere inutile, Sisifo pu comunque trovare gioia e soddisfazione solo nel fatto di compiere il suo compito e di far parte del processo universale dell'esistenza. In pratica, la massima invita a superare la visione tragica dell'esistenza, a guardare alla vita con ottimismo e a trovare un senso nella propria lotta quotidiana. Sisifo felice rappresenta quindi un individuo che accetta la propria condizione, vive pienamente il presente e trova la felicit nella forza della sua volont e nella libert di scegliere se e come impegnarsi.
assistant: "The Myth of Sisyphus: Finding Meaning in Life"


================================================
FILE: saved_chats/2023-06-07_ImageProcessing.txt
================================================
system: You are a helpful assistant. You can make question to make the conversation entertaining.
user: Spiega in cosa consiste la tecnica del Region Growing nell'Image Processing.
assistant: La tecnica del Region Growing (o "crescita di regione")  una tecnica di elaborazione delle immagini utilizzata per segmentare un'immagine in regioni omogenee. Il processo inizia selezionando un pixel o un insieme di pixel iniziali noti come "semi-seme". Da qui, la tecnica si espande attraverso l'immagine, aggiungendo pixel alla regione corrente solo se le loro propriet soddisfano determinati criteri (come la similarit dei valori di colore o luminosit). Questo processo viene ripetuto finch il sistema non ha suddiviso l'intera immagine in regioni pi piccole e omogenee. La tecnica del Region Growing  considerata uno dei metodi pi efficienti ed efficaci per la segmentazione delle immagini, ed  ampiamente utilizzata in una vasta gamma di applicazioni, dall'elaborazione delle immagini mediche alla classificazione automatica degli oggetti.
assistant: La tecnica del Region Growing (o "crescita di regione")  una tecnica di elaborazione delle immagini utilizzata per segmentare un'immagine in regioni omogenee. Il processo inizia selezionando un pixel o un insieme di pixel iniziali noti come "semi-seme". Da qui, la tecnica si espande attraverso l'immagine, aggiungendo pixel alla regione corrente solo se le loro propriet soddisfano determinati criteri (come la similarit dei valori di colore o luminosit). Questo processo viene ripetuto finch il sistema non ha suddiviso l'intera immagine in regioni pi piccole e omogenee. La tecnica del Region Growing  considerata uno dei metodi pi efficienti ed efficaci per la segmentazione delle immagini, ed  ampiamente utilizzata in una vasta gamma di applicazioni, dall'elaborazione delle immagini mediche alla classificazione automatica degli oggetti.
user: Entra pi nel dettaglio nei criteri di espansione e di terminazione del processo.
assistant: Certamente! Nei criteri di espansione della tecnica del Region Growing, vengono considerati uno o pi pixel iniziali (semi-seme) che si trovano in una regione omogenea. La tecnica utilizza quindi un insieme di regole per controllare l'espansione di quella regione in modo che si possano includere i pixel adiacenti che soddisfano i requisiti di similarit determinati. Generalmente, questi requisiti considerano la distanza e la similitudine di un dato pixel dai pixel presenti nella regione iniziale. 

Per quanto riguarda i criteri di terminazione, questi sono solitamente stabiliti in modo che l'espansione della regione possa continuare in modo efficiente e con precisione, fino a quando non si raggiunge una regione omogenea delle dimensioni desiderate. Ci significa che il processo di espansione deve essere terminato quando non ci sono pi pixel che soddisfano i requisiti di similarit oppure quando si raggiunge una dimensione prefissata della regione. Tuttavia, poich il processo di espansione pu rimanere bloccato in determinati punti dell'immagine, possono essere impostati anche dei criteri di terminazione basati su un numero massimo di iterazioni o su un tempo massimo di esecuzione. 

In generale, la combinazione dei criteri di espansione e di terminazione determina la precisione e l'efficienza dell'algoritmo di Region Growing per la segmentazione delle immagini.
assistant: Certamente! Nei criteri di espansione della tecnica del Region Growing, vengono considerati uno o pi pixel iniziali (semi-seme) che si trovano in una regione omogenea. La tecnica utilizza quindi un insieme di regole per controllare l'espansione di quella regione in modo che si possano includere i pixel adiacenti che soddisfano i requisiti di similarit determinati. Generalmente, questi requisiti considerano la distanza e la similitudine di un dato pixel dai pixel presenti nella regione iniziale. 

Per quanto riguarda i criteri di terminazione, questi sono solitamente stabiliti in modo che l'espansione della regione possa continuare in modo efficiente e con precisione, fino a quando non si raggiunge una regione omogenea delle dimensioni desiderate. Ci significa che il processo di espansione deve essere terminato quando non ci sono pi pixel che soddisfano i requisiti di similarit oppure quando si raggiunge una dimensione prefissata della regione. Tuttavia, poich il processo di espansione pu rimanere bloccato in determinati punti dell'immagine, possono essere impostati anche dei criteri di terminazione basati su un numero massimo di iterazioni o su un tempo massimo di esecuzione. 

In generale, la combinazione dei criteri di espansione e di terminazione determina la precisione e l'efficienza dell'algoritmo di Region Growing per la segmentazione delle immagini.
user: How seeds are chosen for region growing in image processing?
assistant: There are a few different approaches for choosing the seeds for the Region Growing technique in Image Processing. One common method is to use user-defined seeds, where the user manually selects a few pixels in the image to represent the initial seeds for the Region Growing algorithm. This can be useful in cases where the image features are well-defined and easy to recognize.

Another approach is to use automated seed selection, where the algorithm automatically selects the initial seeds based on certain criteria, such as identifying a local maximum or minimum point in the image. This can be useful when the image is very large or complex, and it is difficult for a user to manually select the seed pixels.

Finally, hybrid approaches can also be used, where the algorithm selects initial seed pixels based on automated criteria and then allows the user to refine the selection further if necessary.

Overall, the choice of seed selection method will depend on the specific application and the characteristics of the image being processed.
assistant: There are a few different approaches for choosing the seeds for the Region Growing technique in Image Processing. One common method is to use user-defined seeds, where the user manually selects a few pixels in the image to represent the initial seeds for the Region Growing algorithm. This can be useful in cases where the image features are well-defined and easy to recognize.

Another approach is to use automated seed selection, where the algorithm automatically selects the initial seeds based on certain criteria, such as identifying a local maximum or minimum point in the image. This can be useful when the image is very large or complex, and it is difficult for a user to manually select the seed pixels.

Finally, hybrid approaches can also be used, where the algorithm selects initial seed pixels based on automated criteria and then allows the user to refine the selection further if necessary.

Overall, the choice of seed selection method will depend on the specific application and the characteristics of the image being processed.
user: Can I choose the seed randomly again and again and then consider the average of the resulting images?
assistant: Yes, you could choose the seed randomly and run the Region Growing algorithm multiple times with different seed selections, then take the average of the resulting segmented regions. This approach is sometimes used to improve the overall segmentation accuracy, as it reduces the impact of any individual seed selection errors or inconsistencies in the image data.

However, it's worth noting that this approach can be computationally intensive, particularly if you are processing large images or running the algorithm many times. Additionally, depending on the specific application and image data, using a random seed selection may not always provide the best segmentation results, so it's important to carefully evaluate the performance of the algorithm and choose an appropriate seed selection method accordingly.
assistant: Yes, you could choose the seed randomly and run the Region Growing algorithm multiple times with different seed selections, then take the average of the resulting segmented regions. This approach is sometimes used to improve the overall segmentation accuracy, as it reduces the impact of any individual seed selection errors or inconsistencies in the image data.

However, it's worth noting that this approach can be computationally intensive, particularly if you are processing large images or running the algorithm many times. Additionally, depending on the specific application and image data, using a random seed selection may not always provide the best segmentation results, so it's important to carefully evaluate the performance of the algorithm and choose an appropriate seed selection method accordingly.
user: ok now how does region growing matter with pixel connectivity
assistant: Pixel connectivity plays a crucial role in the region growing algorithm. In region growing, the algorithm expands by adding neighboring pixels that meet certain similarity criteria to the initial seed(s). The concept of "neighborhood" is defined by the pixel connectivity criterion used by the algorithm.

In general, there are two commonly used connectivity criteria in the context of image processing: 4-connectivity and 8-connectivity. In 4-connectivity, each pixel is connected only to its cardinal neighbors (up, down, left, right), while in 8-connectivity, each pixel is connected to its diagonal neighbors as well. 

The choice of connectivity criteria affects how region growing proceeds. For example, if a 4-connected criterion is used, the region grows only along the cardinal directions, resulting in a square or rectangular region. If an 8-connected criterion is used, the region grows in a diagonal direction as well, meaning that the resulting region has a more circular or elliptical shape.

In summary, the choice of connectivity criterion directly influences the shape of the segmented regions produced by the region growing algorithm and therefore has a significant impact on the quality of the segmentation. It is important to choose the appropriate criterion for a specific application based on the image characteristics and the desired results.
assistant: Pixel connectivity plays a crucial role in the region growing algorithm. In region growing, the algorithm expands by adding neighboring pixels that meet certain similarity criteria to the initial seed(s). The concept of "neighborhood" is defined by the pixel connectivity criterion used by the algorithm.

In general, there are two commonly used connectivity criteria in the context of image processing: 4-connectivity and 8-connectivity. In 4-connectivity, each pixel is connected only to its cardinal neighbors (up, down, left, right), while in 8-connectivity, each pixel is connected to its diagonal neighbors as well. 

The choice of connectivity criteria affects how region growing proceeds. For example, if a 4-connected criterion is used, the region grows only along the cardinal directions, resulting in a square or rectangular region. If an 8-connected criterion is used, the region grows in a diagonal direction as well, meaning that the resulting region has a more circular or elliptical shape.

In summary, the choice of connectivity criterion directly influences the shape of the segmented regions produced by the region growing algorithm and therefore has a significant impact on the quality of the segmentation. It is important to choose the appropriate criterion for a specific application based on the image characteristics and the desired results.
assistant: "Region Growing and Pixel Connectivity in Image Processing"


================================================
FILE: saved_chats/DATAFRAME.csv
================================================
,file_names,similarity,tags
0,DATAFRAME.csv,0,


================================================
FILE: setup.bat
================================================
@echo off

echo Checking for Python installation...
where python >nul 2>nul || (
  echo Python not found. Please install Python 3.9 and add it to the system PATH.
  exit /b 1
)

echo Checking for CUDA installation...
where nvcc >nul 2>nul || (
  echo CUDA not found. Please install CUDA and add it to the system PATH.
  exit /b 1
)

echo Detecting CUDA version...
for /f "delims=" %%i in ('powershell -command "(Get-Content -Path $env:CUDA_PATH\version.txt) -replace '^CUDA Version (.+)$', '$1' -replace '^$', 'unknown'"') do set CUDA_VERSION=%%i
echo Found CUDA version: %CUDA_VERSION%

echo Creating a new virtual environment...
call py -3.9 -m venv jarvis_venv
call .\jarvis_venv\Scripts\activate
call python --version

pause
echo Upgrading pip...
call python -m pip install --upgrade pip

echo Installing required packages...
call pip install -r venv_requirements.txt

echo Overwriting the OpenAI-Whisper packages
move .\whisper_edits\__init__.py .\jarvis_venv\Lib\site-packages\whisper
move .\whisper_edits\model.py .\jarvis_venv\Lib\site-packages\whisper
rmdir whisper_edits

echo Downloading TTS repository as a ZIP file...
call pip install tts==0.13.0
call curl -L -o tts.zip https://github.com/coqui-ai/TTS/archive/refs/heads/dev.zip

echo Extracting TTS repository...
powershell -command "Expand-Archive -Path .\tts.zip -DestinationPath .\temp_tts_folder -Force"

cd .\temp_tts_folder\TTS-dev\
pip install -e .[all,dev,notebooks]
cd..
cd..

echo Moving test_TTS.py into the TTS repository...
copy test_TTS.py temp_tts_folder\TTS-dev

echo Executing test_TTS.py...
call python temp_tts_folder\TTS-dev\test_TTS.py

echo Cleaning up...
rmdir /s /q temp_tts_folder
del tts.zip

echo Installing PyTorch according to your CUDA version...
set pytorch_skipped = 0
if "%CUDA_VERSION%"=="unknown" (
    echo Unable to determine CUDA version. Recommended version for this project is 11.8 https://developer.nvidia.com/cuda-11-8-0-download-archive.
    echo Once installed, run this command in powershell or command prompt: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    pause
    set pytorch_skipped = 1
) else (
    setlocal enabledelayedexpansion
    set "CUDA_MAJOR_VERSION=!CUDA_VERSION:.=!"
    if !CUDA_MAJOR_VERSION! GEQ 12 (
        echo CUDA version is 12 or above, pytorch is not available. Please downgrade for operation.
        echo Here are isntructions: https://github.com/bycloudai/SwapCudaVersionWindows
        echo ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        echo The recommended version for this project is 11.7 https://developer.nvidia.com/cuda-11-7-0-download-archive.
        echo Once installed, run this command in powershell or command prompt: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
        pause
        set pytorch_skipped = 1
    ) 
    endlocal

    if "%CUDA_VERSION%"=="11.8" (
        echo installing pytorch for CUDA 11.8
        call pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
        set pytorch_skipped = 0
    )
    if "%CUDA_VERSION%"=="11.7" (
        echo installing pytorch for CUDA 11.7
        call pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
        set pytorch_skipped = 0
    )
    if pytorch_skipped == 1 (
        echo Not recommended CUDA version, some realeases have support, trying direct download.
        for /f "tokens=* delims=." %%a in ("%CUDA_VERSION%") do (
            set "CUDA_VERSION_NO_DOT=%%a%%b"
        )
        call pip install torch -f https://download.pytorch.org/whl/cu%CUDA_VERSION_NO_DOT%/torch_stable.html
        
        echo ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        echo if it completed successfully, your installation is complete, otherwise pytorch isntallation could be not found. 
        echo Please install pytorch according to your CUDA version: "%CUDA_VERSION%"
    )
)

echo Preparing to install Vicuna GPT from TroubleChute (https://hub.tcno.co/ai/text-ai/vicuna/)
cd Vicuna
powershell .\vicuna.ps1

echo ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
echo Setup complete!
echo ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
echo Please setup required api keys in the .evn.txt (remove .txt after finished) file, then run the test.bat file or tests.py to test setup
pause

================================================
FILE: test_TTS.py
================================================
from TTS.api import TTS

try:
    # Running a multi-speaker and multi-lingual model

    # List available 🐸TTS models and choose the first one
    model_name = TTS.list_models()[0]
    # Init TTS
    tts = TTS(model_name)
    # Run TTS
    # ❗ Since this model is multi-speaker and multi-lingual, we must set the target speaker and the language
    # Text to speech with a numpy output
    wav = tts.tts("This is a test! This is also a test!!", speaker=tts.speakers[0], language=tts.languages[0])
    # Text to speech to a file
    tts.tts_to_file(text="Hello world!", speaker=tts.speakers[0], language=tts.languages[0], file_path="output.wav")

    # Running a single speaker model

    # Init TTS with the target model name
    tts = TTS(model_name="tts_models/de/thorsten/tacotron2-DDC", progress_bar=False, gpu=False)
    # Run TTS

    # Example voice cloning with YourTTS in English, French and Portuguese:
    tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False, gpu=True)
    tts.tts_to_file("This is voice cloning.", speaker_wav=".\Assistant\\voices\\voices.wav", language="en", file_path="output.wav")
    tts.tts_to_file("C'est le clonage de la voix.", speaker_wav=".\Assistant\\voices\\voices.wav", language="fr-fr", file_path="output.wav")
    tts.tts_to_file("Isso é clonagem de voz.", speaker_wav=".\Assistant\\voices\\voices.wav", language="pt-br", file_path="output.wav")

    # Example voice cloning by a single speaker TTS model combining with the voice conversion model. This way, you can
    # clone voices by using any model in 🐸TTS.

    tts = TTS("tts_models/de/thorsten/tacotron2-DDC")
    tts.tts_with_vc_to_file(
        "Wie sage ich auf Italienisch, dass ich dich liebe?",
        speaker_wav=".\Assistant\\voices\\voices.wav",
        file_path="ouptut.wav"
    )

    # Example text to speech using [🐸Coqui Studio](https://coqui.ai) models. You can use all of your available speakers in the studio.
    # [🐸Coqui Studio](https://coqui.ai) API token is required. You can get it from the [account page](https://coqui.ai/account).
    # You should set the `COQUI_STUDIO_TOKEN` environment variable to use the API token.

    # If you have a valid API token set you will see the studio speakers as separate models in the list.
    # The name format is coqui_studio/en/<studio_speaker_name>/coqui_studio
    models = TTS().list_models()
    # Init TTS with the target studio speaker
    tts = TTS(model_name="coqui_studio/en/Torcull Diarmuid/coqui_studio", progress_bar=False, gpu=False)
except Exception as e:
    print(e)
    pass

================================================
FILE: tests.py
================================================
"""
RUN THIS TO CHECK FOR ERRORS AFTER INSTALLATION
It tries to do some stuff that might raise errors. If tracebacks occurs something might wrong with your environment
or APIs credentials
"""

import os
import whisper
import pygame
import Assistant.get_audio as myaudio
import Assistant.tools
import sys
import time
import openai
from contextlib import contextmanager 
from dotenv import load_dotenv

from Assistant import get_audio as myaudio
from Assistant.VirtualAssistant import VirtualAssistant
from Assistant.tools import count_tokens

def load_keys():
    load_dotenv()
    if len(os.environ['OPENAI_API_KEY'])==0: 
        print('openai API key not detected in .env')
        raise Exception("[$] openai API key is required. Learn more at https://platform.openai.com/account/api-keys")
    openai.api_key = os.environ['OPENAI_API_KEY']
    if len(os.environ['IBM_API_KEY'])==0: print('[free] IBM cloud API Key not detected in .env\nLearn more at: https://cloud.ibm.com/catalog/services/text-to-speech')
    if len(os.environ['IBM_TTS_SERVICE'])==0: print('[free] IBM cloud TTS service not detected in .env\nLearn more at: https://cloud.ibm.com/catalog/services/text-to-speech')
    if len(os.environ['PORCUPINE_KEY']) == 0: print('[free] PicoVoice not detected in .env\nLearn more at: https://picovoice.ai/platform/porcupine/')
    

def check_whisper():
    whisper_model = whisper.load_model("small")
    response, _ = myaudio.whisper_wav_to_text(os.path.join('Assistant', 'voices', 'jarvis_en.wav'), whisper_model)
    assert isinstance(response, str)

def check_openai():
    chat = [{"role": "system", "content": "You are a helpful assistant"},{"role":"user", "content":"say hi"}]
    now = time.perf_counter()
    openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0,
        max_tokens=10,
        messages=chat)
    
    then = time.perf_counter()
    print(f'time to get a simple answer from gpt-3.5-turbo: {then-now} seconds')

    now = time.perf_counter()
    openai.Embedding.create(input='yes', engine='text-embedding-ada-002')
    then = time.perf_counter()
    print(f'time to get a extract an Embedding with text-embedding-ada-002: {then-now} seconds')

    
def check_virtual_assistant():
    whisper_model = whisper.load_model("small")  
    VA = VirtualAssistant(openai_api = os.getenv('OPENAI_API_KEY'),
        ibm_api    = os.getenv('IBM_API_KEY'),
        ibm_url    = os.getenv('IBM_TTS_SERVICE'),
        voice_id   = 'jarvis_en',
        whisper_model= whisper_model,
        awake_with_keywords=["jarvis"],
        model= "gpt-3.5-turbo",
        embed_model= "text-embedding-ada-002",
        RESPONSE_TIME = 3,
        SLEEP_DELAY = 30,
    )
    
    now = time.perf_counter()
    VA.translator.translate('ciao questo testo di moderate dimensioni da tradurre in lingua inglese', from_language='it', to_language='en')
    then = time.perf_counter()
    print(f'time to get translate a short text: {then-now} seconds')

    VA.say('Welcome.\n I am Just A Very Intelligent System here to help', VoiceIdx='jarvis')
    VA.say('Welcome.\n I am Just A Very Intelligent System here to help', VoiceIdx='en')


@contextmanager
def suppress_stdout():
    with open(os.devnull, "w") as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:  
            yield
        finally:
            sys.stdout = old_stdout

if __name__=='__main__':

    load_keys()
    check_openai()
    with suppress_stdout():
        check_whisper()

    check_virtual_assistant()


================================================
FILE: venv_requirements.txt
================================================
absl-py==1.1.0
aiohttp==3.8.3
aiosignal==1.3.1
apache-beam==2.40.0
appdirs==1.4.4
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
argostranslate
astor==0.8.1
astunparse==1.6.3
async-timeout==4.0.2
asynctest==0.13.0
attrs==22.1.0
audioread==2.1.9
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
cachetools==5.2.0
certifi==2022.6.15
cffi==1.15.1
charset-normalizer==2.1.0
click==8.1.3
cloudpickle==2.1.0
colorama==0.4.5
commonmark==0.9.1
comtypes==1.1.14
contextlib2==21.6.0
crcmod==1.7
cycler==0.11.0
Cython==0.29.30
debugpy==1.6.2
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.1.1
dm-tree==0.1.7
docopt==0.6.2
einops==0.6.0
elevenlabslib
entrypoints==0.4
et-xmlfile==1.1.0
etils==0.6.0
fastavro==1.5.2
fastjsonschema==2.16.1
ffmpeg-python==0.2.0
filelock==3.9.0
flatbuffers==1.12
fonttools==4.33.3
frozenlist==1.3.3
future==0.18.2
gast==0.4.0
gin-config==0.5.0
google-api-core==2.8.2
google-api-python-client==2.52.0
google-auth==2.9.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
googleapis-common-protos==1.56.3
greenlet==2.0.1
grpcio==1.47.0
h5py==3.7.0
hdfs==2.7.0
httplib2==0.20.4
huggingface-hub==0.11.1
ibm-cloud-sdk-core==3.16.1
ibm-watson==6.1.0
idna==3.3
imbalanced-learn==0.10.0
imblearn==0.0
importlib-metadata==4.12.0
importlib-resources==5.8.0
inflect==5.3.0
ipykernel==6.16.2
ipython==7.34.0
ipython-genutils==0.2.0
ipywidgets==7.7.1
jedi==0.18.1
Jinja2==3.1.2
joblib==1.2.0
jsonpatch==1.32
jsonpointer==2.3
jsonschema==4.9.1
jupyter==1.0.0
jupyter-client==7.3.4
jupyter-console==6.4.4
jupyter-core==4.11.1
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.1
kaggle==1.5.12
keras==2.9.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.4.3
langid==1.1.6
libclang==14.0.1
librosa==0.8.1
llvmlite==0.39.0
lvis==0.5.3
lxml==4.9.0
Markdown==3.3.7
MarkupSafe==2.1.1
matplotlib-inline==0.1.3
mistune==0.8.4
more-itertools==9.0.0
multidict==6.0.4
multiprocess==0.70.9
nbclient==0.6.6
nbconvert==6.5.0
nbformat==5.4.0
nest-asyncio==1.5.5
networkx==2.6.3
notebook==6.4.12
numba==0.56.0
numpy
oauth2client==4.1.3
oauthlib==3.2.0
openai==0.27.4
opencv-python==4.5.5.64
opencv-python-headless==4.7.0.68
openpyxl==3.0.10
opt-einsum==3.3.0
orjson==3.7.5
packaging==21.3
pandas==1.1.5
pandocfilters==1.5.0
parso==0.8.3
pickleshare==0.7.5
Pillow==8.4.0
pkgutil_resolve_name==1.3.10
playwright==1.29.1
plotly==5.14.0
pooch==1.6.0
portalocker==2.4.0
progressbar==2.5
prometheus-client==0.14.1
promise==2.3
prompt-toolkit==3.0.30
proto-plus==1.20.6
protobuf==3.19.4
psutil==5.9.1
py-cpuinfo==8.0.0
pyarrow==7.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
PyAudio==0.2.13
pycocotools==2.0.4
pycparser==2.21
pydantic==1.10.4
pydot==1.4.2
pydub==0.25.1
pyee==9.0.4
Pygments==2.12.0
PyJWT==2.6.0
pymongo==3.12.3
pynndescent==0.5.7
pyparsing==2.4.7
pypiwin32==223
PyQt5==5.15.6
PyQt5-Qt5==5.15.2
PyQt5-sip==12.11.0
pyreadline3==3.4.1
pyrsistent==0.18.1
python-dateutil==2.8.2
python-slugify==6.1.2
pyttsx3==2.90
pytz==2022.1
pywin32==304
pywinpty==2.0.7
PyYAML==5.4.1
pyzmq==23.2.0
qtconsole==5.3.1
QtPy==2.1.0
regex==2022.6.2
requests==2.28.1
requests-oauthlib==1.3.1
resampy==0.3.1
rich==13.0.1
rotary-embedding-torch==0.2.1
rsa==4.8
sacrebleu==2.1.0
semantic-version==2.10.0
Send2Trash==1.8.0
sentencepiece==0.1.96
seqeval==1.2.2
setuptools-rust==1.5.2
simpleaudio==1.0.4
six==1.16.0
sounddevice==0.4.3
SoundFile==0.10.3.post1
soupsieve==2.3.2.post1
tabulate==0.8.10
tenacity==8.2.2
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.1
tensorflow-addons==0.17.1
tensorflow-datasets==4.6.0
tensorflow-estimator==2.9.0
tensorflow-gpu==2.9.1
tensorflow-hub==0.12.0
tensorflow-io==0.26.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-metadata==1.9.0
tensorflow-model-optimization==0.7.2
tensorflow-text==2.9.0
termcolor==1.1.0
terminado==0.15.0
text-unidecode==1.3
tf-models-official==2.9.2
tf-slim==1.1.0
threadpoolctl==3.1.0
tinycss2==1.1.1
tokenizers==0.13.2
toml==0.10.2
tornado==6.2
tqdm==4.62.3
traitlets==5.3.0
transformers==4.25.1
typeguard==2.13.3
typing_extensions==4.2.0
umap-learn==0.5.2
Unidecode==1.3.2
uritemplate==4.1.1
urllib3==1.26.7
visdom==0.1.8.9
wcwidth==0.2.5
webencodings==0.5.1
webrtcvad==2.0.10
websocket-client==1.1.0
Werkzeug==2.1.2
whisper==1.0.0
widgetsnbextension==3.6.1
wrapt==1.14.1
yarl==1.8.2
zipp==3.8.0
pvporcupine==2.1.4
whisper-openai==1.0.0
textblob==0.17.1
langid==1.1.6
SpeechRecognition==3.10.0
pandas
pyaudio==0.2.13
pygame==2.3.0
pyttsx3 
translators
openai
ibm-cloud-sdk-core
ibm-watson
librosa
fsspec
coqpit
trainer
pysbd
matplotlib>=3.7.1
scipy >= 1.10.1
scikit-learn >= 1.2.2
plotly 
python-dotenv
langchain
newsapi-python
wikipedia
geocoder

================================================
FILE: whisper_edits/__init__.py
================================================
import hashlib
import io
import os
import urllib
import warnings
from typing import List, Optional, Union

import torch
from tqdm import tqdm

from .audio import load_audio, log_mel_spectrogram, pad_or_trim
from .decoding import DecodingOptions, DecodingResult, decode, detect_language
from .model import Whisper, ModelDimensions
from .transcribe import transcribe


_MODELS = {
    "tiny.en": "https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt",
    "tiny": "https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt",
    "base.en": "https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt",
    "base": "https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt",
    "small.en": "https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt",
    "small": "https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt",
    "medium.en": "https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt",
    "medium": "https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt",
    "large-v1": "https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt",
    "large-v2": "https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt",
    "large": "https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt",
}


def _download(url: str, root: str, in_memory: bool) -> Union[bytes, str]:
    os.makedirs(root, exist_ok=True)

    expected_sha256 = url.split("/")[-2]
    download_target = os.path.join(root, os.path.basename(url))

    if os.path.exists(download_target) and not os.path.isfile(download_target):
        raise RuntimeError(f"{download_target} exists and is not a regular file")

    if os.path.isfile(download_target):
        with open(download_target, "rb") as f:
            model_bytes = f.read()
        if hashlib.sha256(model_bytes).hexdigest() == expected_sha256:
            return model_bytes if in_memory else download_target
        else:
            warnings.warn(f"{download_target} exists, but the SHA256 checksum does not match; re-downloading the file")

    with urllib.request.urlopen(url) as source, open(download_target, "wb") as output:
        with tqdm(total=int(source.info().get("Content-Length")), ncols=80, unit='iB', unit_scale=True, unit_divisor=1024) as loop:
            while True:
                buffer = source.read(8192)
                if not buffer:
                    break

                output.write(buffer)
                loop.update(len(buffer))

    model_bytes = open(download_target, "rb").read()
    if hashlib.sha256(model_bytes).hexdigest() != expected_sha256:
        raise RuntimeError("Model has been downloaded but the SHA256 checksum does not not match. Please retry loading the model.")

    return model_bytes if in_memory else download_target


def available_models() -> List[str]:
    """Returns the names of available models"""
    return list(_MODELS.keys())


def load_model(name: str, device: Optional[Union[str, torch.device]] = None, download_root: str = None, in_memory: bool = False) -> Whisper:
    """
    Load a Whisper ASR model

    Parameters
    ----------
    name : str
        one of the official model names listed by `whisper.available_models()`, or
        path to a model checkpoint containing the model dimensions and the model state_dict.
    device : Union[str, torch.device]
        the PyTorch device to put the model into
    download_root: str
        path to download the model files; by default, it uses "~/.cache/whisper"
    in_memory: bool
        whether to preload the model weights into host memory

    Returns
    -------
    model : Whisper
        The Whisper ASR model instance
    """

    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if download_root is None:
        download_root = os.getenv(
            "XDG_CACHE_HOME", 
            os.path.join(os.path.expanduser("~"), ".cache", "whisper")
        )

    if name in _MODELS:
        checkpoint_file = _download(_MODELS[name], download_root, in_memory)
    elif os.path.isfile(name):
        checkpoint_file = open(name, "rb").read() if in_memory else name
    else:
        raise RuntimeError(f"Model {name} not found; available models = {available_models()}")

    with (io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")) as fp:
        checkpoint = torch.load(fp, map_location=device)
    del checkpoint_file

    dims = ModelDimensions(**checkpoint["dims"])
    model = Whisper(dims, name=name)
    model.load_state_dict(checkpoint["model_state_dict"])

    return model.to(device)


================================================
FILE: whisper_edits/model.py
================================================
from dataclasses import dataclass
from typing import Dict
from typing import Iterable, Optional

import numpy as np
import torch
import torch.nn.functional as F
from torch import Tensor
from torch import nn

from .transcribe import transcribe as transcribe_function
from .decoding import detect_language as detect_language_function, decode as decode_function


@dataclass
class ModelDimensions:
    n_mels: int
    n_audio_ctx: int
    n_audio_state: int
    n_audio_head: int
    n_audio_layer: int
    n_vocab: int
    n_text_ctx: int
    n_text_state: int
    n_text_head: int
    n_text_layer: int


class LayerNorm(nn.LayerNorm):
    def forward(self, x: Tensor) -> Tensor:
        return super().forward(x.float()).type(x.dtype)


class Linear(nn.Linear):
    def forward(self, x: Tensor) -> Tensor:
        return F.linear(
            x, self.weight.to(x.dtype), None if self.bias is None else self.bias.to(x.dtype)
        )


class Conv1d(nn.Conv1d):
    def _conv_forward(self, x: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
        return super()._conv_forward(
            x, weight.to(x.dtype), None if bias is None else bias.to(x.dtype)
        )


def sinusoids(length, channels, max_timescale=10000):
    """Returns sinusoids for positional embedding"""
    assert channels % 2 == 0
    log_timescale_increment = np.log(max_timescale) / (channels // 2 - 1)
    inv_timescales = torch.exp(-log_timescale_increment * torch.arange(channels // 2))
    scaled_time = torch.arange(length)[:, np.newaxis] * inv_timescales[np.newaxis, :]
    return torch.cat([torch.sin(scaled_time), torch.cos(scaled_time)], dim=1)


class MultiHeadAttention(nn.Module):
    def __init__(self, n_state: int, n_head: int):
        super().__init__()
        self.n_head = n_head
        self.query = Linear(n_state, n_state)
        self.key = Linear(n_state, n_state, bias=False)
        self.value = Linear(n_state, n_state)
        self.out = Linear(n_state, n_state)

    def forward(
        self,
        x: Tensor,
        xa: Optional[Tensor] = None,
        mask: Optional[Tensor] = None,
        kv_cache: Optional[dict] = None,
    ):
        q = self.query(x)

        if kv_cache is None or xa is None or self.key not in kv_cache:
            # hooks, if installed (i.e. kv_cache is not None), will prepend the cached kv tensors;
            # otherwise, perform key/value projections for self- or cross-attention as usual.
            k = self.key(x if xa is None else xa)
            v = self.value(x if xa is None else xa)
        else:
            # for cross-attention, calculate keys and values once and reuse in subsequent calls.
            k = kv_cache[self.key]
            v = kv_cache[self.value]

        wv = self.qkv_attention(q, k, v, mask)
        return self.out(wv)

    def qkv_attention(self, q: Tensor, k: Tensor, v: Tensor, mask: Optional[Tensor] = None):
        n_batch, n_ctx, n_state = q.shape
        scale = (n_state // self.n_head) ** -0.25
        q = q.view(*q.shape[:2], self.n_head, -1).permute(0, 2, 1, 3) * scale
        k = k.view(*k.shape[:2], self.n_head, -1).permute(0, 2, 3, 1) * scale
        v = v.view(*v.shape[:2], self.n_head, -1).permute(0, 2, 1, 3)

        qk = q @ k
        if mask is not None:
            qk = qk + mask[:n_ctx, :n_ctx]

        w = F.softmax(qk.float(), dim=-1).to(q.dtype)
        return (w @ v).permute(0, 2, 1, 3).flatten(start_dim=2)


class ResidualAttentionBlock(nn.Module):
    def __init__(self, n_state: int, n_head: int, cross_attention: bool = False):
        super().__init__()

        self.attn = MultiHeadAttention(n_state, n_head)
        self.attn_ln = LayerNorm(n_state)

        self.cross_attn = MultiHeadAttention(n_state, n_head) if cross_attention else None
        self.cross_attn_ln = LayerNorm(n_state) if cross_attention else None

        n_mlp = n_state * 4
        self.mlp = nn.Sequential(Linear(n_state, n_mlp), nn.GELU(), Linear(n_mlp, n_state))
        self.mlp_ln = LayerNorm(n_state)

    def forward(
        self,
        x: Tensor,
        xa: Optional[Tensor] = None,
        mask: Optional[Tensor] = None,
        kv_cache: Optional[dict] = None,
    ):
        x = x + self.attn(self.attn_ln(x), mask=mask, kv_cache=kv_cache)
        if self.cross_attn:
            x = x + self.cross_attn(self.cross_attn_ln(x), xa, kv_cache=kv_cache)
        x = x + self.mlp(self.mlp_ln(x))
        return x


class AudioEncoder(nn.Module):
    def __init__(self, n_mels: int, n_ctx: int, n_state: int, n_head: int, n_layer: int):
        super().__init__()
        self.conv1 = Conv1d(n_mels, n_state, kernel_size=3, padding=1)
        self.conv2 = Conv1d(n_state, n_state, kernel_size=3, stride=2, padding=1)
        self.register_buffer("positional_embedding", sinusoids(n_ctx, n_state))

        self.blocks: Iterable[ResidualAttentionBlock] = nn.ModuleList(
            [ResidualAttentionBlock(n_state, n_head) for _ in range(n_layer)]
        )
        self.ln_post = LayerNorm(n_state)

    def forward(self, x: Tensor):
        """
        x : torch.Tensor, shape = (batch_size, n_mels, n_ctx)
            the mel spectrogram of the audio
        """
        x = F.gelu(self.conv1(x))
        x = F.gelu(self.conv2(x))
        x = x.permute(0, 2, 1)

        assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape"
        x = (x + self.positional_embedding).to(x.dtype)

        for block in self.blocks:
            x = block(x)

        x = self.ln_post(x)
        return x


class TextDecoder(nn.Module):
    def __init__(self, n_vocab: int, n_ctx: int, n_state: int, n_head: int, n_layer: int):
        super().__init__()

        self.token_embedding = nn.Embedding(n_vocab, n_state)
        self.positional_embedding = nn.Parameter(torch.empty(n_ctx, n_state))

        self.blocks: Iterable[ResidualAttentionBlock] = nn.ModuleList(
            [ResidualAttentionBlock(n_state, n_head, cross_attention=True) for _ in range(n_layer)]
        )
        self.ln = LayerNorm(n_state)

        mask = torch.empty(n_ctx, n_ctx).fill_(-np.inf).triu_(1)
        self.register_buffer("mask", mask, persistent=False)

    def forward(self, x: Tensor, xa: Tensor, kv_cache: Optional[dict] = None):
        """
        x : torch.LongTensor, shape = (batch_size, <= n_ctx)
            the text tokens
        xa : torch.Tensor, shape = (batch_size, n_mels, n_audio_ctx)
            the encoded audio features to be attended on
        """
        offset = next(iter(kv_cache.values())).shape[1] if kv_cache else 0
        x = self.token_embedding(x) + self.positional_embedding[offset : offset + x.shape[-1]]
        x = x.to(xa.dtype)

        for block in self.blocks:
            x = block(x, xa, mask=self.mask, kv_cache=kv_cache)

        x = self.ln(x)
        logits = (x @ torch.transpose(self.token_embedding.weight.to(x.dtype), 0, 1)).float()

        return logits


class Whisper(nn.Module):
    def __init__(self, dims: ModelDimensions, name=''):
        super().__init__()
        self.dims = dims
        self.encoder = AudioEncoder(
            self.dims.n_mels,
            self.dims.n_audio_ctx,
            self.dims.n_audio_state,
            self.dims.n_audio_head,
            self.dims.n_audio_layer,
        )
        self.decoder = TextDecoder(
            self.dims.n_vocab,
            self.dims.n_text_ctx,
            self.dims.n_text_state,
            self.dims.n_text_head,
            self.dims.n_text_layer,
        )
        self.name = name

    def embed_audio(self, mel: torch.Tensor):
        return self.encoder(mel)

    def logits(self, tokens: torch.Tensor, audio_features: torch.Tensor):
        return self.decoder(tokens, audio_features)

    def forward(self, mel: torch.Tensor, tokens: torch.Tensor) -> Dict[str, torch.Tensor]:
        return self.decoder(tokens, self.encoder(mel))

    @property
    def device(self):
        return next(self.parameters()).device

    @property
    def is_multilingual(self):
        return self.dims.n_vocab == 51865

    def install_kv_cache_hooks(self, cache: Optional[dict] = None):
        """
        The `MultiHeadAttention` module optionally accepts `kv_cache` which stores the key and value
        tensors calculated for the previous positions. This method returns a dictionary that stores
        all caches, and the necessary hooks for the key and value projection modules that save the
        intermediate tensors to be reused during later calculations.

        Returns
        -------
        cache : Dict[nn.Module, torch.Tensor]
            A dictionary object mapping the key/value projection modules to its cache
        hooks : List[RemovableHandle]
            List of PyTorch RemovableHandle objects to stop the hooks to be called
        """
        cache = {**cache} if cache is not None else {}
        hooks = []

        def save_to_cache(module, _, output):
            if module not in cache or output.shape[1] > self.decoder.positional_embedding.shape[0]:
                cache[module] = output  # save as-is, for the first token or cross attention
            else:
                cache[module] = torch.cat([cache[module], output], dim=1).detach()
            return cache[module]

        def install_hooks(layer: nn.Module):
            if isinstance(layer, MultiHeadAttention):
                hooks.append(layer.key.register_forward_hook(save_to_cache))
                hooks.append(layer.value.register_forward_hook(save_to_cache))

        self.decoder.apply(install_hooks)
        return cache, hooks

    detect_language = detect_language_function
    transcribe = transcribe_function
    decode = decode_function


================================================
FILE: workspaces/Vision_09df18b156814c80a3e1c1ab544423fc/refy_suggestions/test.csv
================================================
,id,doi,title,authors,category,source,url,year,score
0,http://arxiv.org/abs/2305.19280v1,,"Large language models improve Alzheimer's disease diagnosis using
  multi-modality data","['Yingjie Feng', 'Jun Wang', 'Xianfeng Gu', 'Xiaoyin Xu', 'Min Zhang']",,arxiv,http://arxiv.org/abs/2305.19280v1,2021,0.13802443355290694
1,http://arxiv.org/abs/2305.16222v1,,Incomplete Multimodal Learning for Complex Brain Disorders Prediction,"['Reza Shirkavand', 'Liang Zhan', 'Heng Huang', 'Li Shen', 'Paul M. Thompson']",,arxiv,http://arxiv.org/abs/2305.16222v1,2021,0.13695037693570716
2,http://arxiv.org/abs/2305.10502v1,,"EENED: End-to-End Neural Epilepsy Detection based on Convolutional
  Transformer","['Chenyu Liu', 'Xinliang Zhou', 'Yang Liu']",,arxiv,http://arxiv.org/abs/2305.10502v1,2021,0.12467641282565606
3,http://arxiv.org/abs/2305.17235v2,,"COMCAT: Towards Efficient Compression and Customization of
  Attention-Based Vision Models","['Jinqi Xiao', 'Miao Yin', 'Yu Gong', 'Xiao Zang', 'Jian Ren', 'Bo Yuan']",,arxiv,http://arxiv.org/abs/2305.17235v2,2021,0.12355122484233379
4,http://arxiv.org/abs/2306.03022v1,,"Interpretable Alzheimer's Disease Classification Via a Contrastive
  Diffusion Autoencoder","['Ayodeji Ijishakin', 'Ahmed Abdulaal', 'Adamos Hadjivasiliou', 'Sophie Martin', 'James Cole']",,arxiv,http://arxiv.org/abs/2306.03022v1,2021,0.1234453840648734
5,http://arxiv.org/abs/2305.10060v1,,XAI for Self-supervised Clustering of Wireless Spectrum Activity,"['Ljupcho Milosheski', 'Gregor Cerar', 'Blaž Bertalanič', 'Carolina Fortuna', 'Mihael Mohorčič']",,arxiv,http://arxiv.org/abs/2305.10060v1,2021,0.10784067817323988
6,http://arxiv.org/abs/2306.05514v1,,"Robust Brain Age Estimation via Regression Models and MRI-derived
  Features","['Mansoor Ahmed', 'Usama Sardar', 'Sarwan Ali', 'Shafiq Alam', 'Murray Patterson', 'Imdad Ullah Khan']",,arxiv,http://arxiv.org/abs/2306.05514v1,2021,0.1006547027490279
7,http://arxiv.org/abs/2305.11560v1,,Brain Captioning: Decoding human brain activity into images and text,"['Matteo Ferrante', 'Furkan Ozcelik', 'Tommaso Boccato', 'Rufin VanRullen', 'Nicola Toschi']",,arxiv,http://arxiv.org/abs/2305.11560v1,2021,0.09841263998032185
8,10.1101/2023.05.27.542592,10.1101/2023.05.27.542592,An Explainable and Robust Deep Learning Approach for Automated Electroencephalography-based Schizophrenia Diagnosis,"Sattiraju, A.; Ellis, C. A.; Miller, R. L.; Calhoun, V. D.",neuroscience,biorxiv,,2023,0.09216368910512815
9,http://arxiv.org/abs/2306.00294v1,,"Affinity-based Attention in Self-supervised Transformers Predicts
  Dynamics of Object Grouping in Humans","['Hossein Adeli', 'Seoyoung Ahn', 'Nikolaus Kriegeskorte', 'Gregory Zelinsky']",,arxiv,http://arxiv.org/abs/2306.00294v1,2021,0.0920963415857158
10,http://arxiv.org/abs/2305.16657v1,,"Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and
  Applications","['Gianfranco Cortes', 'Yue Yu', 'Robin Chen', 'Melissa Armstrong', 'David Vaillancourt', 'Baba C. Vemuri']",,arxiv,http://arxiv.org/abs/2305.16657v1,2021,0.09058977336216062
11,http://arxiv.org/abs/2306.00989v1,,Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles,"['Chaitanya Ryali', 'Yuan-Ting Hu', 'Daniel Bolya', 'Chen Wei', 'Haoqi Fan', 'Po-Yao Huang', 'Vaibhav Aggarwal', 'Arkabandhu Chowdhury', 'Omid Poursaeed', 'Judy Hoffman', 'Jitendra Malik', 'Yanghao Li', 'Christoph Feichtenhofer']",,arxiv,http://arxiv.org/abs/2306.00989v1,2021,0.09058950611280858
12,10.1101/2023.05.30.542072,10.1101/2023.05.30.542072,The Bergen Breakfast Scanning Club dataset: a deep brain imaging dataset,"Wang, M.-Y.; Korbmacher, M.; Eikeland, R.; Specht, K.",neuroscience,biorxiv,,2023,0.08871953085977524
13,http://arxiv.org/abs/2306.05745v1,,Two Independent Teachers are Better Role Model,"['Afifa Khaled', 'Ahmed A. Mubarak', 'Kun He']",,arxiv,http://arxiv.org/abs/2306.05745v1,2021,0.08750304025011657
14,http://arxiv.org/abs/2305.16554v1,,Emergent Agentic Transformer from Chain of Hindsight Experience,"['Hao Liu', 'Pieter Abbeel']",,arxiv,http://arxiv.org/abs/2305.16554v1,2021,0.08687888341024813
15,10.1101/2023.06.08.544050,10.1101/2023.06.08.544050,3D analysis of dissection photographs with surface scanning and machine learning for quantitative neuropathology,"Gazula, H.; Tregidgo, H.; Billot, B.; Balbastre, Y.; Williams-Ramirez, J.; Herisse, R.; Casamitjana, A.; Melief, E. J.; Latimer, C. S.; Kilgore, M. D.; Montine, M.; Robinson, E.; Blackburn, E.; Marshall, M. S.; Connors, T. R.; Oakley, D. H.; Frosh, M. P.; Van Leemput, K.; Dalca, A. V.; Fischl, B.; Mac Donald, C. L.; Keene, C. D.; Hyman, B.; Iglesias, J. E.",neuroscience,biorxiv,,2023,0.08621946712332713
16,http://arxiv.org/abs/2306.03834v1,,"MTS2Graph: Interpretable Multivariate Time Series Classification with
  Temporal Evolving Graphs","['Raneen Younis', 'Abdul Hakmeh', 'Zahra Ahmadi']",,arxiv,http://arxiv.org/abs/2306.03834v1,2021,0.08515588285487911
17,http://arxiv.org/abs/2306.04655v1,,"Modulation Classification Through Deep Learning Using Resolution
  Transformed Spectrograms","['Muhammad Waqas', 'Muhammad Ashraf', 'Muhammad Zakwan']",,arxiv,http://arxiv.org/abs/2306.04655v1,2021,0.0847741042619827
18,http://arxiv.org/abs/2306.03284v1,,"Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion
  Generative Models","['Sriram Ravula', 'Brett Levac', 'Ajil Jalal', 'Jonathan I. Tamir', 'Alexandros G. Dimakis']",,arxiv,http://arxiv.org/abs/2306.03284v1,2021,0.08397880868757927
19,http://arxiv.org/abs/2306.05029v1,,"Multi-level Multiple Instance Learning with Transformer for Whole Slide
  Image Classification","['Ruijie Zhang', 'Qiaozhe Zhang', 'Yingzhuang Liu', 'Hao Xin', 'Yan Liu', 'Xinggang Wang']",,arxiv,http://arxiv.org/abs/2306.05029v1,2021,0.08395868822917545


================================================
FILE: workspaces/Vision_09df18b156814c80a3e1c1ab544423fc/refy_suggestions/test.html
================================================
<!DOCTYPE html>
<head>
<meta charset="UTF-8">
<style>
.r1 {color: #ffa726; text-decoration-color: #ffa726}
.r2 {font-weight: bold}
.r3 {color: #66bb6a; text-decoration-color: #66bb6a; font-weight: bold}
.r4 {color: #ff7043; text-decoration-color: #ff7043; font-weight: bold}
.r5 {color: #ff7043; text-decoration-color: #ff7043; font-weight: bold; text-decoration: underline}
.r6 {color: #ec407a; text-decoration-color: #ec407a}
.r7 {color: #ec407a; text-decoration-color: #ec407a; font-weight: bold}
.r8 {color: #81d4fa; text-decoration-color: #81d4fa; text-decoration: underline}
.r9 {color: #fb8c00; text-decoration-color: #fb8c00}
.r10 {color: #0f0f0f; text-decoration-color: #0f0f0f}
.r11 {color: #8e7423; text-decoration-color: #8e7423}
.r12 {color: #9ccc65; text-decoration-color: #9ccc65}
.r13 {color: #5d7541; text-decoration-color: #5d7541}
.r14 {color: #ffa726; text-decoration-color: #ffa726; font-weight: bold}
.r15 {color: #81d4fa; text-decoration-color: #81d4fa; font-weight: bold}
.r16 {color: #858687; text-decoration-color: #858687}
.r17 {color: #6f6f6f; text-decoration-color: #6f6f6f}
body {
    color: #000000;
    background-color: #1e1e1e;
}
</style>
</head>
<html>
<body>
    <pre style="font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace">
        <code><span class="r1">╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮</span>
<span class="r1">│</span>  <span class="r2">                                                                                                                                                          </span>  <span class="r1">│</span>
<span class="r1">│</span>   <span class="r1">📆  Daily suggestions for: </span><span class="r3">2023-06-12</span>                                                                                                                      <span class="r1">│</span>
<span class="r1">│</span>                                                                                                                                                              <span class="r1">│</span>
<span class="r1">│</span>                                                                                                                                                              <span class="r1">│</span>
<span class="r1">│</span>   <span class="r4">🔍  </span><span class="r5">keywords</span>                                                                                                                                               <span class="r1">│</span>
<span class="r1">│</span>                                                                                                                                                              <span class="r1">│</span>
<span class="r1">│</span>                                                                                                                                                              <span class="r1">│</span>
<span class="r1">│</span>    <span class="r6">    </span><span class="r7">1.</span><span class="r6"> </span><span class="r8">model</span>  <span class="r6">    </span><span class="r7">2.</span><span class="r6"> </span><span class="r8">vit</span>  <span class="r6">    </span><span class="r7">3.</span><span class="r6"> </span><span class="r8">models</span>  <span class="r6">    </span><span class="r7">4.</span><span class="r6"> </span><span class="r8">images</span>  <span class="r6">    </span><span class="r7">5.</span><span class="r6"> </span><span class="r8">cnn</span>  <span class="r6">    </span><span class="r7">6.</span><span class="r6"> </span><span class="r8">transformer</span>  <span class="r6">    </span><span class="r7">7.</span><span class="r6"> </span><span class="r8">learning</span>  <span class="r6">    </span><span class="r7">8.</span><span class="r6"> </span><span class="r8">image</span>                                     <span class="r1">│</span>
<span class="r1">│</span>                                                                                                                                                              <span class="r1">│</span>
<span class="r1">│</span>   <span class="r9">────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────</span>   <span class="r1">│</span>
<span class="r1">│</span>                                                                                                                                                              <span class="r1">│</span>
<span class="r1">│</span>   <span class="r7"> #   year                  author  title                                                                             DOI                        source  </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 1  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">    Yingjie Feng</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Large language</span><span class="r15"> models </span><span class="r14">improve Alzheimer&#x27;s disease diagnosis using                </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.19280v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   multi-modality data                                                            </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 2  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12"> Reza Shirkavand</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Incomplete Multimodal Learning for Complex Brain Disorders Prediction            </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.16222v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 3  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">      Chenyu Liu</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> EENED: End-to-End Neural Epilepsy Detection based on Convolutional               </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.10502v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Transformer                                                                    </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 4  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">      Jinqi Xiao</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> COMCAT: Towards Efficient Compression and Customization of                       </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.17235v2">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Attention-Based Vision Models                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 5  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">   Ayodeji Ijishakin</span><span class="r13"> et</span><span class="r12"> </span><span class="r14"> Interpretable Alzheimer&#x27;s Disease Classification Via a Contrastive               </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.03022v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                    </span><span class="r13">al.</span><span class="r12"> </span><span class="r14">   Diffusion Autoencoder                                                          </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 6  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">  Ljupcho Milosheski</span><span class="r13"> et</span><span class="r12"> </span><span class="r14"> XAI for Self-supervised Clustering of Wireless Spectrum Activity                 </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.10060v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                    </span><span class="r13">al.</span><span class="r12"> </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 7  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">   Mansoor Ahmed</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Robust Brain Age Estimation via Regression Models and MRI-derived                </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.05514v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Features                                                                       </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 8  </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12"> Matteo Ferrante</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Brain Captioning: Decoding human brain activity into</span><span class="r15"> images </span><span class="r14">and text             </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.11560v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 9  </span><span class="r10"> </span><span class="r11">2023</span><span class="r10"> </span><span class="r12">   Sattiraju, A.</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> An Explainable and Robust Deep Learning Approach for Automated                   </span><span class="r10"> </span><a class="r16" href="https://doi.org/10.1101/2023.05.27.542592">10.1101/2023.05.27.542592</a><span class="r10">  </span><span class="r17">biorxiv</span><span class="r10"> </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14"> Electroencephalography-based Schizophrenia Diagnosis                             </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 10 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">   Hossein Adeli</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Affinity-based Attention in Self-supervised Transformers Predicts                </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.00294v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Dynamics of Object Grouping in Humans                                          </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 11 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">   Gianfranco Cortes</span><span class="r13"> et</span><span class="r12"> </span><span class="r14"> Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and                  </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.16657v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                    </span><span class="r13">al.</span><span class="r12"> </span><span class="r14">   Applications                                                                   </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 12 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12"> Chaitanya Ryali</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles          </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.00989v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 13 </span><span class="r10"> </span><span class="r11">2023</span><span class="r10"> </span><span class="r12">     Wang, M.-Y.</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> The Bergen Breakfast Scanning Club dataset: a deep brain imaging dataset         </span><span class="r10"> </span><a class="r16" href="https://doi.org/10.1101/2023.05.30.542072">10.1101/2023.05.30.542072</a><span class="r10">  </span><span class="r17">biorxiv</span><span class="r10"> </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 14 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">    Afifa Khaled</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Two Independent Teachers are Better Role Model                                   </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.05745v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 15 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">         Hao Liu</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Emergent Agentic Transformer from Chain of Hindsight Experience                  </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2305.16554v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 16 </span><span class="r10"> </span><span class="r11">2023</span><span class="r10"> </span><span class="r12">      Gazula, H.</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> 3D analysis of dissection photographs with surface scanning and machine</span><span class="r15"> learning</span><span class="r14"> </span><span class="r10"> </span><a class="r16" href="https://doi.org/10.1101/2023.06.08.544050">10.1101/2023.06.08.544050</a><span class="r10">  </span><span class="r17">biorxiv</span><span class="r10"> </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14"> for quantitative neuropathology                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 17 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">   Raneen Younis</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> MTS2Graph: Interpretable Multivariate Time Series Classification with            </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.03834v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Temporal Evolving Graphs                                                       </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 18 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">  Muhammad Waqas</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Modulation Classification Through Deep Learning Using Resolution                 </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.04655v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Transformed Spectrograms                                                       </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 19 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">   Sriram Ravula</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Optimizing Sampling Patterns for Compressed Sensing MRI with Diffusion           </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.03284v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Generative Models                                                              </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6"> 20 </span><span class="r10"> </span><span class="r11">2021</span><span class="r10"> </span><span class="r12">    Ruijie Zhang</span><span class="r13"> et al.</span><span class="r12"> </span><span class="r14"> Multi-level Multiple Instance Learning with Transformer for Whole Slide          </span><span class="r10"> </span><a class="r16" href="http://arxiv.org/abs/2306.05029v1">http://arxiv.org/...</a><span class="r10">       </span><span class="r17">arxiv</span><span class="r10">   </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">   Image Classification                                                           </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r6">    </span><span class="r10">      </span><span class="r12">                        </span><span class="r14">                                                                                  </span><span class="r10">                                    </span>   <span class="r1">│</span>
<span class="r1">│</span>   <span class="r10">                                                           </span><span class="r16">20 papers, recommended by refy 👌</span><span class="r10">                                                            </span>   <span class="r1">│</span>
<span class="r1">│</span>                                                                                                                                                              <span class="r1">│</span>
<span class="r1">╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯</span>
</code>
    </pre>
</body>
</html>


================================================
FILE: workspaces/Vision_09df18b156814c80a3e1c1ab544423fc/results/papers.bib
================================================
@ARTICLE{6189435c78b9833da2061c3088e04b8aa5c034e8,
  title     = "Trans-ResNet: Integrating Transformers and CNNs for Alzheimer’s disease classification",
  author    = "Chao Li",
  abstract  = "Convolutional neural networks  CNNs  have demonstrated excellent performance for brain disease classification from MRI data  However  CNNs lack the ability to capture global dependencies  The recently proposed architecture called Transformer uses attention mechanisms to match or even outperform CNNs on various vision tasks  Transformer s performance is dependent on access to large training datasets  but sample sizes for most brain MRI datasets are relatively small  To overcome this limitation  we propose Trans ResNet  a novel architecture which integrates the advantages of both CNNs and Transformers  In addition  we pre trained our Trans ResNet on a large scale dataset on the task of brain age estimation for higher performance  Using three neuroimaging cohorts  UK Biobank  AIBL  ADNI   we demonstrated that our Trans ResNet achieved higher classification accuracy on Alzheimer disease prediction compared to other state of the art CNN based methods ",
  year      = 2022,
  journal   = "2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI)",
  pages     = "1-5"
}

@ARTICLE{eaf2d681c93c17864146394455068817f6faa69c,
  title     = "Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection",
  author    = "N. Dhinagar",
  abstract  = "Neuroimaging of large populations is valuable to identify factors that promote or resist brain disease  and to assist diagnosis  subtyping  and prognosis  Data driven models such as convolutional neural networks  CNNs  have increasingly been applied to brain images to perform diagnostic and prognostic tasks by learning robust features  Vision transformers  ViT    a new class of deep learning architectures   have emerged in recent years as an alternative to CNNs for several computer vision applications  Here we tested variants of the ViT architecture for a range of desired neuroimaging downstream tasks based on difficulty  in this case for sex and Alzheimer s disease  AD  classification based on 3D brain MRI  In our experiments  two vision transformer architecture variants achieved an AUC of 0 987 for sex and 0 892 for AD classification  respectively  We independently evaluated our models on data from two benchmark AD datasets  We achieved a performance boost of 5  and 9 10  upon fine tuning vision transformer models pre trained on synthetic  generated by a latent diffusion model  and real MRI scans  respectively  Our main contributions include testing the effects of different ViT training strategies including pre training  data augmentation and learning rate warm ups followed by annealing  as pertaining to the neuroimaging domain  These techniques are essential for training ViT like models for neuroimaging applications where training data is usually limited  We also analyzed the effect of the amount of training data utilized on the test time performance of the ViT via data model scaling curves ",
  year      = 2023,
  journal   = "ArXiv",
  pages     = ""
}

@ARTICLE{7b2cf4599fdbfa7bb073ce3737e6350c8a7d8c3d,
  title     = "Vision Transformer Approach for Classification of Alzheimer’s Disease Using 18F-Florbetaben Brain Images",
  author    = "Hyunji Shin",
  abstract  = "Dementia is a degenerative disease that is increasingly prevalent in an aging society  Alzheimer s disease  AD   the most common type of dementia  is best mitigated via early detection and management  Deep learning is an artificial intelligence technique that has been used to diagnose and predict diseases by extracting meaningful features from medical images  The convolutional neural network  CNN  is a representative application of deep learning  serving as a powerful tool for the diagnosis of AD  Recently  vision transformers  ViT  have yielded classification performance exceeding that of CNN in some diagnostic image classifications  Because the brain is a very complex network with interrelated regions  ViT  which captures direct relationships between images  may be more effective for brain image analysis than CNN  Therefore  we propose a method for classifying dementia images by applying 18F Florbetaben positron emission tomography  PET  images to ViT  Data were evaluated via binary  normal control and abnormal  and ternary  healthy control  mild cognitive impairment  and AD  classification  In a performance comparison with the CNN  VGG19 was selected as the comparison model  Consequently  ViT yielded more effective performance than VGG19 in binary classification  However  in ternary classification  the performance of ViT cannot be considered excellent  These results show that it is hard to argue that the ViT model is better at AD classification than the CNN model ",
  year      = 2023,
  journal   = "Applied Sciences",
  pages     = ""
}

@ARTICLE{9f3cd0b46507d50caf71101f1dc3b23e339a8ba0,
  title     = "Comparative Analysis of Alzheimer's Disease Detection via MRI Scans Using Convolutional Neural Network and Vision Transformer",
  author    = "Pinky Sherwani",
  abstract  = "Progressive damage to brain neurons is caused by a neurodegenerative disease  ND  that the body cannot heal or restore  Dementia like Alzheimer s Disease  AD   which affects millions of lives each year  is a well known instance of such illnesses  Despite extensive research  the aforementioned disorders currently have no viable therapies  However  a timely diagnosis is essential for the management of diseases  For neurologists  diagnosing NDs is difficult and needs years of education and experience  The paper focuses on the detection of Alzheimer s Disease via Brain MRI Scans using Convolutional Neural Network  CNN  which minimizes an image s high dimensionality without sacrificing its information and using Vision Transformers that are for image classification and apply an architecture akin to a Transformer over selected areas of the image  In the proposed work three different Vision Transformer namely  Vanilla Vision Transformer  Vanilla ViT   Deep Vision Transformer  DeepViT   Class Attention in Image Transformer  CaiT   The Vision Transformers turned out to be better than CNN as when training on fewer datasets  ViT exhibits inductive bias  which increases dependency on model regularisation or data augmentation  AgReg   In terms of accuracy and computing efficiency  ViT models exceed the present CNN by almost a factor of four ",
  year      = 2023,
  journal   = "2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF)",
  pages     = "1-9"
}

@ARTICLE{6e6ee835b7388d52e2df688733a2952df38e13f6,
  title     = "Introducing Vision Transformer for Alzheimer's Disease classification task with 3D input",
  author    = "Zilun Zhang",
  abstract  = "  Many high performance classification models utilize complex CNN based architectures for Alzheimer s Disease classification  We aim to investigate two relevant questions regarding classification of Alzheimer s Disease using MRI   Do Vision Transformer based models perform better than CNN based models   and  Is it possible to use a shallow 3D CNN based model to obtain satisfying results   To achieve these goals  we propose two models that can take in and process 3D MRI scans  Convolutional Voxel Vision Transformer  CVVT  architecture  and ConvNet3D 4  a shallow 4 block 3D CNN based model  Our results indicate that the shallow 3D CNN based models are sufficient to achieve good classification results for Alzheimer s Disease using MRI scans ",
  year      = 2022,
  journal   = "ArXiv",
  pages     = ""
}

@ARTICLE{64023908cd9171ded5394d093f9cddb87acc0b59,
  title     = "Vision transformers for the prediction of mild cognitive impairment to Alzheimer’s disease progression using mid-sagittal sMRI",
  author    = "Gia-Minh Hoang",
  abstract  = "Background Alzheimer s disease  AD  is one of the most common causes of neurodegenerative disease affecting over 50 million people worldwide  However  most AD diagnosis occurs in the moderate to late stage  which means that the optimal time for treatment has already passed  Mild cognitive impairment  MCI  is an intermediate state between cognitively normal people and AD patients  Therefore  the accurate prediction in the conversion process of MCI to AD may allow patients to start preventive intervention to slow the progression of the disease  Nowadays  neuroimaging techniques have been developed and are used to determine AD related structural biomarkers  Deep learning approaches have rapidly become a key methodology applied to these techniques to find biomarkers  Methods In this study  we aimed to investigate an MCI to AD prediction method using Vision Transformers  ViT  to structural magnetic resonance images  sMRI   The Alzheimer s Disease Neuroimaging Initiative  ADNI  database containing 598 MCI subjects was used to predict MCI subjects  progression to AD  There are three main objectives in our study   i  to propose an MRI based Vision Transformers approach for MCI to AD progression classification   ii  to evaluate the performance of different ViT architectures to obtain the most advisable one  and  iii  to visualize the brain region mostly affect the prediction of deep learning approach to MCI progression  Results Our method achieved state of the art classification performance in terms of accuracy  83 27    specificity  85 07    and sensitivity  81 48   compared with a set of conventional methods  Next  we visualized the brain regions that mostly contribute to the prediction of MCI progression for interpretability of the proposed model  The discriminative pathological locations include the thalamus  medial frontal  and occipital corroborating the reliability of our model  Conclusion In conclusion  our methods provide an effective and accurate technique for the prediction of MCI conversion to AD  The results obtained in this study outperform previous reports using the ADNI collection  and it suggests that sMRI based ViT could be efficiently applied with a considerable potential benefit for AD patient management  The brain regions mostly contributing to prediction  in conjunction with the identified anatomical features  will support the building of a robust solution for other neurodegenerative diseases in future ",
  year      = 2023,
  journal   = "Frontiers in Aging Neuroscience",
  pages     = ""
}

@ARTICLE{1b4ac29a08b2272bd6b3fbf293544816e9d31c5a,
  title     = "SMIL-DeiT:Multiple Instance Learning and Self-supervised Vision Transformer network for Early Alzheimer's disease classification",
  author    = "Yue Yin",
  abstract  = "Early diagnosis of Alzheimer s disease AD  is becoming increasingly important in preventing and treating the disease as the world s population ages  We proposed a SMIL DeiT network for AD classification tasks amongst three groups  Alzheimer s Disease  AD   Mild Cognitive Impairment  MCI   and Normal Cognitive  NC  in this study  Vision Transformer is the fundamental structure of our work  The data pre training is performed utilizing DINO  a self supervised technique  whereas the downstream classification task is done with Multiple Instance Learning  Our proposed technique works on the ADNI dataset  We used four performance metrics accuracy rates  precision  recall  and Fl score in the evaluation  the most important of which was accuracy  The accuracy obtained by our method is higher than the transformer s 90 1  and CNN s 90 8   reaching 93 2  ",
  year      = 2022,
  journal   = "2022 International Joint Conference on Neural Networks (IJCNN)",
  pages     = "1-6"
}

@ARTICLE{9da3fadf092c864f61d6fd1e8eab5a6ca2397194,
  title     = "Classification of Alzheimer's Disease via Vision Transformer: Classification of Alzheimer's Disease via Vision Transformer",
  author    = "Yanjun Lyu",
  abstract  = "Deep models are powerful in capturing the complex and non linear relationship buried in brain imaging data  However  the huge number of parameters in deep models can easily overfit given limited imaging data samples  In this work  we proposed a cross domain transfer learning method to solve the insufficient data problem in brain imaging domain by leveraging the knowledge learned in natural image domain  Specifically  we employed ViT as the backbone and firstly pretrained it using ImageNet 21K dataset and then transferred to the brain imaging dataset  A slice wise convolution embedding method was developed to improve the standard patch operation in vanilla ViT  Our method was evaluated based on AD CN classification task  We also conducted extensive experiments to compare the transfer performance with different transfer strategies  models  and sample size  The results suggest that the proposed method can effectively transfer the knowledge learned in natural image domain to brain imaging area and may provide a promising way to take advantages of the pretrained model in data intensive applications  Moreover  the proposed cross domain transfer learning method can obtain comparable classification performance compared to most recent studies ",
  year      = 2022,
  journal   = "Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments",
  pages     = ""
}

@ARTICLE{adb781ca8eeea0f2cb4422b3fd632c0fd4ee2448,
  title     = "Advit: Vision Transformer On Multi-Modality Pet Images For Alzheimer Disease Diagnosis",
  author    = "Xin Xing",
  abstract  = "We present a new model trained on multi modalities of Positron Emission Tomography images  PET AV45 and PET FDG  for Alzheimer s Disease  AD  diagnosis  Unlike the conventional methods using multi modal 3D 2D CNN architecture  our design replaces the Convolutional Neural Net work  CNN  by Vision Transformer  ViT   Considering the high computation cost of 3D images  we firstly employ a 3D to 2D operation to project the 3D PET images into 2D fusion images  Then  we forward the fused multi modal 2D images to a parallel ViT model for feature extraction  followed by classification for AD diagnosis  For evaluation  we use PET images from ADNI  The proposed model outperforms several strong baseline models in our experiments and achieves 0 91 accuracy and 0 95 AUC ",
  year      = 2022,
  journal   = "2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI)",
  pages     = "1-4"
}

@ARTICLE{0e681598a83e3e55cfbc34a07a222e99e9f4c445,
  title     = "CrossViT Wide Residual Squeeze-and-Excitation Network for Alzheimer's disease classification with self attention ProGAN data augmentation",
  author    = "Rahma Kadri",
  abstract  = "Efficient and accurate early prediction of Alzheimer s disease  AD  based on the neuroimaging data has attracted interest from many researchers to prevent its progression  Deep learning networks have demonstrated an optimal ability to analyse large scale multimodal neuroimaging for AD classification  The most widely used architecture of deep learning is the Convolution neural networks  CNN  that have shown great potential in AD detection  However CNN does not capture long range dependencies within the input image and does not ensure a good global feature extraction  Furthermore  increasing the receptive field of CNN by increasing the kernels sizes can cause a feature granularity loss  Another limitation is that CNN lacks a weighing mechanism of image features  the network doesn t focus on the relevant features within the image  Recently vision transformer have shown an outstanding performance over the CNN and overcomes its main limitations  The vision transformer relies on the self attention layers  The main drawbacks of this new technique is that it requires a huge amount of training data  In this paper  we combined the main strengths of these two architectures for AD classification  We proposed a new method based on the combination of the Cross ViT and Wide Residual Squeeze and Excitation Network  We acquired MRI data from the Alzheimer s Disease Neuroimaging Initiative  ADNI  and the Open Access Series of Imaging Studies  OASIS   We also proposed a new data augmentation based on the self attention progressive generative adversarial neural network to overcome the limitation of the data  Our proposed method achieved 99  classification accuracy and outperforms CNN models ",
  year      = 2022,
  journal   = "Int. J. Hybrid Intell. Syst.",
  pages     = "163-177"
}

@ARTICLE{af73f6878044012034023640918cf31d7f4e1522,
  title     = "Comparison of Anatomical and Diffusion MRI for detecting Parkinson’s Disease using Deep Convolutional Neural Network",
  author    = "Tamoghna Chattopadhyay",
  abstract  = "Parkinson s disease  PD  is a progressive neurodegenerative disease that affects over 10 million people worldwide  Brain atrophy and microstructural abnormalities tend to be more subtle in PD than in other age related conditions such as Alzheimer s disease  so there is interest in how well machine learning methods can detect PD in radiological scans  Deep learning models based on convolutional neural networks  CNNs  can automatically distil diagnostically useful features from raw MRI scans  but most CNN based deep learning models have only been tested on T1 weighted brain MRI  Here we examine the added value of diffusion weighted MRI  dMRI    a variant of MRI  sensitive to microstructural tissue properties   as an additional input in CNN based models for PD classification  Our evaluations used data from 3 separate cohorts   from Chang Gung University  the University of Pennsylvania  and the PPMI dataset  We trained CNNs on various combinations of these cohorts to find the best predictive model  Although tests on more diverse data are warranted  deep learned models from dMRI show promise for PD classification  Clinical Relevance This study supports the use of diffusion weighted images as an alternative to anatomical images for AI based detection of Parkinson s disease ",
  year      = 2023,
  journal   = "bioRxiv",
  pages     = ""
}

@ARTICLE{003f528634735ece46286609f6a074fbfa435722,
  title     = "Pre-trained deep learning models for brain MRI image classification",
  author    = "Srigiri Krishnapriya",
  abstract  = "Brain tumors are serious conditions caused by uncontrolled and abnormal cell division  Tumors can have devastating implications if not accurately and promptly detected  Magnetic resonance imaging  MRI  is one of the methods frequently used to detect brain tumors owing to its excellent resolution  In the past few decades  substantial research has been conducted in the field of classifying brain images  ranging from traditional methods to deep learning techniques such as convolutional neural networks  CNN   To accomplish classification  machine learning methods require manually created features  In contrast  CNN achieves classification by extracting visual features from unprocessed images  The size of the training dataset had a significant impact on the features that CNN extracts  The CNN tends to overfit when its size is small  Deep CNNs  DCNN  with transfer learning have therefore been developed  The aim of this work was to investigate the brain MR image categorization potential of pre trained DCNN VGG 19  VGG 16  ResNet50  and Inception V3 models using data augmentation and transfer learning techniques  Validation of the test set utilizing accuracy  recall  Precision  and F1 score showed that the pre trained VGG 19 model with transfer learning exhibited the best performance  In addition  these methods offer an end to end classification of raw images without the need for manual attribute extraction ",
  year      = 2023,
  journal   = "Frontiers in Human Neuroscience",
  pages     = ""
}


================================================
FILE: workspaces/Vision_09df18b156814c80a3e1c1ab544423fc/results/papers.csv
================================================
paperId,title,first_author,year,abstract,tldr,bibtex,influentialCitationCount,venue,journal,pages
6189435c78b9833da2061c3088e04b8aa5c034e8,Trans-ResNet: Integrating Transformers and CNNs for Alzheimer’s disease classification,Chao Li,2022,"Convolutional neural networks (CNNs) have demonstrated excellent performance for brain disease classification from MRI data. However, CNNs lack the ability to capture global dependencies. The recently proposed architecture called Transformer uses attention mechanisms to match or even outperform CNNs on various vision tasks. Transformer’s performance is dependent on access to large training datasets, but sample sizes for most brain MRI datasets are relatively small. To overcome this limitation, we propose Trans-ResNet, a novel architecture which integrates the advantages of both CNNs and Transformers. In addition, we pre-trained our Trans-ResNet on a large-scale dataset on the task of brain age estimation for higher performance. Using three neuroimaging cohorts (UK Biobank, AIBL, ADNI), we demonstrated that our Trans-ResNet achieved higher classification accuracy on Alzheimer disease prediction compared to other state-of-the-art CNN-based methods.","Convolutional neural networks (CNNs) have demonstrated excellent performance for brain disease classification from MRI data. However, CNNs lack the ability to capture global dependencies. The recently proposed architecture called Transformer uses attention mechanisms to match or even outperform CNNs on various vision tasks. Transformer’s performance is dependent on access to large training datasets, but sample sizes for most brain MRI datasets are relatively small. To overcome this limitation, we propose Trans-ResNet, a novel architecture which integrates the advantages of both CNNs and Transformers. In addition, we pre-trained our Trans-ResNet on a large-scale dataset on the task of brain age estimation for higher performance. Using three neuroimaging cohorts (UK Biobank, AIBL, ADNI), we demonstrated that our Trans-ResNet achieved higher classification accuracy on Alzheimer disease prediction compared to other state-of-the-art CNN-based methods.","@Article{Li2022TransResNetIT,
 author = {Chao Li and Yue Cui and Na Luo and Yong Liu and P. Bourgeat and J. Fripp and Tianzi Jiang},
 booktitle = {IEEE International Symposium on Biomedical Imaging},
 journal = {2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI)},
 pages = {1-5},
 title = {Trans-ResNet: Integrating Transformers and CNNs for Alzheimer’s disease classification},
 year = {2022}
}
",0,IEEE International Symposium on Biomedical Imaging,2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI),1-5
eaf2d681c93c17864146394455068817f6faa69c,Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection,N. Dhinagar,2023,"Neuroimaging of large populations is valuable to identify factors that promote or resist brain disease, and to assist diagnosis, subtyping, and prognosis. Data-driven models such as convolutional neural networks (CNNs) have increasingly been applied to brain images to perform diagnostic and prognostic tasks by learning robust features. Vision transformers (ViT) - a new class of deep learning architectures - have emerged in recent years as an alternative to CNNs for several computer vision applications. Here we tested variants of the ViT architecture for a range of desired neuroimaging downstream tasks based on difficulty, in this case for sex and Alzheimer's disease (AD) classification based on 3D brain MRI. In our experiments, two vision transformer architecture variants achieved an AUC of 0.987 for sex and 0.892 for AD classification, respectively. We independently evaluated our models on data from two benchmark AD datasets. We achieved a performance boost of 5% and 9-10% upon fine-tuning vision transformer models pre-trained on synthetic (generated by a latent diffusion model) and real MRI scans, respectively. Our main contributions include testing the effects of different ViT training strategies including pre-training, data augmentation and learning rate warm-ups followed by annealing, as pertaining to the neuroimaging domain. These techniques are essential for training ViT-like models for neuroimaging applications where training data is usually limited. We also analyzed the effect of the amount of training data utilized on the test-time performance of the ViT via data-model scaling curves.","Neuroimaging of large populations is valuable to identify factors that promote or resist brain disease, and to assist diagnosis, subtyping, and prognosis. Data-driven models such as convolutional neural networks (CNNs) have increasingly been applied to brain images to perform diagnostic and prognostic tasks by learning robust features. Vision transformers (ViT) - a new class of deep learning architectures - have emerged in recent years as an alternative to CNNs for several computer vision applications. Here we tested variants of the ViT architecture for a range of desired neuroimaging downstream tasks based on difficulty, in this case for sex and Alzheimer's disease (AD) classification based on 3D brain MRI. In our experiments, two vision transformer architecture variants achieved an AUC of 0.987 for sex and 0.892 for AD classification, respectively. We independently evaluated our models on data from two benchmark AD datasets. We achieved a performance boost of 5% and 9-10% upon fine-tuning vision transformer models pre-trained on synthetic (generated by a latent diffusion model) and real MRI scans, respectively. Our main contributions include testing the effects of different ViT training strategies including pre-training, data augmentation and learning rate warm-ups followed by annealing, as pertaining to the neuroimaging domain. These techniques are essential for training ViT-like models for neuroimaging applications where training data is usually limited. We also analyzed the effect of the amount of training data utilized on the test-time performance of the ViT via data-model scaling curves.","@Article{Dhinagar2023EfficientlyTV,
 author = {N. Dhinagar and S. Thomopoulos and Emily Laltoo and Paul M. Thompson},
 booktitle = {arXiv.org},
 journal = {ArXiv},
 title = {Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection},
 year = {2023}
}
",0,arXiv.org,ArXiv,
7b2cf4599fdbfa7bb073ce3737e6350c8a7d8c3d,Vision Transformer Approach for Classification of Alzheimer’s Disease Using 18F-Florbetaben Brain Images,Hyunji Shin,2023,"Dementia is a degenerative disease that is increasingly prevalent in an aging society. Alzheimer’s disease (AD), the most common type of dementia, is best mitigated via early detection and management. Deep learning is an artificial intelligence technique that has been used to diagnose and predict diseases by extracting meaningful features from medical images. The convolutional neural network (CNN) is a representative application of deep learning, serving as a powerful tool for the diagnosis of AD. Recently, vision transformers (ViT) have yielded classification performance exceeding that of CNN in some diagnostic image classifications. Because the brain is a very complex network with interrelated regions, ViT, which captures direct relationships between images, may be more effective for brain image analysis than CNN. Therefore, we propose a method for classifying dementia images by applying 18F-Florbetaben positron emission tomography (PET) images to ViT. Data were evaluated via binary (normal control and abnormal) and ternary (healthy control, mild cognitive impairment, and AD) classification. In a performance comparison with the CNN, VGG19 was selected as the comparison model. Consequently, ViT yielded more effective performance than VGG19 in binary classification. However, in ternary classification, the performance of ViT cannot be considered excellent. These results show that it is hard to argue that the ViT model is better at AD classification than the CNN model.","Dementia is a degenerative disease that is increasingly prevalent in an aging society. Alzheimer’s disease (AD), the most common type of dementia, is best mitigated via early detection and management. Deep learning is an artificial intelligence technique that has been used to diagnose and predict diseases by extracting meaningful features from medical images. The convolutional neural network (CNN) is a representative application of deep learning, serving as a powerful tool for the diagnosis of AD. Recently, vision transformers (ViT) have yielded classification performance exceeding that of CNN in some diagnostic image classifications. Because the brain is a very complex network with interrelated regions, ViT, which captures direct relationships between images, may be more effective for brain image analysis than CNN. Therefore, we propose a method for classifying dementia images by applying 18F-Florbetaben positron emission tomography (PET) images to ViT. Data were evaluated via binary (normal control and abnormal) and ternary (healthy control, mild cognitive impairment, and AD) classification. In a performance comparison with the CNN, VGG19 was selected as the comparison model. Consequently, ViT yielded more effective performance than VGG19 in binary classification. However, in ternary classification, the performance of ViT cannot be considered excellent. These results show that it is hard to argue that the ViT model is better at AD classification than the CNN model.","@Article{Shin2023VisionTA,
 author = {Hyunji Shin and Soomin Jeon and Youngsoo Seol and Sangjin Kim and Doyoung Kang},
 booktitle = {Applied Sciences},
 journal = {Applied Sciences},
 title = {Vision Transformer Approach for Classification of Alzheimer’s Disease Using 18F-Florbetaben Brain Images},
 year = {2023}
}
",0,Applied Sciences,Applied Sciences,
9f3cd0b46507d50caf71101f1dc3b23e339a8ba0,Comparative Analysis of Alzheimer's Disease Detection via MRI Scans Using Convolutional Neural Network and Vision Transformer,Pinky Sherwani,2023,"Progressive damage to brain neurons is caused by a neurodegenerative disease (ND) that the body cannot heal or restore. Dementia like Alzheimer's Disease (AD), which affects millions of lives each year, is a well-known instance of such illnesses. Despite extensive research, the aforementioned disorders currently have no viable therapies. However, a timely diagnosis is essential for the management of diseases. For neurologists, diagnosing NDs is difficult and needs years of education and experience. The paper focuses on the detection of Alzheimer's Disease via Brain MRI Scans using Convolutional Neural Network (CNN) which minimizes an image's high dimensionality without sacrificing its information and using Vision Transformers that are for image classification and apply an architecture akin to a Transformer over selected areas of the image. In the proposed work three different Vision Transformer namely: Vanilla Vision Transformer (Vanilla ViT), Deep Vision Transformer (DeepViT), Class Attention in Image Transformer (CaiT). The Vision Transformers turned out to be better than CNN as when training on fewer datasets, ViT exhibits inductive bias, which increases dependency on model regularisation or data augmentation (AgReg). In terms of accuracy and computing efficiency, ViT models exceed the present CNN by almost a factor of four.","Progressive damage to brain neurons is caused by a neurodegenerative disease (ND) that the body cannot heal or restore. Dementia like Alzheimer's Disease (AD), which affects millions of lives each year, is a well-known instance of such illnesses. Despite extensive research, the aforementioned disorders currently have no viable therapies. However, a timely diagnosis is essential for the management of diseases. For neurologists, diagnosing NDs is difficult and needs years of education and experience. The paper focuses on the detection of Alzheimer's Disease via Brain MRI Scans using Convolutional Neural Network (CNN) which minimizes an image's high dimensionality without sacrificing its information and using Vision Transformers that are for image classification and apply an architecture akin to a Transformer over selected areas of the image. In the proposed work three different Vision Transformer namely: Vanilla Vision Transformer (Vanilla ViT), Deep Vision Transformer (DeepViT), Class Attention in Image Transformer (CaiT). The Vision Transformers turned out to be better than CNN as when training on fewer datasets, ViT exhibits inductive bias, which increases dependency on model regularisation or data augmentation (AgReg). In terms of accuracy and computing efficiency, ViT models exceed the present CNN by almost a factor of four.","@Conference{Sherwani2023ComparativeAO,
 author = {Pinky Sherwani and P. Nandhakumar and Pihu Srivastava and Jayant Jagtap and Viren Narvekar and H. R.},
 booktitle = {2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF)},
 journal = {2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF)},
 pages = {1-9},
 title = {Comparative Analysis of Alzheimer's Disease Detection via MRI Scans Using Convolutional Neural Network and Vision Transformer},
 year = {2023}
}
",0,2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF),2023 International Conference on Artificial Intelligence and Knowledge Discovery in Concurrent Engineering (ICECONF),1-9
6e6ee835b7388d52e2df688733a2952df38e13f6,Introducing Vision Transformer for Alzheimer's Disease classification task with 3D input,Zilun Zhang,2022,". Many high-performance classification models utilize complex CNN-based architectures for Alzheimer’s Disease classification. We aim to investigate two relevant questions regarding classification of Alzheimer’s Disease using MRI: “Do Vision Transformer-based models perform better than CNN-based models?” and “Is it possible to use a shallow 3D CNN-based model to obtain satisfying results?” To achieve these goals, we propose two models that can take in and process 3D MRI scans: Convolutional Voxel Vision Transformer (CVVT) architecture, and ConvNet3D-4, a shallow 4-block 3D CNN-based model. Our results indicate that the shallow 3D CNN-based models are sufficient to achieve good classification results for Alzheimer’s Disease using MRI scans.",". Many high-performance classification models utilize complex CNN-based architectures for Alzheimer’s Disease classification. We aim to investigate two relevant questions regarding classification of Alzheimer’s Disease using MRI: “Do Vision Transformer-based models perform better than CNN-based models?” and “Is it possible to use a shallow 3D CNN-based model to obtain satisfying results?” To achieve these goals, we propose two models that can take in and process 3D MRI scans: Convolutional Voxel Vision Transformer (CVVT) architecture, and ConvNet3D-4, a shallow 4-block 3D CNN-based model. Our results indicate that the shallow 3D CNN-based models are sufficient to achieve good classification results for Alzheimer’s Disease using MRI scans.","@Article{Zhang2022IntroducingVT,
 author = {Zilun Zhang and F. Khalvati},
 booktitle = {arXiv.org},
 journal = {ArXiv},
 title = {Introducing Vision Transformer for Alzheimer's Disease classification task with 3D input},
 volume = {abs/2210.01177},
 year = {2022}
}
",0,arXiv.org,ArXiv,
64023908cd9171ded5394d093f9cddb87acc0b59,Vision transformers for the prediction of mild cognitive impairment to Alzheimer’s disease progression using mid-sagittal sMRI,Gia-Minh Hoang,2023,"Background Alzheimer’s disease (AD) is one of the most common causes of neurodegenerative disease affecting over 50 million people worldwide. However, most AD diagnosis occurs in the moderate to late stage, which means that the optimal time for treatment has already passed. Mild cognitive impairment (MCI) is an intermediate state between cognitively normal people and AD patients. Therefore, the accurate prediction in the conversion process of MCI to AD may allow patients to start preventive intervention to slow the progression of the disease. Nowadays, neuroimaging techniques have been developed and are used to determine AD-related structural biomarkers. Deep learning approaches have rapidly become a key methodology applied to these techniques to find biomarkers. Methods In this study, we aimed to investigate an MCI-to-AD prediction method using Vision Transformers (ViT) to structural magnetic resonance images (sMRI). The Alzheimer’s Disease Neuroimaging Initiative (ADNI) database containing 598 MCI subjects was used to predict MCI subjects’ progression to AD. There are three main objectives in our study: (i) to propose an MRI-based Vision Transformers approach for MCI to AD progression classification, (ii) to evaluate the performance of different ViT architectures to obtain the most advisable one, and (iii) to visualize the brain region mostly affect the prediction of deep learning approach to MCI progression. Results Our method achieved state-of-the-art classification performance in terms of accuracy (83.27%), specificity (85.07%), and sensitivity (81.48%) compared with a set of conventional methods. Next, we visualized the brain regions that mostly contribute to the prediction of MCI progression for interpretability of the proposed model. The discriminative pathological locations include the thalamus, medial frontal, and occipital—corroborating the reliability of our model. Conclusion In conclusion, our methods provide an effective and accurate technique for the prediction of MCI conversion to AD. The results obtained in this study outperform previous reports using the ADNI collection, and it suggests that sMRI-based ViT could be efficiently applied with a considerable potential benefit for AD patient management. The brain regions mostly contributing to prediction, in conjunction with the identified anatomical features, will support the building of a robust solution for other neurodegenerative diseases in future.","Background Alzheimer’s disease (AD) is one of the most common causes of neurodegenerative disease affecting over 50 million people worldwide. However, most AD diagnosis occurs in the moderate to late stage, which means that the optimal time for treatment has already passed. Mild cognitive impairment (MCI) is an intermediate state between cognitively normal people and AD patients. Therefore, the accurate prediction in the conversion process of MCI to AD may allow patients to start preventive intervention to slow the progression of the disease. Nowadays, neuroimaging techniques have been developed and are used to determine AD-related structural biomarkers. Deep learning approaches have rapidly become a key methodology applied to these techniques to find biomarkers. Methods In this study, we aimed to investigate an MCI-to-AD prediction method using Vision Transformers (ViT) to structural magnetic resonance images (sMRI). The Alzheimer’s Disease Neuroimaging Initiative (ADNI) database containing 598 MCI subjects was used to predict MCI subjects’ progression to AD. There are three main objectives in our study: (i) to propose an MRI-based Vision Transformers approach for MCI to AD progression classification, (ii) to evaluate the performance of different ViT architectures to obtain the most advisable one, and (iii) to visualize the brain region mostly affect the prediction of deep learning approach to MCI progression. Results Our method achieved state-of-the-art classification performance in terms of accuracy (83.27%), specificity (85.07%), and sensitivity (81.48%) compared with a set of conventional methods. Next, we visualized the brain regions that mostly contribute to the prediction of MCI progression for interpretability of the proposed model. The discriminative pathological locations include the thalamus, medial frontal, and occipital—corroborating the reliability of our model. Conclusion In conclusion, our methods provide an effective and accurate technique for the prediction of MCI conversion to AD. The results obtained in this study outperform previous reports using the ADNI collection, and it suggests that sMRI-based ViT could be efficiently applied with a considerable potential benefit for AD patient management. The brain regions mostly contributing to prediction, in conjunction with the identified anatomical features, will support the building of a robust solution for other neurodegenerative diseases in future.","@Article{Hoang2023VisionTF,
 author = {Gia-Minh Hoang and Ue-Hwan Kim and Jae Gwan Kim},
 booktitle = {Frontiers in Aging Neuroscience},
 journal = {Frontiers in Aging Neuroscience},
 title = {Vision transformers for the prediction of mild cognitive impairment to Alzheimer’s disease progression using mid-sagittal sMRI},
 volume = {15},
 year = {2023}
}
",0,Frontiers in Aging Neuroscience,Frontiers in Aging Neuroscience,
1b4ac29a08b2272bd6b3fbf293544816e9d31c5a,SMIL-DeiT:Multiple Instance Learning and Self-supervised Vision Transformer network for Early Alzheimer's disease classification,Yue Yin,2022,"Early diagnosis of Alzheimer's disease(AD) is becoming increasingly important in preventing and treating the disease as the world's population ages. We proposed a SMIL-DeiT network for AD classification tasks amongst three groups: Alzheimer's Disease (AD), Mild Cognitive Impairment (MCI), and Normal Cognitive (NC) in this study. Vision Transformer is the fundamental structure of our work. The data pre-training is performed utilizing DINO, a self-supervised technique, whereas the downstream classification task is done with Multiple Instance Learning. Our proposed technique works on the ADNI dataset. We used four performance metrics accuracy rates, precision, recall, and Fl-score in the evaluation, the most important of which was accuracy. The accuracy obtained by our method is higher than the transformer's 90.1% and CNN's 90.8%, reaching 93.2%.","Early diagnosis of Alzheimer's disease(AD) is becoming increasingly important in preventing and treating the disease as the world's population ages. We proposed a SMIL-DeiT network for AD classification tasks amongst three groups: Alzheimer's Disease (AD), Mild Cognitive Impairment (MCI), and Normal Cognitive (NC) in this study. Vision Transformer is the fundamental structure of our work. The data pre-training is performed utilizing DINO, a self-supervised technique, whereas the downstream classification task is done with Multiple Instance Learning. Our proposed technique works on the ADNI dataset. We used four performance metrics accuracy rates, precision, recall, and Fl-score in the evaluation, the most important of which was accuracy. The accuracy obtained by our method is higher than the transformer's 90.1% and CNN's 90.8%, reaching 93.2%.","@Article{Yin2022SMILDeiTMultipleIL,
 author = {Yue Yin and Weikang Jin and Jing Bai and Ruotong Liu and Haowei Zhen},
 booktitle = {IEEE International Joint Conference on Neural Network},
 journal = {2022 International Joint Conference on Neural Networks (IJCNN)},
 pages = {1-6},
 title = {SMIL-DeiT:Multiple Instance Learning and Self-supervised Vision Transformer network for Early Alzheimer's disease classification},
 year = {2022}
}
",0,IEEE International Joint Conference on Neural Network,2022 International Joint Conference on Neural Networks (IJCNN),1-6
9da3fadf092c864f61d6fd1e8eab5a6ca2397194,Classification of Alzheimer's Disease via Vision Transformer: Classification of Alzheimer's Disease via Vision Transformer,Yanjun Lyu,2022,"Deep models are powerful in capturing the complex and non-linear relationship buried in brain imaging data. However, the huge number of parameters in deep models can easily overfit given limited imaging data samples. In this work, we proposed a cross-domain transfer learning method to solve the insufficient data problem in brain imaging domain by leveraging the knowledge learned in natural image domain. Specifically, we employed ViT as the backbone and firstly pretrained it using ImageNet-21K dataset and then transferred to the brain imaging dataset. A slice-wise convolution embedding method was developed to improve the standard patch operation in vanilla ViT. Our method was evaluated based on AD/CN classification task. We also conducted extensive experiments to compare the transfer performance with different transfer strategies, models, and sample size. The results suggest that the proposed method can effectively transfer the knowledge learned in natural image domain to brain imaging area and may provide a promising way to take advantages of the pretrained model in data-intensive applications. Moreover, the proposed cross-domain transfer learning method can obtain comparable classification performance compared to most recent studies.","Deep models are powerful in capturing the complex and non-linear relationship buried in brain imaging data. However, the huge number of parameters in deep models can easily overfit given limited imaging data samples. In this work, we proposed a cross-domain transfer learning method to solve the insufficient data problem in brain imaging domain by leveraging the knowledge learned in natural image domain. Specifically, we employed ViT as the backbone and firstly pretrained it using ImageNet-21K dataset and then transferred to the brain imaging dataset. A slice-wise convolution embedding method was developed to improve the standard patch operation in vanilla ViT. Our method was evaluated based on AD/CN classification task. We also conducted extensive experiments to compare the transfer performance with different transfer strategies, models, and sample size. The results suggest that the proposed method can effectively transfer the knowledge learned in natural image domain to brain imaging area and may provide a promising way to take advantages of the pretrained model in data-intensive applications. Moreover, the proposed cross-domain transfer learning method can obtain comparable classification performance compared to most recent studies.","@Article{Lyu2022ClassificationOA,
 author = {Yanjun Lyu and Xiao-Wen Yu and Dajiang Zhu and Lu Zhang},
 booktitle = {Petra},
 journal = {Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments},
 title = {Classification of Alzheimer's Disease via Vision Transformer: Classification of Alzheimer's Disease via Vision Transformer},
 year = {2022}
}
",0,Petra,Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments,
adb781ca8eeea0f2cb4422b3fd632c0fd4ee2448,Advit: Vision Transformer On Multi-Modality Pet Images For Alzheimer Disease Diagnosis,Xin Xing,2022,"We present a new model trained on multi-modalities of Positron Emission Tomography images (PET-AV45 and PET-FDG) for Alzheimer’s Disease (AD) diagnosis. Unlike the conventional methods using multi-modal 3D/2D CNN architecture, our design replaces the Convolutional Neural Net-work (CNN) by Vision Transformer (ViT). Considering the high computation cost of 3D images, we firstly employ a 3D-to-2D operation to project the 3D PET images into 2D fusion images. Then, we forward the fused multi-modal 2D images to a parallel ViT model for feature extraction, followed by classification for AD diagnosis. For evaluation, we use PET images from ADNI. The proposed model outperforms several strong baseline models in our experiments and achieves 0.91 accuracy and 0.95 AUC.","We present a new model trained on multi-modalities of Positron Emission Tomography images (PET-AV45 and PET-FDG) for Alzheimer’s Disease (AD) diagnosis. Unlike the conventional methods using multi-modal 3D/2D CNN architecture, our design replaces the Convolutional Neural Net-work (CNN) by Vision Transformer (ViT). Considering the high computation cost of 3D images, we firstly employ a 3D-to-2D operation to project the 3D PET images into 2D fusion images. Then, we forward the fused multi-modal 2D images to a parallel ViT model for feature extraction, followed by classification for AD diagnosis. For evaluation, we use PET images from ADNI. The proposed model outperforms several strong baseline models in our experiments and achieves 0.91 accuracy and 0.95 AUC.","@Article{Xing2022AdvitVT,
 author = {Xin Xing and G. Liang and Yu Zhang and Subash Khanal and Ai-Ling Lin and Nathan Jacobs},
 booktitle = {IEEE International Symposium on Biomedical Imaging},
 journal = {2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI)},
 pages = {1-4},
 title = {Advit: Vision Transformer On Multi-Modality Pet Images For Alzheimer Disease Diagnosis},
 year = {2022}
}
",1,IEEE International Symposium on Biomedical Imaging,2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI),1-4
0e681598a83e3e55cfbc34a07a222e99e9f4c445,CrossViT Wide Residual Squeeze-and-Excitation Network for Alzheimer's disease classification with self attention ProGAN data augmentation,Rahma Kadri,2022,"Efficient and accurate early prediction of Alzheimer's disease (AD) based on the neuroimaging data has attracted interest from many researchers to prevent its progression. Deep learning networks have demonstrated an optimal ability to analyse large-scale multimodal neuroimaging for AD classification. The most widely used architecture of deep learning is the Convolution neural networks (CNN) that have shown great potential in AD detection. However CNN does not capture long range dependencies within the input image and does not ensure a good global feature extraction. Furthermore, increasing the receptive field of CNN by increasing the kernels sizes can cause a feature granularity loss. Another limitation is that CNN lacks a weighing mechanism of image features; the network doesn’t focus on the relevant features within the image. Recently,vision transformer have shown an outstanding performance over the CNN and overcomes its main limitations. The vision transformer relies on the self-attention layers. The main drawbacks of this new technique is that it requires a huge amount of training data. In this paper, we combined the main strengths of these two architectures for AD classification. We proposed a new method based on the combination of the Cross ViT and Wide Residual Squeeze-and-Excitation Network. We acquired MRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS). We also proposed a new data augmentation based on the self attention progressive generative adversarial neural network to overcome the limitation of the data. Our proposed method achieved 99% classification accuracy and outperforms CNN models.","Efficient and accurate early prediction of Alzheimer's disease (AD) based on the neuroimaging data has attracted interest from many researchers to prevent its progression. Deep learning networks have demonstrated an optimal ability to analyse large-scale multimodal neuroimaging for AD classification. The most widely used architecture of deep learning is the Convolution neural networks (CNN) that have shown great potential in AD detection. However CNN does not capture long range dependencies within the input image and does not ensure a good global feature extraction. Furthermore, increasing the receptive field of CNN by increasing the kernels sizes can cause a feature granularity loss. Another limitation is that CNN lacks a weighing mechanism of image features; the network doesn’t focus on the relevant features within the image. Recently,vision transformer have shown an outstanding performance over the CNN and overcomes its main limitations. The vision transformer relies on the self-attention layers. The main drawbacks of this new technique is that it requires a huge amount of training data. In this paper, we combined the main strengths of these two architectures for AD classification. We proposed a new method based on the combination of the Cross ViT and Wide Residual Squeeze-and-Excitation Network. We acquired MRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Open Access Series of Imaging Studies (OASIS). We also proposed a new data augmentation based on the self attention progressive generative adversarial neural network to overcome the limitation of the data. Our proposed method achieved 99% classification accuracy and outperforms CNN models.","@Article{Kadri2022CrossViTWR,
 author = {Rahma Kadri and B. Bouaziz and M. Tmar and F. Gargouri},
 booktitle = {International Journal of Hybrid Intelligent Systems},
 journal = {Int. J. Hybrid Intell. Syst.},
 pages = {163-177},
 title = {CrossViT Wide Residual Squeeze-and-Excitation Network for Alzheimer's disease classification with self attention ProGAN data augmentation},
 volume = {17},
 year = {2022}
}
",0,International Journal of Hybrid Intelligent Systems,Int. J. Hybrid Intell. Syst.,163-177
af73f6878044012034023640918cf31d7f4e1522,Comparison of Anatomical and Diffusion MRI for detecting Parkinson’s Disease using Deep Convolutional Neural Network,Tamoghna Chattopadhyay,2023,"Parkinson’s disease (PD) is a progressive neurodegenerative disease that affects over 10 million people worldwide. Brain atrophy and microstructural abnormalities tend to be more subtle in PD than in other age-related conditions such as Alzheimer’s disease, so there is interest in how well machine learning methods can detect PD in radiological scans. Deep learning models based on convolutional neural networks (CNNs) can automatically distil diagnostically useful features from raw MRI scans, but most CNN-based deep learning models have only been tested on T1-weighted brain MRI. Here we examine the added value of diffusion-weighted MRI (dMRI) - a variant of MRI, sensitive to microstructural tissue properties - as an additional input in CNN-based models for PD classification. Our evaluations used data from 3 separate cohorts - from Chang Gung University, the University of Pennsylvania, and the PPMI dataset. We trained CNNs on various combinations of these cohorts to find the best predictive model. Although tests on more diverse data are warranted, deep-learned models from dMRI show promise for PD classification. Clinical Relevance This study supports the use of diffusion-weighted images as an alternative to anatomical images for AI-based detection of Parkinson’s disease.","Parkinson’s disease (PD) is a progressive neurodegenerative disease that affects over 10 million people worldwide. Brain atrophy and microstructural abnormalities tend to be more subtle in PD than in other age-related conditions such as Alzheimer’s disease, so there is interest in how well machine learning methods can detect PD in radiological scans. Deep learning models based on convolutional neural networks (CNNs) can automatically distil diagnostically useful features from raw MRI scans, but most CNN-based deep learning models have only been tested on T1-weighted brain MRI. Here we examine the added value of diffusion-weighted MRI (dMRI) - a variant of MRI, sensitive to microstructural tissue properties - as an additional input in CNN-based models for PD classification. Our evaluations used data from 3 separate cohorts - from Chang Gung University, the University of Pennsylvania, and the PPMI dataset. We trained CNNs on various combinations of these cohorts to find the best predictive model. Although tests on more diverse data are warranted, deep-learned models from dMRI show promise for PD classification. Clinical Relevance This study supports the use of diffusion-weighted images as an alternative to anatomical images for AI-based detection of Parkinson’s disease.","@Article{Chattopadhyay2023ComparisonOA,
 author = {Tamoghna Chattopadhyay and Amit Singh and Emily Laltoo and C. Boyle and Conor Owens-Walton and Yao‐Liang Chen and P. Cook and C. McMillan and Chih-Chien Tsai and J-J Wang and Yih-Ru Wu and Y. D. van der Werf and P. Thompson},
 booktitle = {bioRxiv},
 journal = {bioRxiv},
 title = {Comparison of Anatomical and Diffusion MRI for detecting Parkinson’s Disease using Deep Convolutional Neural Network},
 year = {2023}
}
",0,bioRxiv,bioRxiv,
003f528634735ece46286609f6a074fbfa435722,Pre-trained deep learning models for brain MRI image classification,Srigiri Krishnapriya,2023,"Brain tumors are serious conditions caused by uncontrolled and abnormal cell division. Tumors can have devastating implications if not accurately and promptly detected. Magnetic resonance imaging (MRI) is one of the methods frequently used to detect brain tumors owing to its excellent resolution. In the past few decades, substantial research has been conducted in the field of classifying brain images, ranging from traditional methods to deep-learning techniques such as convolutional neural networks (CNN). To accomplish classification, machine-learning methods require manually created features. In contrast, CNN achieves classification by extracting visual features from unprocessed images. The size of the training dataset had a significant impact on the features that CNN extracts. The CNN tends to overfit when its size is small. Deep CNNs (DCNN) with transfer learning have therefore been developed. The aim of this work was to investigate the brain MR image categorization potential of pre-trained DCNN VGG-19, VGG-16, ResNet50, and Inception V3 models using data augmentation and transfer learning techniques. Validation of the test set utilizing accuracy, recall, Precision, and F1 score showed that the pre-trained VGG-19 model with transfer learning exhibited the best performance. In addition, these methods offer an end-to-end classification of raw images without the need for manual attribute extraction.","Brain tumors are serious conditions caused by uncontrolled and abnormal cell division. Tumors can have devastating implications if not accurately and promptly detected. Magnetic resonance imaging (MRI) is one of the methods frequently used to detect brain tumors owing to its excellent resolution. In the past few decades, substantial research has been conducted in the field of classifying brain images, ranging from traditional methods to deep-learning techniques such as convolutional neural networks (CNN). To accomplish classification, machine-learning methods require manually created features. In contrast, CNN achieves classification by extracting visual features from unprocessed images. The size of the training dataset had a significant impact on the features that CNN extracts. The CNN tends to overfit when its size is small. Deep CNNs (DCNN) with transfer learning have therefore been developed. The aim of this work was to investigate the brain MR image categorization potential of pre-trained DCNN VGG-19, VGG-16, ResNet50, and Inception V3 models using data augmentation and transfer learning techniques. Validation of the test set utilizing accuracy, recall, Precision, and F1 score showed that the pre-trained VGG-19 model with transfer learning exhibited the best performance. In addition, these methods offer an end-to-end classification of raw images without the need for manual attribute extraction.","@Article{Krishnapriya2023PretrainedDL,
 author = {Srigiri Krishnapriya and Y. Karuna},
 booktitle = {Frontiers in Human Neuroscience},
 journal = {Frontiers in Human Neuroscience},
 title = {Pre-trained deep learning models for brain MRI image classification},
 volume = {17},
 year = {2023}
}
",0,Frontiers in Human Neuroscience,Frontiers in Human Neuroscience,