Full Code of Huanshere/VideoLingo for AI

main 29e240dcd2a1 cached

132 files

541.1 KB

165.2k tokens

205 symbols

1 requests

Download .txt

Showing preview only (575K chars total). Download the full file or copy to clipboard to get everything.

Repository: Huanshere/VideoLingo
Branch: main
Commit: 29e240dcd2a1
Files: 132
Total size: 541.1 KB

Directory structure:
gitextract_wyfw8khg/

├── .cursorrules
├── .gitignore
├── .streamlit/
│   └── config.toml
├── Dockerfile
├── LICENSE
├── OneKeyStart.bat
├── README.md
├── VideoLingo_colab.ipynb
├── batch/
│   ├── OneKeyBatch.bat
│   ├── README.md
│   ├── README.zh.md
│   └── utils/
│       ├── batch_processor.py
│       ├── settings_check.py
│       └── video_processor.py
├── config.yaml
├── core/
│   ├── _10_gen_audio.py
│   ├── _11_merge_audio.py
│   ├── _12_dub_to_vid.py
│   ├── _1_ytdlp.py
│   ├── _2_asr.py
│   ├── _3_1_split_nlp.py
│   ├── _3_2_split_meaning.py
│   ├── _4_1_summarize.py
│   ├── _4_2_translate.py
│   ├── _5_split_sub.py
│   ├── _6_gen_sub.py
│   ├── _7_sub_into_vid.py
│   ├── _8_1_audio_task.py
│   ├── _8_2_dub_chunks.py
│   ├── _9_refer_audio.py
│   ├── __init__.py
│   ├── asr_backend/
│   │   ├── __init__.py
│   │   ├── audio_preprocess.py
│   │   ├── demucs_vl.py
│   │   ├── elevenlabs_asr.py
│   │   ├── whisperX_302.py
│   │   └── whisperX_local.py
│   ├── prompts.py
│   ├── spacy_utils/
│   │   ├── __init__.py
│   │   ├── load_nlp_model.py
│   │   ├── split_by_comma.py
│   │   ├── split_by_connector.py
│   │   ├── split_by_mark.py
│   │   └── split_long_by_root.py
│   ├── st_utils/
│   │   ├── __init__.py
│   │   ├── download_video_section.py
│   │   ├── imports_and_utils.py
│   │   └── sidebar_setting.py
│   ├── translate_lines.py
│   ├── tts_backend/
│   │   ├── _302_f5tts.py
│   │   ├── azure_tts.py
│   │   ├── custom_tts.py
│   │   ├── edge_tts.py
│   │   ├── estimate_duration.py
│   │   ├── fish_tts.py
│   │   ├── gpt_sovits_tts.py
│   │   ├── openai_tts.py
│   │   ├── sf_cosyvoice2.py
│   │   ├── sf_fishtts.py
│   │   └── tts_main.py
│   └── utils/
│       ├── __init__.py
│       ├── ask_gpt.py
│       ├── config_utils.py
│       ├── decorator.py
│       ├── delete_retry_dubbing.py
│       ├── models.py
│       ├── onekeycleanup.py
│       └── pypi_autochoose.py
├── custom_terms.xlsx
├── docs/
│   ├── .gitignore
│   ├── components/
│   │   ├── landing/
│   │   │   ├── comments.tsx
│   │   │   ├── faq.tsx
│   │   │   ├── features.tsx
│   │   │   ├── github-stats.tsx
│   │   │   ├── hero.tsx
│   │   │   └── index.tsx
│   │   └── ui/
│   │       ├── accordion.tsx
│   │       ├── badge.tsx
│   │       ├── button.tsx
│   │       ├── card.tsx
│   │       ├── hero-video-dialog.tsx
│   │       ├── rainbow-button.tsx
│   │       └── tooltip.tsx
│   ├── components.json
│   ├── lib/
│   │   └── utils.ts
│   ├── middleware.js
│   ├── next-env.d.ts
│   ├── next.config.js
│   ├── package.json
│   ├── pages/
│   │   ├── _app.mdx
│   │   ├── _meta.en-US.json
│   │   ├── _meta.ja.json
│   │   ├── _meta.zh-CN.json
│   │   ├── docs/
│   │   │   ├── _meta.en-US.json
│   │   │   ├── _meta.ja.json
│   │   │   ├── _meta.zh-CN.json
│   │   │   ├── docker.en-US.md
│   │   │   ├── docker.zh-CN.md
│   │   │   ├── introduction.en-US.md
│   │   │   ├── introduction.zh-CN.md
│   │   │   ├── start.en-US.md
│   │   │   ├── start.zh-CN.md
│   │   │   ├── tech.en-US.md
│   │   │   └── tech.zh-CN.md
│   │   ├── globals.css
│   │   ├── index.en-US.mdx
│   │   ├── index.ja.mdx
│   │   └── index.zh-CN.mdx
│   ├── postcss.config.js
│   ├── public/
│   │   └── site.webmanifest
│   ├── tailwind.config.js
│   ├── theme.config.jsx
│   └── tsconfig.json
├── install.py
├── launch.py
├── requirements.txt
├── setup.py
├── st.py
└── translations/
    ├── README.es.md
    ├── README.fr.md
    ├── README.ja.md
    ├── README.ru.md
    ├── README.zh-TW.md
    ├── README.zh.md
    ├── en.json
    ├── es.json
    ├── fr.json
    ├── ja.json
    ├── ru.json
    ├── translations.py
    ├── zh-CN.json
    └── zh-HK.json

================================================
FILE CONTENTS
================================================

================================================
FILE: .cursorrules
================================================
2. 使用
# ------------
# comment
# ------------ 
进行大块的注释
3. 避免使用复杂的函数内注释，以及函数变量中不要有类型定义
4. 使用英文注释和print

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
#   For a library or package, you might want to ignore these files since the code is
#   intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# poetry
#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
#   This is especially recommended for binary packages to ensure reproducibility, and is more
#   commonly ignored for libraries.
#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
#   in version control.
#   https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# history and output
/output/
/history/
_model_cache/

batch/input/
batch/output/
batch/tasks_setting.xlsx

# large files
/ffmpeg.exe
/ffmpeg
/ffprobe.exe
/ffprobe
.DS_Store
AllinOne.ipynb
config.backup.yaml

# runtime
runtime/
dev/
installer_files/
logs/

================================================
FILE: .streamlit/config.toml
================================================
[server]
maxUploadSize = 4096

================================================
FILE: Dockerfile
================================================
ARG CUDA_VERSION=12.4.1
FROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ARG PYTHON_VERSION=3.10

# Change software sources and install basic tools and system dependencies
RUN sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    sed -i 's/security.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    apt-get update && apt-get install -y --no-install-recommends \
    software-properties-common git curl sudo ffmpeg fonts-noto wget \
    && add-apt-repository ppa:deadsnakes/ppa \
    && apt-get update -y \
    && apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev python${PYTHON_VERSION}-venv \
    && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 \
    && update-alternatives --set python3 /usr/bin/python${PYTHON_VERSION} \
    && ln -sf /usr/bin/python${PYTHON_VERSION}-config /usr/bin/python3-config \
    && curl -sS https://bootstrap.pypa.io/get-pip.py | python${PYTHON_VERSION} \
    && python3 --version && python3 -m pip --version

# Clean apt cache
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Workaround for CUDA compatibility issues
RUN ldconfig /usr/local/cuda-$(echo $CUDA_VERSION | cut -d. -f1,2)/compat/

# Set working directory and clone repository
WORKDIR /app
RUN git clone https://github.com/Huanshere/VideoLingo.git .

# Install PyTorch and torchaudio
RUN pip install torch==2.0.0 torchaudio==2.0.0 --index-url https://download.pytorch.org/whl/cu118

# Clean up unnecessary files
RUN rm -rf .git

# Upgrade pip and install basic dependencies
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

# Install dependencies
COPY requirements.txt .
RUN pip install -e .

# Set CUDA-related environment variables
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=${CUDA_HOME}/bin:${PATH}
ENV LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}

# Set CUDA architecture list
ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6+PTX"
ENV TORCH_CUDA_ARCH_LIST=${TORCH_CUDA_ARCH_LIST}

EXPOSE 8501

CMD ["streamlit", "run", "st.py"]

================================================
FILE: LICENSE
================================================
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.


================================================
FILE: OneKeyStart.bat
================================================
@echo off
chcp 65001 >nul 2>&1
call conda activate videolingo 2>nul
set PYTHONWARNINGS=ignore
python "%~dp0launch.py"
if %errorlevel% neq 0 (
    echo.
    echo  Pre-flight checks or Streamlit failed. See logs\ for details.
    echo.
)
pause


================================================
FILE: README.md
================================================
<div align="center">

<img src="/docs/logo.png" alt="VideoLingo Logo" height="140">

# Connect the World, Frame by Frame

<a href="https://trendshift.io/repositories/12200" target="_blank"><img src="https://trendshift.io/api/badge/repositories/12200" alt="Huanshere%2FVideoLingo | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>

[**English**](/README.md)｜[**简体中文**](/translations/README.zh.md)｜[**繁體中文**](/translations/README.zh-TW.md)｜[**日本語**](/translations/README.ja.md)｜[**Español**](/translations/README.es.md)｜[**Русский**](/translations/README.ru.md)｜[**Français**](/translations/README.fr.md)

</div>

## 🌟 Overview ([Try VL Now!](https://videolingo.io))

VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.

Key features:
- 🎥 YouTube video download via yt-dlp

- **🎙️ Word-level and Low-illusion subtitle recognition with WhisperX**

- **📝 NLP and AI-powered subtitle segmentation**

- **📚 Custom + AI-generated terminology for coherent translation**

- **🔄 3-step Translate-Reflect-Adaptation for cinematic quality**

- **✅ Netflix-standard, Single-line subtitles Only**

- **🗣️ Dubbing with GPT-SoVITS, Azure, OpenAI, and more**

- 🚀 One-click startup and processing in Streamlit

- 🌍 Multi-language support in Streamlit UI

- 📝 Detailed logging with progress resumption

Difference from similar projects: **Single-line subtitles only, superior translation quality, seamless dubbing experience**

## 🎥 Demo

<table>
<tr>
<td width="33%">

### Dual Subtitles
---
https://github.com/user-attachments/assets/a5c3d8d1-2b29-4ba9-b0d0-25896829d951

</td>
<td width="33%">

### Cosy2 Voice Clone
---
https://github.com/user-attachments/assets/e065fe4c-3694-477f-b4d6-316917df7c0a

</td>
<td width="33%">

### GPT-SoVITS with my voice
---
https://github.com/user-attachments/assets/47d965b2-b4ab-4a0b-9d08-b49a7bf3508c

</td>
</tr>
</table>

### Language Support

**Input Language Support(more to come):**

🇺🇸 English 🤩 | 🇷🇺 Russian 😊 | 🇫🇷 French 🤩 | 🇩🇪 German 🤩 | 🇮🇹 Italian 🤩 | 🇪🇸 Spanish 🤩 | 🇯🇵 Japanese 😐 | 🇨🇳 Chinese* 😊

> *Chinese uses a separate punctuation-enhanced whisper model, for now...

**Translation supports all languages, while dubbing language depends on the chosen TTS method.**

## Installation

Meet any problem? Chat with our free online AI agent [**here**](https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh) to help you.

> **Note:** For Windows users with NVIDIA GPU, follow these steps before installation:
> 1. Install [CUDA Toolkit 12.6](https://developer.download.nvidia.com/compute/cuda/12.6.0/local_installers/cuda_12.6.0_560.76_windows.exe)
> 2. Install [CUDNN 9.3.0](https://developer.download.nvidia.com/compute/cudnn/9.3.0/local_installers/cudnn_9.3.0_windows.exe)
> 3. Add `C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6` to your system PATH
> 4. Restart your computer

> **Note:** FFmpeg is required. Please install it via package managers:
> - Windows: ```choco install ffmpeg``` (via [Chocolatey](https://chocolatey.org/))
> - macOS: ```brew install ffmpeg``` (via [Homebrew](https://brew.sh/))
> - Linux: ```sudo apt install ffmpeg``` (Debian/Ubuntu)

1. Clone the repository

```bash
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo
```

2. Install dependencies(requires `python=3.10`)

```bash
conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py
```

3. Start the application

```bash
streamlit run st.py
```

### Docker
Alternatively, you can use Docker (requires CUDA 12.4 and NVIDIA Driver version >550), see [Docker docs](/docs/pages/docs/docker.en-US.md):

```bash
docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo
```

## APIs
VideoLingo supports OpenAI-Like API format and various TTS interfaces:
- LLM: `claude-3-5-sonnet`, `gpt-4.1`, `deepseek-v3`, `gemini-2.0-flash`, ... (sorted by performance, be cautious with gemini-2.5-flash...)
- WhisperX: Run whisperX (large-v3) locally or use 302.ai API
- TTS: `azure-tts`, `openai-tts`, `siliconflow-fishtts`, **`fish-tts`**, `GPT-SoVITS`, `edge-tts`, `*custom-tts`(You can modify your own TTS in custom_tts.py!)

> **Note:** VideoLingo works with **[302.ai](https://gpt302.saaslink.net/C2oHR9)** - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!

For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: [English](/docs/pages/docs/start.en-US.md) | [中文](/docs/pages/docs/start.zh-CN.md)

## Current Limitations

1. WhisperX transcription performance may be affected by video background noise, as it uses wav2vac model for alignment. For videos with loud background music, please enable Voice Separation Enhancement. Additionally, subtitles ending with numbers or special characters may be truncated early due to wav2vac's inability to map numeric characters (e.g., "1") to their spoken form ("one").

2. Using weaker models can lead to errors during processes due to strict JSON format requirements for responses (tried my best to prompt llm😊). If this error occurs, please delete the `output` folder and retry with a different LLM, otherwise repeated execution will read the previous erroneous response causing the same error.

3. The dubbing feature may not be 100% perfect due to differences in speech rates and intonation between languages, as well as the impact of the translation step. However, this project has implemented extensive engineering processing for speech rates to ensure the best possible dubbing results.

4. **Multilingual video transcription recognition will only retain the main language**. This is because whisperX uses a specialized model for a single language when forcibly aligning word-level subtitles, and will delete unrecognized languages.

5. **For now, cannot dub multiple characters separately**, as whisperX's speaker distinction capability is not sufficiently reliable.

## 📄 License

This project is licensed under the Apache 2.0 License. Special thanks to the following open source projects for their contributions:

[whisperX](https://github.com/m-bain/whisperX), [yt-dlp](https://github.com/yt-dlp/yt-dlp), [json_repair](https://github.com/mangiucugna/json_repair), [BELLE](https://github.com/LianjiaTech/BELLE)

## 📬 Contact Me

- Submit [Issues](https://github.com/Huanshere/VideoLingo/issues) or [Pull Requests](https://github.com/Huanshere/VideoLingo/pulls) on GitHub
- DM me on Twitter: [@Huanshere](https://twitter.com/Huanshere)
- Email me at: team@videolingo.io

## ⭐ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Huanshere/VideoLingo&type=Timeline)](https://star-history.com/#Huanshere/VideoLingo&Timeline)

---

<p align="center">If you find VideoLingo helpful, please give me a ⭐️!</p>


================================================
FILE: VideoLingo_colab.ipynb
================================================
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RkeSbYF2HoM_"
      },
      "source": [
        "# Welcome to VideoLingo! 🎉🚀\n",
        "#### This colab file allows you to quickly experience the full functionality in just 5 minutes! ⏱️✨ Before you begin, you may need to prepare some keys. 🔑🗝️ Please read https://videolingo.io/docs/start to get ready. 📚👍\n",
        "#### *Please use a T4 GPU to execute this colab for optimal performance."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "h0jE67Gc-1nO"
      },
      "source": [
        "## 1. Clone VideoLingo repo 📥"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NC3i2T7D51oS",
        "outputId": "19821917-8ee4-4123-a099-fd35405fcdfe"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Cloning into 'VideoLingo'...\n",
            "remote: Enumerating objects: 2578, done.\u001b[K\n",
            "remote: Counting objects: 100% (595/595), done.\u001b[K\n",
            "remote: Compressing objects: 100% (221/221), done.\u001b[K\n",
            "remote: Total 2578 (delta 408), reused 378 (delta 374), pack-reused 1983 (from 1)\u001b[K\n",
            "Receiving objects: 100% (2578/2578), 10.44 MiB | 12.60 MiB/s, done.\n",
            "Resolving deltas: 100% (1644/1644), done.\n"
          ]
        }
      ],
      "source": [
        "!git clone https://github.com/Huanshere/VideoLingo.git\n",
        "%cd VideoLingo"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A5sIHzRs8JI1"
      },
      "source": [
        "## 2. Installation 🚀\n",
        "this takes around 4 mins"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tSymHAEg6Vzr",
        "outputId": "8c5059e4-37d5-4540-cba5-9c7aa46fe226"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: rich in /usr/local/lib/python3.10/dist-packages (13.8.1)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich) (3.0.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich) (2.18.0)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich) (0.1.2)\n",
            "\u001b[1;35m╭──────────────────────────╮\u001b[0m\n",
            "\u001b[1;35m│\u001b[0m\u001b[1;35m \u001b[0m\u001b[1;35mStarting installation...\u001b[0m\u001b[1;35m \u001b[0m\u001b[1;35m│\u001b[0m\n",
            "\u001b[1;35m╰──────────────────────────╯\u001b[0m\n",
            "config.py file has been created. Please fill in the API key and base URL in the config.py file.\n",
            "\u001b[36m╭──────────────────────────────────────────────────────────────────────────────────────────────────╮\u001b[0m\n",
            "\u001b[36m│\u001b[0m\u001b[36m \u001b[0m\u001b[36mInstalling requests...\u001b[0m\u001b[36m                                                                          \u001b[0m\u001b[36m \u001b[0m\u001b[36m│\u001b[0m\n",
            "\u001b[36m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (2.32.3)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests) (3.10)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests) (2.2.3)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests) (2024.8.30)\n",
            "\u001b[3m                        Whisper Model Selection                         \u001b[0m\n",
            "┏━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓\n",
            "┃\u001b[1m \u001b[0m\u001b[1mOption\u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mModel        \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mDescription                                \u001b[0m\u001b[1m \u001b[0m┃\n",
            "┡━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩\n",
            "│\u001b[36m \u001b[0m\u001b[36m1     \u001b[0m\u001b[36m \u001b[0m│\u001b[35m \u001b[0m\u001b[35mwhisperX 💻  \u001b[0m\u001b[35m \u001b[0m│\u001b[32m \u001b[0m\u001b[32mlocal model (can also use online model api)\u001b[0m\u001b[32m \u001b[0m│\n",
            "│\u001b[36m \u001b[0m\u001b[36m2     \u001b[0m\u001b[36m \u001b[0m│\u001b[35m \u001b[0m\u001b[35mwhisperXapi ☁️\u001b[0m\u001b[35m \u001b[0m│\u001b[32m \u001b[0m\u001b[32monline model through api only              \u001b[0m\u001b[32m \u001b[0m│\n",
            "└────────┴───────────────┴─────────────────────────────────────────────┘\n",
            "If you're unsure about the differences between models, please see \n",
            "\u001b[4;94mhttps://github.com/Huanshere/VideoLingo/\u001b[0m\n",
            "\u001b[36m╭──────────────────────────────────────────────────────────────────────────────────────────────────╮\u001b[0m\n",
            "\u001b[36m│\u001b[0m\u001b[36m \u001b[0m\u001b[36mInstalling PyTorch with CUDA support...\u001b[0m\u001b[36m                                                         \u001b[0m\u001b[36m \u001b[0m\u001b[36m│\u001b[0m\n",
            "\u001b[36m╰──────────────────────────────────────────────────────────────────────────────────────────────────╯\u001b[0m\n",
            "Looking in indexes: https://download.pytorch.org/whl/cu118\n",
            "Collecting torch==2.0.0\n",
            "  Downloading https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl (2267.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.3/2.3 GB\u001b[0m \u001b[31m626.3 kB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting torchaudio==2.0.0\n",
            "  Downloading https://download.pytorch.org/whl/cu118/torchaudio-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl (4.4 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m4.4/4.4 MB\u001b[0m \u001b[31m84.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0) (3.16.1)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0) (4.12.2)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0) (1.13.3)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0) (3.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch==2.0.0) (3.1.4)\n",
            "Collecting triton==2.0.0 (from torch==2.0.0)\n",
            "  Downloading https://download.pytorch.org/whl/triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m63.3/63.3 MB\u001b[0m \u001b[31m11.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch==2.0.0) (3.30.3)\n",
            "Collecting lit (from triton==2.0.0->torch==2.0.0)\n",
            "  Downloading https://download.pytorch.org/whl/lit-15.0.7.tar.gz (132 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m132.3/132.3 kB\u001b[0m \u001b[31m11.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch==2.0.0) (2.1.5)\n",
            "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch==2.0.0) (1.3.0)\n",
            "Building wheels for collected packages: lit\n",
            "  Building wheel for lit (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for lit: filename=lit-15.0.7-py3-none-any.whl size=89990 sha256=f9f0ca0ce885e2ddbc5583433ea047a853b4034c8e2712e685d93fbed24f9204\n",
            "  Stored in directory: /root/.cache/pip/wheels/27/2c/b6/3ed2983b1b44fe0dea1bb35234b09f2c22fb8ebb308679c922\n",
            "Successfully built lit\n",
            "Installing collected packages: lit, triton, torch, torchaudio\n",
            "  Attempting uninstall: torch\n",
            "    Found existing installation: torch 2.4.1+cu121\n",
            "    Uninstalling torch-2.4.1+cu121:\n",
            "      Successfully uninstalled torch-2.4.1+cu121\n",
            "  Attempting uninstall: torchaudio\n",
            "    Found existing installation: torchaudio 2.4.1+cu121\n",
            "    Uninstalling torchaudio-2.4.1+cu121:\n",
            "      Successfully uninstalled torchaudio-2.4.1+cu121\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "torchvision 0.19.1+cu121 requires torch==2.4.1, but you have torch 2.0.0+cu118 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0mSuccessfully installed lit-15.0.7 torch-2.0.0+cu118 torchaudio-2.0.0+cu118 triton-2.0.0\n",
            "Installing whisperX...\n",
            "Obtaining file:///content/VideoLingo/third_party/whisperX\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: torch>=2 in /usr/local/lib/python3.10/dist-packages (from whisperx==3.1.1) (2.0.0+cu118)\n",
            "Requirement already satisfied: torchaudio>=2 in /usr/local/lib/python3.10/dist-packages (from whisperx==3.1.1) (2.0.0+cu118)\n",
            "Collecting faster-whisper==1.0.0 (from whisperx==3.1.1)\n",
            "  Downloading faster_whisper-1.0.0-py3-none-any.whl.metadata (14 kB)\n",
            "Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from whisperx==3.1.1) (4.44.2)\n",
            "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from whisperx==3.1.1) (2.2.2)\n",
            "Requirement already satisfied: setuptools>=65 in /usr/local/lib/python3.10/dist-packages (from whisperx==3.1.1) (71.0.4)\n",
            "Requirement already satisfied: nltk in /usr/local/lib/python3.10/dist-packages (from whisperx==3.1.1) (3.8.1)\n",
            "Collecting pyannote.audio==3.1.1 (from whisperx==3.1.1)\n",
            "  Downloading pyannote.audio-3.1.1-py2.py3-none-any.whl.metadata (9.3 kB)\n",
            "Collecting av==11.* (from faster-whisper==1.0.0->whisperx==3.1.1)\n",
            "  Downloading av-11.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.5 kB)\n",
            "Collecting ctranslate2<5,>=4.0 (from faster-whisper==1.0.0->whisperx==3.1.1)\n",
            "  Downloading ctranslate2-4.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)\n",
            "Requirement already satisfied: huggingface-hub>=0.13 in /usr/local/lib/python3.10/dist-packages (from faster-whisper==1.0.0->whisperx==3.1.1) (0.24.7)\n",
            "Collecting tokenizers<0.16,>=0.13 (from faster-whisper==1.0.0->whisperx==3.1.1)\n",
            "  Downloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)\n",
            "Collecting onnxruntime<2,>=1.14 (from faster-whisper==1.0.0->whisperx==3.1.1)\n",
            "  Downloading onnxruntime-1.19.2-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (4.5 kB)\n",
            "Collecting asteroid-filterbanks>=0.4 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading asteroid_filterbanks-0.4.0-py3-none-any.whl.metadata (3.3 kB)\n",
            "Requirement already satisfied: einops>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pyannote.audio==3.1.1->whisperx==3.1.1) (0.8.0)\n",
            "Collecting lightning>=2.0.1 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading lightning-2.4.0-py3-none-any.whl.metadata (38 kB)\n",
            "Collecting omegaconf<3.0,>=2.1 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading omegaconf-2.3.0-py3-none-any.whl.metadata (3.9 kB)\n",
            "Collecting pyannote.core>=5.0.0 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading pyannote.core-5.0.0-py3-none-any.whl.metadata (1.4 kB)\n",
            "Collecting pyannote.database>=5.0.1 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading pyannote.database-5.1.0-py3-none-any.whl.metadata (1.2 kB)\n",
            "Collecting pyannote.metrics>=3.2 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading pyannote.metrics-3.2.1-py3-none-any.whl.metadata (1.3 kB)\n",
            "Collecting pyannote.pipeline>=3.0.1 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading pyannote.pipeline-3.0.1-py3-none-any.whl.metadata (897 bytes)\n",
            "Collecting pytorch-metric-learning>=2.1.0 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading pytorch_metric_learning-2.6.0-py3-none-any.whl.metadata (17 kB)\n",
            "Requirement already satisfied: rich>=12.0.0 in /usr/local/lib/python3.10/dist-packages (from pyannote.audio==3.1.1->whisperx==3.1.1) (13.8.1)\n",
            "Collecting semver>=3.0.0 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading semver-3.0.2-py3-none-any.whl.metadata (5.0 kB)\n",
            "Requirement already satisfied: soundfile>=0.12.1 in /usr/local/lib/python3.10/dist-packages (from pyannote.audio==3.1.1->whisperx==3.1.1) (0.12.1)\n",
            "Collecting speechbrain>=0.5.14 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading speechbrain-1.0.1-py3-none-any.whl.metadata (24 kB)\n",
            "Collecting tensorboardX>=2.6 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading tensorboardX-2.6.2.2-py2.py3-none-any.whl.metadata (5.8 kB)\n",
            "Collecting torch-audiomentations>=0.11.0 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading torch_audiomentations-0.11.1-py3-none-any.whl.metadata (14 kB)\n",
            "Collecting torchmetrics>=0.11.0 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading torchmetrics-1.4.2-py3-none-any.whl.metadata (19 kB)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=2->whisperx==3.1.1) (3.16.1)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=2->whisperx==3.1.1) (4.12.2)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=2->whisperx==3.1.1) (1.13.3)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=2->whisperx==3.1.1) (3.3)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=2->whisperx==3.1.1) (3.1.4)\n",
            "Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=2->whisperx==3.1.1) (2.0.0)\n",
            "Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=2->whisperx==3.1.1) (3.30.3)\n",
            "Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=2->whisperx==3.1.1) (15.0.7)\n",
            "Requirement already satisfied: click in /usr/local/lib/python3.10/dist-packages (from nltk->whisperx==3.1.1) (8.1.7)\n",
            "Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from nltk->whisperx==3.1.1) (1.4.2)\n",
            "Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.10/dist-packages (from nltk->whisperx==3.1.1) (2024.9.11)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from nltk->whisperx==3.1.1) (4.66.5)\n",
            "Requirement already satisfied: numpy>=1.22.4 in /usr/local/lib/python3.10/dist-packages (from pandas->whisperx==3.1.1) (1.26.4)\n",
            "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->whisperx==3.1.1) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->whisperx==3.1.1) (2024.2)\n",
            "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas->whisperx==3.1.1) (2024.2)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers->whisperx==3.1.1) (24.1)\n",
            "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers->whisperx==3.1.1) (6.0.2)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers->whisperx==3.1.1) (2.32.3)\n",
            "Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers->whisperx==3.1.1) (0.4.5)\n",
            "INFO: pip is looking at multiple versions of transformers to determine which version is compatible with other requirements. This could take a while.\n",
            "Collecting transformers (from whisperx==3.1.1)\n",
            "  Downloading transformers-4.45.1-py3-none-any.whl.metadata (44 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.4/44.4 kB\u001b[0m \u001b[31m3.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.45.0-py3-none-any.whl.metadata (44 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m44.4/44.4 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.44.1-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.44.0-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.43.4-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.43.3-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m3.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.43.2-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m552.2 kB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hINFO: pip is still looking at multiple versions of transformers to determine which version is compatible with other requirements. This could take a while.\n",
            "  Downloading transformers-4.43.1-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.43.0-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.7/43.7 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.42.4-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.6/43.6 kB\u001b[0m \u001b[31m3.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.42.3-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.6/43.6 kB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.42.2-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.6/43.6 kB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hINFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.\n",
            "  Downloading transformers-4.42.1-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.6/43.6 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.42.0-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.6/43.6 kB\u001b[0m \u001b[31m3.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.41.2-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.8/43.8 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.41.1-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.8/43.8 kB\u001b[0m \u001b[31m3.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.41.0-py3-none-any.whl.metadata (43 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m43.8/43.8 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.40.2-py3-none-any.whl.metadata (137 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m138.0/138.0 kB\u001b[0m \u001b[31m8.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.40.1-py3-none-any.whl.metadata (137 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m138.0/138.0 kB\u001b[0m \u001b[31m12.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.40.0-py3-none-any.whl.metadata (137 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m137.6/137.6 kB\u001b[0m \u001b[31m11.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Downloading transformers-4.39.3-py3-none-any.whl.metadata (134 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.13->faster-whisper==1.0.0->whisperx==3.1.1) (2024.6.1)\n",
            "Collecting lightning-utilities<2.0,>=0.10.0 (from lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading lightning_utilities-0.11.7-py3-none-any.whl.metadata (5.2 kB)\n",
            "INFO: pip is looking at multiple versions of lightning to determine which version is compatible with other requirements. This could take a while.\n",
            "Collecting lightning>=2.0.1 (from pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading lightning-2.3.3-py3-none-any.whl.metadata (35 kB)\n",
            "Collecting pytorch-lightning (from lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading pytorch_lightning-2.4.0-py3-none-any.whl.metadata (21 kB)\n",
            "Collecting antlr4-python3-runtime==4.9.* (from omegaconf<3.0,>=2.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.0/117.0 kB\u001b[0m \u001b[31m10.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Collecting coloredlogs (from onnxruntime<2,>=1.14->faster-whisper==1.0.0->whisperx==3.1.1)\n",
            "  Downloading coloredlogs-15.0.1-py2.py3-none-any.whl.metadata (12 kB)\n",
            "Requirement already satisfied: flatbuffers in /usr/local/lib/python3.10/dist-packages (from onnxruntime<2,>=1.14->faster-whisper==1.0.0->whisperx==3.1.1) (24.3.25)\n",
            "Requirement already satisfied: protobuf in /usr/local/lib/python3.10/dist-packages (from onnxruntime<2,>=1.14->faster-whisper==1.0.0->whisperx==3.1.1) (3.20.3)\n",
            "Requirement already satisfied: sortedcontainers>=2.0.4 in /usr/local/lib/python3.10/dist-packages (from pyannote.core>=5.0.0->pyannote.audio==3.1.1->whisperx==3.1.1) (2.4.0)\n",
            "Requirement already satisfied: scipy>=1.1 in /usr/local/lib/python3.10/dist-packages (from pyannote.core>=5.0.0->pyannote.audio==3.1.1->whisperx==3.1.1) (1.13.1)\n",
            "Requirement already satisfied: typer>=0.12.1 in /usr/local/lib/python3.10/dist-packages (from pyannote.database>=5.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (0.12.5)\n",
            "Requirement already satisfied: scikit-learn>=0.17.1 in /usr/local/lib/python3.10/dist-packages (from pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (1.5.2)\n",
            "Collecting docopt>=0.6.2 (from pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading docopt-0.6.2.tar.gz (25 kB)\n",
            "  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: tabulate>=0.7.7 in /usr/local/lib/python3.10/dist-packages (from pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (0.9.0)\n",
            "Requirement already satisfied: matplotlib>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (3.7.1)\n",
            "Collecting optuna>=3.1 (from pyannote.pipeline>=3.0.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading optuna-4.0.0-py3-none-any.whl.metadata (16 kB)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->whisperx==3.1.1) (1.16.0)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich>=12.0.0->pyannote.audio==3.1.1->whisperx==3.1.1) (3.0.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich>=12.0.0->pyannote.audio==3.1.1->whisperx==3.1.1) (2.18.0)\n",
            "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/dist-packages (from soundfile>=0.12.1->pyannote.audio==3.1.1->whisperx==3.1.1) (1.17.1)\n",
            "Collecting hyperpyyaml (from speechbrain>=0.5.14->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading HyperPyYAML-1.2.2-py3-none-any.whl.metadata (7.6 kB)\n",
            "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (from speechbrain>=0.5.14->pyannote.audio==3.1.1->whisperx==3.1.1) (0.2.0)\n",
            "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=2->whisperx==3.1.1) (1.3.0)\n",
            "Collecting julius<0.3,>=0.2.3 (from torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading julius-0.2.7.tar.gz (59 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m59.6/59.6 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h  Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: librosa>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (0.10.2.post1)\n",
            "Collecting torch-pitch-shift>=1.2.2 (from torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading torch_pitch_shift-1.2.5-py3-none-any.whl.metadata (2.5 kB)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=2->whisperx==3.1.1) (2.1.5)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->whisperx==3.1.1) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->whisperx==3.1.1) (3.10)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->whisperx==3.1.1) (2.2.3)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->whisperx==3.1.1) (2024.8.30)\n",
            "Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0->soundfile>=0.12.1->pyannote.audio==3.1.1->whisperx==3.1.1) (2.22)\n",
            "Requirement already satisfied: aiohttp!=4.0.0a0,!=4.0.0a1 in /usr/local/lib/python3.10/dist-packages (from fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (3.10.6)\n",
            "Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (3.0.1)\n",
            "Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (4.4.2)\n",
            "Requirement already satisfied: numba>=0.51.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (0.60.0)\n",
            "Requirement already satisfied: pooch>=1.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (1.8.2)\n",
            "Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (0.5.0.post1)\n",
            "Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (0.4)\n",
            "Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (1.0.8)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich>=12.0.0->pyannote.audio==3.1.1->whisperx==3.1.1) (0.1.2)\n",
            "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (1.3.0)\n",
            "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (0.12.1)\n",
            "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (4.54.1)\n",
            "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (1.4.7)\n",
            "Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (10.4.0)\n",
            "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=2.0.0->pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (3.1.4)\n",
            "Collecting alembic>=1.5.0 (from optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading alembic-1.13.3-py3-none-any.whl.metadata (7.4 kB)\n",
            "Collecting colorlog (from optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading colorlog-6.8.2-py3-none-any.whl.metadata (10 kB)\n",
            "Requirement already satisfied: sqlalchemy>=1.3.0 in /usr/local/lib/python3.10/dist-packages (from optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (2.0.35)\n",
            "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.17.1->pyannote.metrics>=3.2->pyannote.audio==3.1.1->whisperx==3.1.1) (3.5.0)\n",
            "Collecting primePy>=1.3 (from torch-pitch-shift>=1.2.2->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading primePy-1.3-py3-none-any.whl.metadata (4.8 kB)\n",
            "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.10/dist-packages (from typer>=0.12.1->pyannote.database>=5.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (1.5.4)\n",
            "Collecting humanfriendly>=9.1 (from coloredlogs->onnxruntime<2,>=1.14->faster-whisper==1.0.0->whisperx==3.1.1)\n",
            "  Downloading humanfriendly-10.0-py2.py3-none-any.whl.metadata (9.2 kB)\n",
            "Collecting ruamel.yaml>=0.17.28 (from hyperpyyaml->speechbrain>=0.5.14->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading ruamel.yaml-0.18.6-py3-none-any.whl.metadata (23 kB)\n",
            "INFO: pip is looking at multiple versions of pytorch-lightning to determine which version is compatible with other requirements. This could take a while.\n",
            "Collecting pytorch-lightning (from lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading pytorch_lightning-2.3.3-py3-none-any.whl.metadata (21 kB)\n",
            "Requirement already satisfied: aiohappyeyeballs>=2.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (2.4.0)\n",
            "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (1.3.1)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (24.2.0)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (1.4.1)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (6.1.0)\n",
            "Requirement already satisfied: yarl<2.0,>=1.12.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (1.12.1)\n",
            "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp!=4.0.0a0,!=4.0.0a1->fsspec[http]<2026.0,>=2022.5.0->lightning>=2.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (4.0.3)\n",
            "Collecting Mako (from alembic>=1.5.0->optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading Mako-1.3.5-py3-none-any.whl.metadata (2.9 kB)\n",
            "Requirement already satisfied: llvmlite<0.44,>=0.43.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba>=0.51.0->librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (0.43.0)\n",
            "Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from pooch>=1.1->librosa>=0.6.0->torch-audiomentations>=0.11.0->pyannote.audio==3.1.1->whisperx==3.1.1) (4.3.6)\n",
            "Collecting ruamel.yaml.clib>=0.2.7 (from ruamel.yaml>=0.17.28->hyperpyyaml->speechbrain>=0.5.14->pyannote.audio==3.1.1->whisperx==3.1.1)\n",
            "  Downloading ruamel.yaml.clib-0.2.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl.metadata (2.2 kB)\n",
            "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from sqlalchemy>=1.3.0->optuna>=3.1->pyannote.pipeline>=3.0.1->pyannote.audio==3.1.1->whisperx==3.1.1) (3.1.1)\n",
            "Downloading faster_whisper-1.0.0-py3-none-any.whl (1.5 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.5/1.5 MB\u001b[0m \u001b[31m41.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.audio-3.1.1-py2.py3-none-any.whl (208 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m208.7/208.7 kB\u001b[0m \u001b[31m19.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading av-11.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (32.9 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m32.9/32.9 MB\u001b[0m \u001b[31m54.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading transformers-4.39.3-py3-none-any.whl (8.8 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.8/8.8 MB\u001b[0m \u001b[31m68.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading asteroid_filterbanks-0.4.0-py3-none-any.whl (29 kB)\n",
            "Downloading ctranslate2-4.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.2 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m37.2/37.2 MB\u001b[0m \u001b[31m13.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading lightning-2.3.3-py3-none-any.whl (808 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m808.5/808.5 kB\u001b[0m \u001b[31m39.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading omegaconf-2.3.0-py3-none-any.whl (79 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m79.5/79.5 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading onnxruntime-1.19.2-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (13.2 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.2/13.2 MB\u001b[0m \u001b[31m100.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.core-5.0.0-py3-none-any.whl (58 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.5/58.5 kB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.database-5.1.0-py3-none-any.whl (48 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m48.1/48.1 kB\u001b[0m \u001b[31m2.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.metrics-3.2.1-py3-none-any.whl (51 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m51.4/51.4 kB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pyannote.pipeline-3.0.1-py3-none-any.whl (31 kB)\n",
            "Downloading pytorch_metric_learning-2.6.0-py3-none-any.whl (119 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m119.3/119.3 kB\u001b[0m \u001b[31m11.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading semver-3.0.2-py3-none-any.whl (17 kB)\n",
            "Downloading speechbrain-1.0.1-py3-none-any.whl (807 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m807.2/807.2 kB\u001b[0m \u001b[31m47.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading tensorboardX-2.6.2.2-py2.py3-none-any.whl (101 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m101.7/101.7 kB\u001b[0m \u001b[31m9.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.6 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.6/3.6 MB\u001b[0m \u001b[31m96.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading torch_audiomentations-0.11.1-py3-none-any.whl (50 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m50.1/50.1 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading torchmetrics-1.4.2-py3-none-any.whl (869 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m869.2/869.2 kB\u001b[0m \u001b[31m51.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading lightning_utilities-0.11.7-py3-none-any.whl (26 kB)\n",
            "Downloading optuna-4.0.0-py3-none-any.whl (362 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m362.8/362.8 kB\u001b[0m \u001b[31m27.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading torch_pitch_shift-1.2.5-py3-none-any.whl (5.0 kB)\n",
            "Downloading coloredlogs-15.0.1-py2.py3-none-any.whl (46 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m46.0/46.0 kB\u001b[0m \u001b[31m4.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading HyperPyYAML-1.2.2-py3-none-any.whl (16 kB)\n",
            "Downloading pytorch_lightning-2.3.3-py3-none-any.whl (812 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m812.3/812.3 kB\u001b[0m \u001b[31m48.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading alembic-1.13.3-py3-none-any.whl (233 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m233.2/233.2 kB\u001b[0m \u001b[31m21.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m86.8/86.8 kB\u001b[0m \u001b[31m7.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading primePy-1.3-py3-none-any.whl (4.0 kB)\n",
            "Downloading ruamel.yaml-0.18.6-py3-none-any.whl (117 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m117.8/117.8 kB\u001b[0m \u001b[31m11.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading colorlog-6.8.2-py3-none-any.whl (11 kB)\n",
            "Downloading ruamel.yaml.clib-0.2.8-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (526 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m526.7/526.7 kB\u001b[0m \u001b[31m33.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading Mako-1.3.5-py3-none-any.whl (78 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m78.6/78.6 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hBuilding wheels for collected packages: antlr4-python3-runtime, docopt, julius\n",
            "  Building wheel for antlr4-python3-runtime (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.9.3-py3-none-any.whl size=144554 sha256=0d45790bcba89ef25b40e28a352826b1e3b8e0a996f3c4c71c77a2e039838c51\n",
            "  Stored in directory: /root/.cache/pip/wheels/12/93/dd/1f6a127edc45659556564c5730f6d4e300888f4bca2d4c5a88\n",
            "  Building wheel for docopt (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for docopt: filename=docopt-0.6.2-py2.py3-none-any.whl size=13704 sha256=0a394444811d2361dc868cfb7f23c797a47cad9a1ee5485aa86c20551b33a0fe\n",
            "  Stored in directory: /root/.cache/pip/wheels/fc/ab/d4/5da2067ac95b36618c629a5f93f809425700506f72c9732fac\n",
            "  Building wheel for julius (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for julius: filename=julius-0.2.7-py3-none-any.whl size=21869 sha256=c4d0f47e1e8d2846ed38f7ba03d944e5ed1068278a16c671c7b946053b7f7446\n",
            "  Stored in directory: /root/.cache/pip/wheels/b9/b2/05/f883527ffcb7f2ead5438a2c23439aa0c881eaa9a4c80256f4\n",
            "Successfully built antlr4-python3-runtime docopt julius\n",
            "Installing collected packages: primePy, docopt, antlr4-python3-runtime, tensorboardX, semver, ruamel.yaml.clib, omegaconf, Mako, lightning-utilities, humanfriendly, ctranslate2, colorlog, av, ruamel.yaml, pyannote.core, coloredlogs, alembic, tokenizers, optuna, onnxruntime, hyperpyyaml, transformers, pyannote.database, faster-whisper, pyannote.pipeline, pyannote.metrics, torchmetrics, torch-pitch-shift, pytorch-lightning, julius, torch-audiomentations, speechbrain, pytorch-metric-learning, lightning, asteroid-filterbanks, pyannote.audio, whisperx\n",
            "  Attempting uninstall: tokenizers\n",
            "    Found existing installation: tokenizers 0.19.1\n",
            "    Uninstalling tokenizers-0.19.1:\n",
            "      Successfully uninstalled tokenizers-0.19.1\n",
            "  Attempting uninstall: transformers\n",
            "    Found existing installation: transformers 4.44.2\n",
            "    Uninstalling transformers-4.44.2:\n",
            "      Successfully uninstalled transformers-4.44.2\n",
            "  Running setup.py develop for whisperx\n",
            "Successfully installed Mako-1.3.5 alembic-1.13.3 antlr4-python3-runtime-4.9.3 asteroid-filterbanks-0.4.0 av-11.0.0 coloredlogs-15.0.1 colorlog-6.8.2 ctranslate2-4.4.0 docopt-0.6.2 faster-whisper-1.0.0 humanfriendly-10.0 hyperpyyaml-1.2.2 julius-0.2.7 lightning-2.3.3 lightning-utilities-0.11.7 omegaconf-2.3.0 onnxruntime-1.19.2 optuna-4.0.0 primePy-1.3 pyannote.audio-3.1.1 pyannote.core-5.0.0 pyannote.database-5.1.0 pyannote.metrics-3.2.1 pyannote.pipeline-3.0.1 pytorch-lightning-2.3.3 pytorch-metric-learning-2.6.0 ruamel.yaml-0.18.6 ruamel.yaml.clib-0.2.8 semver-3.0.2 speechbrain-1.0.1 tensorboardX-2.6.2.2 tokenizers-0.15.2 torch-audiomentations-0.11.1 torch-pitch-shift-1.2.5 torchmetrics-1.4.2 transformers-4.39.3 whisperx-3.1.1\n",
            "Converting requirements.txt to GBK encoding...\n",
            "Conversion completed.\n",
            "Installing dependencies from requirements.txt...\n",
            "Collecting azure-cognitiveservices-speech==1.40.0 (from -r requirements.txt (line 1))\n",
            "  Downloading azure_cognitiveservices_speech-1.40.0-py3-none-manylinux1_x86_64.whl.metadata (1.5 kB)\n",
            "Requirement already satisfied: librosa==0.10.2.post1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 2)) (0.10.2.post1)\n",
            "Requirement already satisfied: moviepy==1.0.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 3)) (1.0.3)\n",
            "Requirement already satisfied: numpy==1.26.4 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 4)) (1.26.4)\n",
            "Collecting openai==1.47.0 (from -r requirements.txt (line 5))\n",
            "  Downloading openai-1.47.0-py3-none-any.whl.metadata (24 kB)\n",
            "Requirement already satisfied: opencv-python==4.10.0.84 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (4.10.0.84)\n",
            "Requirement already satisfied: openpyxl==3.1.5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (3.1.5)\n",
            "Collecting pandas==2.2.3 (from -r requirements.txt (line 8))\n",
            "  Downloading pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (89 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m89.9/89.9 kB\u001b[0m \u001b[31m5.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting pydub==0.25.1 (from -r requirements.txt (line 9))\n",
            "  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)\n",
            "Requirement already satisfied: PyYAML==6.0.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (6.0.2)\n",
            "Collecting replicate==0.33.0 (from -r requirements.txt (line 11))\n",
            "  Downloading replicate-0.33.0-py3-none-any.whl.metadata (25 kB)\n",
            "Requirement already satisfied: requests==2.32.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (2.32.3)\n",
            "Collecting resampy==0.4.3 (from -r requirements.txt (line 13))\n",
            "  Downloading resampy-0.4.3-py3-none-any.whl.metadata (3.0 kB)\n",
            "Requirement already satisfied: spacy==3.7.6 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (3.7.6)\n",
            "Collecting streamlit==1.38.0 (from -r requirements.txt (line 15))\n",
            "  Downloading streamlit-1.38.0-py2.py3-none-any.whl.metadata (8.5 kB)\n",
            "Collecting yt-dlp==2024.8.6 (from -r requirements.txt (line 16))\n",
            "  Downloading yt_dlp-2024.8.6-py3-none-any.whl.metadata (170 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m170.1/170.1 kB\u001b[0m \u001b[31m11.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hCollecting json-repair (from -r requirements.txt (line 17))\n",
            "  Downloading json_repair-0.29.7-py3-none-any.whl.metadata (10 kB)\n",
            "Requirement already satisfied: audioread>=2.1.9 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (3.0.1)\n",
            "Requirement already satisfied: scipy>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (1.13.1)\n",
            "Requirement already satisfied: scikit-learn>=0.20.0 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (1.5.2)\n",
            "Requirement already satisfied: joblib>=0.14 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (1.4.2)\n",
            "Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (4.4.2)\n",
            "Requirement already satisfied: numba>=0.51.0 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (0.60.0)\n",
            "Requirement already satisfied: soundfile>=0.12.1 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (0.12.1)\n",
            "Requirement already satisfied: pooch>=1.1 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (1.8.2)\n",
            "Requirement already satisfied: soxr>=0.3.2 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (0.5.0.post1)\n",
            "Requirement already satisfied: typing-extensions>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (4.12.2)\n",
            "Requirement already satisfied: lazy-loader>=0.1 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (0.4)\n",
            "Requirement already satisfied: msgpack>=1.0 in /usr/local/lib/python3.10/dist-packages (from librosa==0.10.2.post1->-r requirements.txt (line 2)) (1.0.8)\n",
            "Requirement already satisfied: tqdm<5.0,>=4.11.2 in /usr/local/lib/python3.10/dist-packages (from moviepy==1.0.3->-r requirements.txt (line 3)) (4.66.5)\n",
            "Requirement already satisfied: proglog<=1.0.0 in /usr/local/lib/python3.10/dist-packages (from moviepy==1.0.3->-r requirements.txt (line 3)) (0.1.10)\n",
            "Requirement already satisfied: imageio<3.0,>=2.5 in /usr/local/lib/python3.10/dist-packages (from moviepy==1.0.3->-r requirements.txt (line 3)) (2.35.1)\n",
            "Requirement already satisfied: imageio-ffmpeg>=0.2.0 in /usr/local/lib/python3.10/dist-packages (from moviepy==1.0.3->-r requirements.txt (line 3)) (0.5.1)\n",
            "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai==1.47.0->-r requirements.txt (line 5)) (3.7.1)\n",
            "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai==1.47.0->-r requirements.txt (line 5)) (1.7.0)\n",
            "Collecting httpx<1,>=0.23.0 (from openai==1.47.0->-r requirements.txt (line 5))\n",
            "  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)\n",
            "Collecting jiter<1,>=0.4.0 (from openai==1.47.0->-r requirements.txt (line 5))\n",
            "  Downloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.6 kB)\n",
            "Requirement already satisfied: pydantic<3,>=1.9.0 in /usr/local/lib/python3.10/dist-packages (from openai==1.47.0->-r requirements.txt (line 5)) (2.9.2)\n",
            "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai==1.47.0->-r requirements.txt (line 5)) (1.3.1)\n",
            "Requirement already satisfied: et-xmlfile in /usr/local/lib/python3.10/dist-packages (from openpyxl==3.1.5->-r requirements.txt (line 7)) (1.1.0)\n",
            "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas==2.2.3->-r requirements.txt (line 8)) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas==2.2.3->-r requirements.txt (line 8)) (2024.2)\n",
            "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas==2.2.3->-r requirements.txt (line 8)) (2024.2)\n",
            "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from replicate==0.33.0->-r requirements.txt (line 11)) (24.1)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests==2.32.3->-r requirements.txt (line 12)) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests==2.32.3->-r requirements.txt (line 12)) (3.10)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests==2.32.3->-r requirements.txt (line 12)) (2.2.3)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests==2.32.3->-r requirements.txt (line 12)) (2024.8.30)\n",
            "Requirement already satisfied: spacy-legacy<3.1.0,>=3.0.11 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (3.0.12)\n",
            "Requirement already satisfied: spacy-loggers<2.0.0,>=1.0.0 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (1.0.5)\n",
            "Requirement already satisfied: murmurhash<1.1.0,>=0.28.0 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (1.0.10)\n",
            "Requirement already satisfied: cymem<2.1.0,>=2.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (2.0.8)\n",
            "Requirement already satisfied: preshed<3.1.0,>=3.0.2 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (3.0.9)\n",
            "Requirement already satisfied: thinc<8.3.0,>=8.2.2 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (8.2.5)\n",
            "Requirement already satisfied: wasabi<1.2.0,>=0.9.1 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (1.1.3)\n",
            "Requirement already satisfied: srsly<3.0.0,>=2.4.3 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (2.4.8)\n",
            "Requirement already satisfied: catalogue<2.1.0,>=2.0.6 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (2.0.10)\n",
            "Requirement already satisfied: weasel<0.5.0,>=0.1.0 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (0.4.1)\n",
            "Requirement already satisfied: typer<1.0.0,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (0.12.5)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (3.1.4)\n",
            "Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (71.0.4)\n",
            "Requirement already satisfied: langcodes<4.0.0,>=3.2.0 in /usr/local/lib/python3.10/dist-packages (from spacy==3.7.6->-r requirements.txt (line 14)) (3.4.1)\n",
            "Requirement already satisfied: altair<6,>=4.0 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (4.2.2)\n",
            "Requirement already satisfied: blinker<2,>=1.0.0 in /usr/lib/python3/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (1.4)\n",
            "Requirement already satisfied: cachetools<6,>=4.0 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (5.5.0)\n",
            "Requirement already satisfied: click<9,>=7.0 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (8.1.7)\n",
            "Requirement already satisfied: pillow<11,>=7.1.0 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (10.4.0)\n",
            "Requirement already satisfied: protobuf<6,>=3.20 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (3.20.3)\n",
            "Requirement already satisfied: pyarrow>=7.0 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (16.1.0)\n",
            "Requirement already satisfied: rich<14,>=10.14.0 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (13.8.1)\n",
            "Collecting tenacity<9,>=8.1.0 (from streamlit==1.38.0->-r requirements.txt (line 15))\n",
            "  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)\n",
            "Requirement already satisfied: toml<2,>=0.10.1 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (0.10.2)\n",
            "Collecting gitpython!=3.1.19,<4,>=3.0.7 (from streamlit==1.38.0->-r requirements.txt (line 15))\n",
            "  Downloading GitPython-3.1.43-py3-none-any.whl.metadata (13 kB)\n",
            "Collecting pydeck<1,>=0.8.0b4 (from streamlit==1.38.0->-r requirements.txt (line 15))\n",
            "  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)\n",
            "Requirement already satisfied: tornado<7,>=6.0.3 in /usr/local/lib/python3.10/dist-packages (from streamlit==1.38.0->-r requirements.txt (line 15)) (6.3.3)\n",
            "Collecting watchdog<5,>=2.1.5 (from streamlit==1.38.0->-r requirements.txt (line 15))\n",
            "  Downloading watchdog-4.0.2-py3-none-manylinux2014_x86_64.whl.metadata (38 kB)\n",
            "Collecting brotli (from yt-dlp==2024.8.6->-r requirements.txt (line 16))\n",
            "  Downloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (5.5 kB)\n",
            "Collecting mutagen (from yt-dlp==2024.8.6->-r requirements.txt (line 16))\n",
            "  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)\n",
            "Collecting pycryptodomex (from yt-dlp==2024.8.6->-r requirements.txt (line 16))\n",
            "  Downloading pycryptodomex-3.21.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)\n",
            "Collecting websockets>=12.0 (from yt-dlp==2024.8.6->-r requirements.txt (line 16))\n",
            "  Downloading websockets-13.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)\n",
            "Requirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit==1.38.0->-r requirements.txt (line 15)) (0.4)\n",
            "Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit==1.38.0->-r requirements.txt (line 15)) (4.23.0)\n",
            "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6,>=4.0->streamlit==1.38.0->-r requirements.txt (line 15)) (0.12.1)\n",
            "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai==1.47.0->-r requirements.txt (line 5)) (1.2.2)\n",
            "Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.19,<4,>=3.0.7->streamlit==1.38.0->-r requirements.txt (line 15))\n",
            "  Downloading gitdb-4.0.11-py3-none-any.whl.metadata (1.2 kB)\n",
            "Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai==1.47.0->-r requirements.txt (line 5))\n",
            "  Downloading httpcore-1.0.6-py3-none-any.whl.metadata (21 kB)\n",
            "Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai==1.47.0->-r requirements.txt (line 5))\n",
            "  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)\n",
            "Requirement already satisfied: language-data>=1.2 in /usr/local/lib/python3.10/dist-packages (from langcodes<4.0.0,>=3.2.0->spacy==3.7.6->-r requirements.txt (line 14)) (1.2.0)\n",
            "Requirement already satisfied: llvmlite<0.44,>=0.43.0dev0 in /usr/local/lib/python3.10/dist-packages (from numba>=0.51.0->librosa==0.10.2.post1->-r requirements.txt (line 2)) (0.43.0)\n",
            "Requirement already satisfied: platformdirs>=2.5.0 in /usr/local/lib/python3.10/dist-packages (from pooch>=1.1->librosa==0.10.2.post1->-r requirements.txt (line 2)) (4.3.6)\n",
            "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai==1.47.0->-r requirements.txt (line 5)) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1.9.0->openai==1.47.0->-r requirements.txt (line 5)) (2.23.4)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->spacy==3.7.6->-r requirements.txt (line 14)) (2.1.5)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas==2.2.3->-r requirements.txt (line 8)) (1.16.0)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich<14,>=10.14.0->streamlit==1.38.0->-r requirements.txt (line 15)) (3.0.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich<14,>=10.14.0->streamlit==1.38.0->-r requirements.txt (line 15)) (2.18.0)\n",
            "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn>=0.20.0->librosa==0.10.2.post1->-r requirements.txt (line 2)) (3.5.0)\n",
            "Requirement already satisfied: cffi>=1.0 in /usr/local/lib/python3.10/dist-packages (from soundfile>=0.12.1->librosa==0.10.2.post1->-r requirements.txt (line 2)) (1.17.1)\n",
            "Requirement already satisfied: blis<0.8.0,>=0.7.8 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy==3.7.6->-r requirements.txt (line 14)) (0.7.11)\n",
            "Requirement already satisfied: confection<1.0.0,>=0.0.1 in /usr/local/lib/python3.10/dist-packages (from thinc<8.3.0,>=8.2.2->spacy==3.7.6->-r requirements.txt (line 14)) (0.1.5)\n",
            "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.10/dist-packages (from typer<1.0.0,>=0.3.0->spacy==3.7.6->-r requirements.txt (line 14)) (1.5.4)\n",
            "Requirement already satisfied: cloudpathlib<1.0.0,>=0.7.0 in /usr/local/lib/python3.10/dist-packages (from weasel<0.5.0,>=0.1.0->spacy==3.7.6->-r requirements.txt (line 14)) (0.19.0)\n",
            "Requirement already satisfied: smart-open<8.0.0,>=5.2.1 in /usr/local/lib/python3.10/dist-packages (from weasel<0.5.0,>=0.1.0->spacy==3.7.6->-r requirements.txt (line 14)) (7.0.4)\n",
            "Requirement already satisfied: pycparser in /usr/local/lib/python3.10/dist-packages (from cffi>=1.0->soundfile>=0.12.1->librosa==0.10.2.post1->-r requirements.txt (line 2)) (2.22)\n",
            "Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython!=3.1.19,<4,>=3.0.7->streamlit==1.38.0->-r requirements.txt (line 15))\n",
            "  Downloading smmap-5.0.1-py3-none-any.whl.metadata (4.3 kB)\n",
            "Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6,>=4.0->streamlit==1.38.0->-r requirements.txt (line 15)) (24.2.0)\n",
            "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6,>=4.0->streamlit==1.38.0->-r requirements.txt (line 15)) (2023.12.1)\n",
            "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6,>=4.0->streamlit==1.38.0->-r requirements.txt (line 15)) (0.35.1)\n",
            "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6,>=4.0->streamlit==1.38.0->-r requirements.txt (line 15)) (0.20.0)\n",
            "Requirement already satisfied: marisa-trie>=0.7.7 in /usr/local/lib/python3.10/dist-packages (from language-data>=1.2->langcodes<4.0.0,>=3.2.0->spacy==3.7.6->-r requirements.txt (line 14)) (1.2.0)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from markdown-it-py>=2.2.0->rich<14,>=10.14.0->streamlit==1.38.0->-r requirements.txt (line 15)) (0.1.2)\n",
            "Requirement already satisfied: wrapt in /usr/local/lib/python3.10/dist-packages (from smart-open<8.0.0,>=5.2.1->weasel<0.5.0,>=0.1.0->spacy==3.7.6->-r requirements.txt (line 14)) (1.16.0)\n",
            "Downloading azure_cognitiveservices_speech-1.40.0-py3-none-manylinux1_x86_64.whl (40.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.1/40.1 MB\u001b[0m \u001b[31m24.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading openai-1.47.0-py3-none-any.whl (375 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m375.6/375.6 kB\u001b[0m \u001b[31m24.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pandas-2.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (13.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m13.1/13.1 MB\u001b[0m \u001b[31m73.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n",
            "Downloading replicate-0.33.0-py3-none-any.whl (45 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m45.3/45.3 kB\u001b[0m \u001b[31m3.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading resampy-0.4.3-py3-none-any.whl (3.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m69.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading streamlit-1.38.0-py2.py3-none-any.whl (8.7 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.7/8.7 MB\u001b[0m \u001b[31m71.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading yt_dlp-2024.8.6-py3-none-any.whl (3.1 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m54.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading json_repair-0.29.7-py3-none-any.whl (17 kB)\n",
            "Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m207.3/207.3 kB\u001b[0m \u001b[31m18.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading httpx-0.27.2-py3-none-any.whl (76 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m76.4/76.4 kB\u001b[0m \u001b[31m7.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading httpcore-1.0.6-py3-none-any.whl (78 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m78.0/78.0 kB\u001b[0m \u001b[31m7.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading jiter-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (318 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m318.9/318.9 kB\u001b[0m \u001b[31m24.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m6.9/6.9 MB\u001b[0m \u001b[31m83.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading tenacity-8.5.0-py3-none-any.whl (28 kB)\n",
            "Downloading watchdog-4.0.2-py3-none-manylinux2014_x86_64.whl (82 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m82.9/82.9 kB\u001b[0m \u001b[31m7.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading websockets-13.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (164 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m164.1/164.1 kB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading Brotli-1.1.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.0 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m66.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading mutagen-1.47.0-py3-none-any.whl (194 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.4/194.4 kB\u001b[0m \u001b[31m15.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading pycryptodomex-3.21.0-cp36-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m2.3/2.3 MB\u001b[0m \u001b[31m64.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading gitdb-4.0.11-py3-none-any.whl (62 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.7/62.7 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m5.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hDownloading smmap-5.0.1-py3-none-any.whl (24 kB)\n",
            "Installing collected packages: pydub, brotli, websockets, watchdog, tenacity, smmap, pycryptodomex, mutagen, json-repair, jiter, h11, azure-cognitiveservices-speech, yt-dlp, resampy, pydeck, pandas, httpcore, gitdb, httpx, gitpython, replicate, openai, streamlit\n",
            "  Attempting uninstall: tenacity\n",
            "    Found existing installation: tenacity 9.0.0\n",
            "    Uninstalling tenacity-9.0.0:\n",
            "      Successfully uninstalled tenacity-9.0.0\n",
            "  Attempting uninstall: pandas\n",
            "    Found existing installation: pandas 2.2.2\n",
            "    Uninstalling pandas-2.2.2:\n",
            "      Successfully uninstalled pandas-2.2.2\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "cudf-cu12 24.6.1 requires pandas<2.2.3dev0,>=2.0, but you have pandas 2.2.3 which is incompatible.\n",
            "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 2.2.3 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0mSuccessfully installed azure-cognitiveservices-speech-1.40.0 brotli-1.1.0 gitdb-4.0.11 gitpython-3.1.43 h11-0.14.0 httpcore-1.0.6 httpx-0.27.2 jiter-0.5.0 json-repair-0.29.7 mutagen-1.47.0 openai-1.47.0 pandas-2.2.3 pycryptodomex-3.21.0 pydeck-0.9.1 pydub-0.25.1 replicate-0.33.0 resampy-0.4.3 smmap-5.0.1 streamlit-1.38.0 tenacity-8.5.0 watchdog-4.0.2 websockets-13.1 yt-dlp-2024.8.6\n",
            "Downloading UVR model: HP2_all_vocals.pth...\n",
            "Downloaded: 0.01%\n",
            "HP2_all_vocals.pth downloaded successfully.\n",
            "Downloading UVR model: VR-DeEchoAggressive.pth...\n",
            "Downloaded: 0.00%\n",
            "VR-DeEchoAggressive.pth downloaded successfully.\n",
            "Downloading FFmpeg...\n",
            "FFmpeg has been downloaded to ffmpeg.tar.xz\n",
            "Extracting FFmpeg...\n",
            "Cleaning up...\n",
            "FFmpeg extraction completed.\n",
            "\u001b[1;32m╭───────────────────────────────────────╮\u001b[0m\n",
            "\u001b[1;32m│\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32mAll installation steps are completed!\u001b[0m\u001b[1;32m \u001b[0m\u001b[1;32m│\u001b[0m\n",
            "\u001b[1;32m╰───────────────────────────────────────╯\u001b[0m\n",
            "Please use the following command to start Streamlit:\n",
            "\u001b[1;36mstreamlit run st.py\u001b[0m\n"
          ]
        }
      ],
      "source": [
        "!python install.py"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5J2jjEY2BrM3"
      },
      "source": [
        "## 3. Register and Obtain Ngrok Token 🔑\n",
        "\n",
        "1. Visit the [Ngrok official website](https://ngrok.com/) and register for an account.\n",
        "2. After logging in, find the \"Your Authtoken\" section on the dashboard page, or directly visit [Ngrok Token](https://dashboard.ngrok.com/get-started/your-authtoken).\n",
        "3. Copy your Ngrok Authtoken.\n",
        "\n",
        "After completing these steps, please fill in your ngrok token in the next section of code and proceed.\n",
        "\n",
        "---\n",
        "\n",
        "## 3. 注册并获取 Ngrok 令牌 🔑\n",
        "\n",
        "1. 访问 [Ngrok 官方网站](https://ngrok.com/) 并注册账户。\n",
        "2. 登录后，在仪表板页面找到\"Your Authtoken\"部分，或直接访问 [Ngrok 令牌](https://dashboard.ngrok.com/get-started/your-authtoken)。\n",
        "3. 复制您的 Ngrok Authtoken。\n",
        "\n",
        "完成这些步骤后，请在下一节代码中填入您的 ngrok 令牌并继续。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tm25rblnBqhl",
        "outputId": "93691228-1e7b-48e7-b7f3-c27007a4a13f"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Requirement already satisfied: pyngrok in /usr/local/lib/python3.10/dist-packages (7.2.0)\n",
            "Requirement already satisfied: PyYAML>=5.1 in /usr/local/lib/python3.10/dist-packages (from pyngrok) (6.0.2)\n"
          ]
        }
      ],
      "source": [
        "!pip install pyngrok\n",
        "from pyngrok import ngrok\n",
        "\n",
        "#! SET Ngrok Authtoken Here\n",
        "ngrok.set_auth_token(\"YOUR_TOKEN_HERE\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MDJXtgMuGXMH"
      },
      "source": [
        "## 🎈 4. Streamlit GO !!!\n",
        "Click the NgrokChannel URL to start your VideoLingo Journey.\n",
        "\n",
        "> tips: You can set your Language down the sidebar on the left."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 294
        },
        "id": "qr_a4mS29k5_",
        "outputId": "e6915708-d030-45d0-d6a8-cb177960c137"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.\n",
            "\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">╭───────────────────────────────────╮\n",
              "│ Streamlit is available at Ngrok ⬇️ │\n",
              "╰───────────────────────────────────╯\n",
              "</pre>\n"
            ],
            "text/plain": [
              "╭───────────────────────────────────╮\n",
              "│ Streamlit is available at Ngrok ⬇️ │\n",
              "╰───────────────────────────────────╯\n"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Click 👉 NgrokTunnel: \"https://0308-34-125-196-161.ngrok-free.app\" -> \"http://localhost:8501\"\n",
            "\n",
            "  You can now view your Streamlit app in your browser.\n",
            "\n",
            "  Local URL: http://localhost:8501\n",
            "  Network URL: http://172.28.0.12:8501\n",
            "  External URL: http://34.125.196.161:8501\n",
            "\n",
            "  Stopping...\n",
            "Interrupted by user, shutting down...\n"
          ]
        }
      ],
      "source": [
        "import subprocess\n",
        "import threading\n",
        "import sys\n",
        "from pyngrok import ngrok\n",
        "from rich import print as rprint\n",
        "from rich.panel import Panel\n",
        "\n",
        "def print_output(process):\n",
        "    for line in iter(process.stdout.readline, ''):\n",
        "        sys.stdout.write(line)\n",
        "    for line in iter(process.stderr.readline, ''):\n",
        "        sys.stderr.write(line)\n",
        "\n",
        "# Start Streamlit\n",
        "streamlit_process = subprocess.Popen(\n",
        "    [\"streamlit\", \"run\", \"st.py\"],\n",
        "    stdout=subprocess.PIPE,\n",
        "    stderr=subprocess.PIPE,\n",
        "    universal_newlines=True,\n",
        "    bufsize=1\n",
        ")\n",
        "\n",
        "# Create and start the output printing thread\n",
        "output_thread = threading.Thread(target=print_output, args=(streamlit_process,))\n",
        "output_thread.start()\n",
        "\n",
        "# Create a tunnel using ngrok\n",
        "public_url = ngrok.connect(8501)\n",
        "rprint(Panel(f\"Streamlit is available at Ngrok ⬇️\", expand=False))\n",
        "print(f\"Click 👉 {public_url}\")\n",
        "\n",
        "# Keep the program running\n",
        "ngrok_process = ngrok.get_ngrok_process()\n",
        "try:\n",
        "    streamlit_process.wait()\n",
        "except KeyboardInterrupt:\n",
        "    print(\"Interrupted by user, shutting down...\")\n",
        "finally:\n",
        "    ngrok.kill()\n",
        "    streamlit_process.terminate()\n",
        "    output_thread.join()\n"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}


================================================
FILE: batch/OneKeyBatch.bat
================================================
@echo off
cd /D "%~dp0"
cd ..

call conda activate videolingo

@rem 运行批处理脚本
call python batch\utils\batch_processor.py

:end
pause


================================================
FILE: batch/README.md
================================================
# VideoLingo Batch Mode

[English](./README.md) | [简体中文](./README.zh.md)

Before utilizing the batch mode, ensure you have used the Streamlit mode and properly configured the parameters in `config.yaml`.

## Usage Guide

### 1. Video File Preparation

- Place your video files in the `input` folder
- YouTube links can be specified in the next step

### 2. Task Configuration

Edit the `tasks_setting.xlsx` file:

| Field | Description | Acceptable Values |
|-------|-------------|-------------------|
| Video File | Video filename (without `input/` prefix) or YouTube URL | - |
| Source Language | Source language | 'en', 'zh', ... or leave empty for default |
| Target Language | Translation language | Use natural language description, or leave empty for default |
| Dubbing | Enable dubbing | 0 or empty: no dubbing; 1: enable dubbing |

Example:

| Video File | Source Language | Target Language | Dubbing |
|------------|-----------------|-----------------|---------|
| https://www.youtube.com/xxx | | German | |
| Kungfu Panda.mp4 | |  | 1 |

### 3. Executing Batch Processing

1. Double-click to run `OneKeyBatch.bat`
2. Output files will be saved in the `output` folder
3. Task status can be monitored in the `Status` column of `tasks_setting.xlsx`

> Note: Keep `tasks_setting.xlsx` closed during execution to prevent interruptions due to file access conflicts.

## Important Considerations

### Handling Interruptions

If the command line is closed unexpectedly, language settings in `config.yaml` may be altered. Check settings before retrying.

### Error Management

- Failed files will be moved to the `output/ERROR` folder
- Error messages are recorded in the `Status` column of `tasks_setting.xlsx`
- To retry:
  1. Move the single video folder from `ERROR` to the root directory
  2. Rename it to `output`
  3. Use Streamlit mode to process again


================================================
FILE: batch/README.zh.md
================================================
# VideoLingo Batch Mode

[English](./README.md) | [简体中文](./README.zh.md)

在使用批处理模式前，请确保你已经使用过 Streamlit 模式并正确设置了 `config.yaml` 中的参数。

## 使用方法

### 1. 准备视频文件

- 将要处理的视频文件放入 `input` 文件夹
- YouTube 链接可在下一步填写

### 2. 配置任务

编辑 `tasks_setting.xlsx` 文件：

| 字段 | 说明 | 可选值 |
|------|------|--------|
| Video File | 视频文件名（无需 `input/` 前缀）或 YouTube 链接 | - |
| Source Language | 源语言 | 'en', 'zh', ... 或留空使用默认设置 |
| Target Language | 翻译语言 | 使用自然语言描述，或留空使用默认设置 |
| Dubbing | 是否配音 | 0 或留空：不配音；1：配音 |

示例：

| Video File | Source Language | Target Language | Dubbing |
|------------|-----------------|-----------------|---------|
| https://www.youtube.com/xxx | | German | |
| Kungfu Panda.mp4 | |  | 1 |

### 3. 运行批处理

1. 双击运行 `OneKeyBatch.bat`
2. 输出文件将保存在 `output` 文件夹
3. 任务状态可在 `tasks_setting.xlsx` 的 `Status` 列查看

> 注意在运行时保持 `tasks_setting.xlsx` 关闭，否则会因占用无法写入而中断。

## 注意事项

### 中断处理

如果中途关闭命令行，`config.yaml` 中的语言设置可能会改变。重试前请检查设置。

### 错误处理

- 处理失败的文件会被移至 `output/ERROR` 文件夹
- 错误信息记录在 `tasks_setting.xlsx` 的 `Status` 列
- 如需重试：
  1. 将 `ERROR` 下的单个视频文件夹移至根目录
  2. 重命名为 `output`
  3. 使用 Streamlit 模式重新执行


================================================
FILE: batch/utils/batch_processor.py
================================================
import os
import gc
from batch.utils.settings_check import check_settings
from batch.utils.video_processor import process_video
from core.utils.config_utils import load_key, update_key
import pandas as pd
from rich.console import Console
from rich.panel import Panel
import time
import shutil

console = Console()

def record_and_update_config(source_language, target_language):
    original_source_lang = load_key('whisper.language')
    original_target_lang = load_key('target_language')
    
    if source_language and not pd.isna(source_language):
        update_key('whisper.language', source_language)
    if target_language and not pd.isna(target_language):
        update_key('target_language', target_language)
    
    return original_source_lang, original_target_lang

def process_batch():
    if not check_settings():
        raise Exception("Settings check failed")

    df = pd.read_excel('batch/tasks_setting.xlsx')
    for index, row in df.iterrows():
        if pd.isna(row['Status']) or 'Error' in str(row['Status']):
            total_tasks = len(df)
            video_file = row['Video File']
            
            if not pd.isna(row['Status']) and 'Error' in str(row['Status']):
                console.print(Panel(f"Retrying failed task: {video_file}\nTask {index + 1}/{total_tasks}", 
                                 title="[bold yellow]Retry Task", expand=False))
                
                # Restore files from batch/output/ERROR to output
                error_folder = os.path.join('batch', 'output', 'ERROR', os.path.splitext(video_file)[0])
                
                if os.path.exists(error_folder):
                    # Ensure the output folder exists
                    os.makedirs('output', exist_ok=True)
                    
                    # Copy all contents from ERROR folder for the specific video to output
                    for item in os.listdir(error_folder):
                        src_path = os.path.join(error_folder, item)
                        dst_path = os.path.join('output', item)
                        
                        if os.path.isdir(src_path):
                            if os.path.exists(dst_path):
                                shutil.rmtree(dst_path)
                            shutil.copytree(src_path, dst_path)
                        else:
                            if os.path.exists(dst_path):
                                os.remove(dst_path)
                            shutil.copy2(src_path, dst_path)
                            
                    console.print(f"[green]Restored files from ERROR folder for {video_file}")
                else:
                    console.print(f"[yellow]Warning: Error folder not found: {error_folder}")
            else:
                console.print(Panel(f"Now processing task: {video_file}\nTask {index + 1}/{total_tasks}", 
                                 title="[bold blue]Current Task", expand=False))
            
            source_language = row['Source Language']
            target_language = row['Target Language']
            
            original_source_lang, original_target_lang = record_and_update_config(source_language, target_language)
            
            try:
                dubbing = 0 if pd.isna(row['Dubbing']) else int(row['Dubbing'])
                is_retry = not pd.isna(row['Status']) and 'Error' in str(row['Status'])
                status, error_step, error_message = process_video(video_file, dubbing, is_retry)
                status_msg = "Done" if status else f"Error: {error_step} - {error_message}"
            except Exception as e:
                status_msg = f"Error: Unhandled exception - {str(e)}"
                console.print(f"[bold red]Error processing {video_file}: {status_msg}")
            finally:
                update_key('whisper.language', original_source_lang)
                update_key('target_language', original_target_lang)
                
                df.at[index, 'Status'] = status_msg
                df.to_excel('batch/tasks_setting.xlsx', index=False)
                
                gc.collect()
                
                time.sleep(1)
        else:
            print(f"Skipping task: {row['Video File']} - Status: {row['Status']}")

    console.print(Panel("All tasks processed!\nCheck out in `batch/output`!", 
                       title="[bold green]Batch Processing Complete", expand=False))

if __name__ == "__main__":
    process_batch()

================================================
FILE: batch/utils/settings_check.py
================================================
import os
import pandas as pd
from rich.console import Console
from rich.panel import Panel

# Constants
SETTINGS_FILE = 'batch/tasks_setting.xlsx'
INPUT_FOLDER = os.path.join('batch', 'input')
VALID_DUBBING_VALUES = [0, 1]

console = Console()

def check_settings():
    os.makedirs(INPUT_FOLDER, exist_ok=True)
    df = pd.read_excel(SETTINGS_FILE)
    input_files = set(os.listdir(INPUT_FOLDER))
    excel_files = set(df['Video File'].tolist())
    files_not_in_excel = input_files - excel_files

    all_passed = True
    local_video_tasks = 0
    url_tasks = 0

    if files_not_in_excel:
        console.print(Panel(
            "\n".join([f"- {file}" for file in files_not_in_excel]),
            title="[bold red]Warning: Files in input folder not mentioned in Excel sheet",
            expand=False
        ))
        all_passed = False

    for index, row in df.iterrows():
        video_file = row['Video File']
        source_language = row['Source Language']
        dubbing = row['Dubbing']

        if video_file.startswith('http'):
            url_tasks += 1
        elif os.path.isfile(os.path.join(INPUT_FOLDER, video_file)):
            local_video_tasks += 1
        else:
            console.print(Panel(f"Invalid video file or URL 「{video_file}」", title=f"[bold red]Error in row {index + 2}", expand=False))
            all_passed = False

        if not pd.isna(dubbing):
            if int(dubbing) not in VALID_DUBBING_VALUES:
                console.print(Panel(f"Invalid dubbing value 「{dubbing}」", title=f"[bold red]Error in row {index + 2}", expand=False))
                all_passed = False

    if all_passed:
        console.print(Panel(f"✅ All settings passed the check!\nDetected {local_video_tasks} local video tasks and {url_tasks} URL tasks.", title="[bold green]Success", expand=False))

    return all_passed


if __name__ == "__main__":  
    check_settings()

================================================
FILE: batch/utils/video_processor.py
================================================
import os
from core.st_utils.imports_and_utils import *
from core.utils.onekeycleanup import cleanup
from core.utils import load_key
import shutil
from functools import partial
from rich.panel import Panel
from rich.console import Console
from core import *

console = Console()

INPUT_DIR = 'batch/input'
OUTPUT_DIR = 'output'
SAVE_DIR = 'batch/output'
ERROR_OUTPUT_DIR = 'batch/output/ERROR'
YTB_RESOLUTION_KEY = "ytb_resolution"

def process_video(file, dubbing=False, is_retry=False):
    if not is_retry:
        prepare_output_folder(OUTPUT_DIR)
    
    text_steps = [
        ("🎥 Processing input file", partial(process_input_file, file)),
        ("🎙️ Transcribing with Whisper", partial(_2_asr.transcribe)),
        ("✂️ Splitting sentences", split_sentences),
        ("📝 Summarizing and translating", summarize_and_translate),
        ("⚡ Processing and aligning subtitles", process_and_align_subtitles),
        ("🎬 Merging subtitles to video", _7_sub_into_vid.merge_subtitles_to_video),
    ]
    
    if dubbing:
        dubbing_steps = [
            ("🔊 Generating audio tasks", gen_audio_tasks),
            ("🎵 Extracting reference audio", _9_refer_audio.extract_refer_audio_main),
            ("🗣️ Generating audio", _10_gen_audio.gen_audio),
            ("🔄 Merging full audio", _11_merge_audio.merge_full_audio),
            ("🎞️ Merging dubbing to video", _12_dub_to_vid.merge_video_audio),
        ]
        text_steps.extend(dubbing_steps)
    
    current_step = ""
    for step_name, step_func in text_steps:
        current_step = step_name
        for attempt in range(3):
            try:
                console.print(Panel(
                    f"[bold green]{step_name}[/]",
                    subtitle=f"Attempt {attempt + 1}/3" if attempt > 0 else None,
                    border_style="blue"
                ))
                result = step_func()
                if result is not None:
                    globals().update(result)
                break
            except Exception as e:
                if attempt == 2:
                    error_panel = Panel(
                        f"[bold red]Error in step '{current_step}':[/]\n{str(e)}",
                        border_style="red"
                    )
                    console.print(error_panel)
                    cleanup(ERROR_OUTPUT_DIR)
                    return False, current_step, str(e)
                console.print(Panel(
                    f"[yellow]Attempt {attempt + 1} failed. Retrying...[/]",
                    border_style="yellow"
                ))
    
    console.print(Panel("[bold green]All steps completed successfully! 🎉[/]", border_style="green"))
    cleanup(SAVE_DIR)
    return True, "", ""

def prepare_output_folder(output_folder):
    if os.path.exists(output_folder):
        shutil.rmtree(output_folder)
    os.makedirs(output_folder)

def process_input_file(file):
    if file.startswith('http'):
        _1_ytdlp.download_video_ytdlp(file, resolution=load_key(YTB_RESOLUTION_KEY))
        video_file = _1_ytdlp.find_video_files()
    else:
        input_file = os.path.join('batch', 'input', file)
        output_file = os.path.join(OUTPUT_DIR, file)
        shutil.copy(input_file, output_file)
        video_file = output_file
    return {'video_file': video_file}

def split_sentences():
    _3_1_split_nlp.split_by_spacy()
    _3_2_split_meaning.split_sentences_by_meaning()

def summarize_and_translate():
    _4_1_summarize.get_summary()
    _4_2_translate.translate_all()

def process_and_align_subtitles():
    _5_split_sub.split_for_sub_main()
    _6_gen_sub.align_timestamp_main()

def gen_audio_tasks():
    _8_1_audio_task.gen_audio_task_main()
    _8_2_dub_chunks.gen_dub_chunks()


================================================
FILE: config.yaml
================================================
# * Settings marked with * are advanced settings that won't appear in the Streamlit page and can only be modified manually in config.py
# recommend to set in streamlit page
# -------------------
# version: "3.0.0"
# author: "Huanshere"
# -------------------

## ======================== Basic Settings ======================== ##

display_language: "zh-CN"

# API settings
api:
  key: 'your-api-key'
  base_url: 'https://yunwu.ai'
  model: 'gpt-4.1-2025-04-14'
  llm_support_json: false
# *Number of LLM multi-threaded accesses, set to 1 if using local LLM
max_workers: 4

# Language settings, written into the prompt, can be described in natural language
target_language: '简体中文'

# Whether to use Demucs for vocal separation before transcription
demucs: true

whisper:
  # ["large-v3", "large-v3-turbo"]. Note: for zh model will force to use Belle/large-v3
  model: 'large-v3'
  # Whisper specified recognition language ISO 639-1
  language: 'en'
  detected_language: 'en'
  # Whisper running mode ["local", "cloud", "elevenlabs"]. Specifies where to run, cloud uses 302.ai API
  runtime: 'local'
  # 302.ai API key
  whisperX_302_api_key: 'your_302_api_key'
  # ElevenLabs API key (experimental)
  elevenlabs_api_key: 'your_elevenlabs_api_key'

# Whether to burn subtitles into the video
burn_subtitles: true

## ======================== Advanced Settings ======================== ##
# *🔬 h264_nvenc GPU acceleration for ffmpeg, make sure your GPU supports it
ffmpeg_gpu: false

# *Youtube settings
youtube:
  cookies_path: ''

# *Default resolution for downloading YouTube videos [360, 1080, best]
ytb_resolution: '1080'

subtitle:
  # *Maximum length of each subtitle line in characters
  max_length: 75
  # *Translated subtitles are slightly larger than source subtitles, affecting the reference length for subtitle splitting
  target_multiplier: 1.2

# *Summary length, set low to 2k if using local LLM
summary_length: 8000

# *Maximum number of words for the first rough cut, below 18 will cut too finely affecting translation, above 22 is too long and will make subsequent subtitle splitting difficult to align
max_split_length: 20

# *Whether to reflect the translation result in the original text
reflect_translate: true

# *Whether to pause after extracting professional terms and before translation, allowing users to manually adjust the terminology table output\log\terminology.json
pause_before_translate: false

## ======================== Dubbing Settings ======================== ##
# TTS selection [sf_fish_tts, openai_tts, gpt_sovits, azure_tts, fish_tts, edge_tts, custom_tts]
tts_method: 'azure_tts'

# SiliconFlow FishTTS
sf_fish_tts:
  # SiliconFlow API key
  api_key: 'YOUR_API_KEY'
  # only for mode "preset"
  voice: 'anna'
  # *only for mode "custom", dont set manually
  custom_name: ''
  voice_id: ''
  # preset, custom, dynamic
  mode: "preset"

# OpenAI TTS-1 API configuration, 302.ai API only
openai_tts:
  api_key: 'YOUR_302_API_KEY'
  voice: 'alloy'

# Azure configuration, 302.ai API only
azure_tts:
  api_key: 'YOUR_302_API_KEY'
  voice: 'zh-CN-YunfengNeural'

# FishTTS configuration, 302.ai API only
fish_tts:
  api_key: 'YOUR_302_API_KEY'
  character: 'AD学姐'
  character_id_dict:
    'AD学姐': '7f92f8afb8ec43bf81429cc1c9199cb1'
    '丁真': '54a5170264694bfc8e9ad98df7bd89c3'

# SiliconFlow CosyVoice2 Clone
sf_cosyvoice2:
  api_key: 'YOUR_SF_KEY'

# Edge TTS configuration
edge_tts:
  voice: 'zh-CN-XiaoxiaoNeural'

# SoVITS configuration
gpt_sovits:
  character: 'Huanyuv2'
  refer_mode: 3

f5tts:
  302_api: 'YOUR_302_API_KEY'

# *Audio speed range
speed_factor:
  min: 1
  accept: 1.2 # Maximum acceptable speed
  max: 1.4

# *Merge audio configuration
min_subtitle_duration: 2.5 # Minimum subtitle duration, will be forcibly extended
min_trim_duration: 3.5 # Subtitles shorter than this value won't be split
tolerance: 1.5 # Allowed extension time to the next subtitle





## ======================== Additional settings ======================== ##

# Whisper model directory
model_dir: './_model_cache'

# Supported upload video formats
allowed_video_formats:
- 'mp4'
- 'mov'
- 'avi'
- 'mkv'
- 'flv'
- 'wmv'
- 'webm'

allowed_audio_formats:
- 'wav'
- 'mp3'
- 'flac'
- 'm4a'

# Spacy models
spacy_model_map:
  en: 'en_core_web_md'
  ru: 'ru_core_news_md'
  fr: 'fr_core_news_md'
  ja: 'ja_core_news_md'
  es: 'es_core_news_md'
  de: 'de_core_news_md'
  it: 'it_core_news_md'
  zh: 'zh_core_web_md'

# Languages that use space as separator
language_split_with_space:
- 'en'
- 'es'
- 'fr'
- 'de'
- 'it'
- 'ru'

# Languages that do not use space as separator
language_split_without_space:
- 'zh'
- 'ja'


================================================
FILE: core/_10_gen_audio.py
================================================
import os
import time
import shutil
import subprocess
from typing import Tuple

import pandas as pd
from pydub import AudioSegment
from rich.console import Console
from rich.progress import Progress
from concurrent.futures import ThreadPoolExecutor, as_completed

from core.utils import *
from core.utils.models import *
from core.asr_backend.audio_preprocess import get_audio_duration
from core.tts_backend.tts_main import tts_main

console = Console()

TEMP_FILE_TEMPLATE = f"{_AUDIO_TMP_DIR}/{{}}_temp.wav"
OUTPUT_FILE_TEMPLATE = f"{_AUDIO_SEGS_DIR}/{{}}.wav"
WARMUP_SIZE = 5

def parse_df_srt_time(time_str: str) -> float:
    """Convert SRT time format to seconds"""
    hours, minutes, seconds = time_str.strip().split(':')
    seconds, milliseconds = seconds.split('.')
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(milliseconds) / 1000

def adjust_audio_speed(input_file: str, output_file: str, speed_factor: float) -> None:
    """Adjust audio speed and handle edge cases"""
    # If the speed factor is close to 1, directly copy the file
    if abs(speed_factor - 1.0) < 0.001:
        shutil.copy2(input_file, output_file)
        return
        
    atempo = speed_factor
    cmd = ['ffmpeg', '-i', input_file, '-filter:a', f'atempo={atempo}', '-y', output_file]
    input_duration = get_audio_duration(input_file)
    max_retries = 2
    for attempt in range(max_retries):
        try:
            subprocess.run(cmd, check=True, stderr=subprocess.PIPE)
            output_duration = get_audio_duration(output_file)
            expected_duration = input_duration / speed_factor
            diff = output_duration - expected_duration
            # If the output duration exceeds the expected duration, but the input audio is less than 3 seconds, and the error is within 0.1 seconds, truncate to the expected length
            if output_duration >= expected_duration * 1.02 and input_duration < 3 and diff <= 0.1:
                audio = AudioSegment.from_wav(output_file)
                trimmed_audio = audio[:(expected_duration * 1000)]  # pydub uses milliseconds
                trimmed_audio.export(output_file, format="wav")
                print(f"✂️ Trimmed to expected duration: {expected_duration:.2f} seconds")
                return
            elif output_duration >= expected_duration * 1.02:
                raise Exception(f"Audio duration abnormal: input file={input_file}, output file={output_file}, speed factor={speed_factor}, input duration={input_duration:.2f}s, output duration={output_duration:.2f}s")
            return
        except subprocess.CalledProcessError as e:
            if attempt < max_retries - 1:
                rprint(f"[yellow]⚠️ Audio speed adjustment failed, retrying in 1s ({attempt + 1}/{max_retries})[/yellow]")
                time.sleep(1)
            else:
                rprint(f"[red]❌ Audio speed adjustment failed, max retries reached ({max_retries})[/red]")
                raise e

def process_row(row: pd.Series, tasks_df: pd.DataFrame) -> Tuple[int, float]:
    """Helper function for processing single row data"""
    number = row['number']
    lines = eval(row['lines']) if isinstance(row['lines'], str) else row['lines']
    real_dur = 0
    for line_index, line in enumerate(lines):
        temp_file = TEMP_FILE_TEMPLATE.format(f"{number}_{line_index}")
        tts_main(line, temp_file, number, tasks_df)
        real_dur += get_audio_duration(temp_file)
    return number, real_dur

def generate_tts_audio(tasks_df: pd.DataFrame) -> pd.DataFrame:
    """Generate TTS audio sequentially and calculate actual duration"""
    tasks_df['real_dur'] = 0
    rprint("[bold green]🎯 Starting TTS audio generation...[/bold green]")
    
    with Progress() as progress:
        task = progress.add_task("[cyan]🔄 Generating TTS audio...", total=len(tasks_df))
        
        # warm up for first 5 rows
        warmup_size = min(WARMUP_SIZE, len(tasks_df))
        for _, row in tasks_df.head(warmup_size).iterrows():
            try:
                number, real_dur = process_row(row, tasks_df)
                tasks_df.loc[tasks_df['number'] == number, 'real_dur'] = real_dur
                progress.advance(task)
            except Exception as e:
                rprint(f"[red]❌ Error in warmup: {str(e)}[/red]")
                raise e
        
        # for gpt_sovits, do not use parallel to avoid mistakes
        max_workers = load_key("max_workers") if load_key("tts_method") != "gpt_sovits" else 1
        # parallel processing for remaining tasks
        if len(tasks_df) > warmup_size:
            remaining_tasks = tasks_df.iloc[warmup_size:].copy()
            with ThreadPoolExecutor(max_workers=max_workers) as executor:
                futures = [
                    executor.submit(process_row, row, tasks_df.copy())
                    for _, row in remaining_tasks.iterrows()
                ]
                
                for future in as_completed(futures):
                    try:
                        number, real_dur = future.result()
                        tasks_df.loc[tasks_df['number'] == number, 'real_dur'] = real_dur
                        progress.advance(task)
                    except Exception as e:
                        rprint(f"[red]❌ Error: {str(e)}[/red]")
                        raise e

    rprint("[bold green]✨ TTS audio generation completed![/bold green]")
    return tasks_df

def process_chunk(chunk_df: pd.DataFrame, accept: float, min_speed: float) -> tuple[float, bool]:
    """Process audio chunk and calculate speed factor"""
    chunk_durs = chunk_df['real_dur'].sum()
    tol_durs = chunk_df['tol_dur'].sum()
    durations = tol_durs - chunk_df.iloc[-1]['tolerance']
    all_gaps = chunk_df['gap'].sum() - chunk_df.iloc[-1]['gap']
    
    keep_gaps = True
    speed_var_error = 0.1

    if (chunk_durs + all_gaps) / accept < durations:
        speed_factor = max(min_speed, (chunk_durs + all_gaps) / (durations-speed_var_error))
    elif chunk_durs / accept < durations:
        speed_factor = max(min_speed, chunk_durs / (durations-speed_var_error))
        keep_gaps = False
    elif (chunk_durs + all_gaps) / accept < tol_durs:
        speed_factor = max(min_speed, (chunk_durs + all_gaps) / (tol_durs-speed_var_error))
    else:
        speed_factor = chunk_durs / (tol_durs-speed_var_error)
        keep_gaps = False
        
    return round(speed_factor, 3), keep_gaps

def merge_chunks(tasks_df: pd.DataFrame) -> pd.DataFrame:
    """Merge audio chunks and adjust timeline"""
    rprint("[bold blue]🔄 Starting audio chunks processing...[/bold blue]")
    accept = load_key("speed_factor.accept")
    min_speed = load_key("speed_factor.min")
    chunk_start = 0
    
    tasks_df['new_sub_times'] = None
    
    for index, row in tasks_df.iterrows():
        if row['cut_off'] == 1:
            chunk_df = tasks_df.iloc[chunk_start:index+1].reset_index(drop=True)
            speed_factor, keep_gaps = process_chunk(chunk_df, accept, min_speed)
            
            # 🎯 Step1: Start processing new timeline
            chunk_start_time = parse_df_srt_time(chunk_df.iloc[0]['start_time'])
            chunk_end_time = parse_df_srt_time(chunk_df.iloc[-1]['end_time']) + chunk_df.iloc[-1]['tolerance'] # 加上tolerance才是这一块的结束
            cur_time = chunk_start_time
            for i, row in chunk_df.iterrows():
                # If i is not 0, which is not the first row of the chunk, cur_time needs to be added with the gap of the previous row, remember to divide by speed_factor
                if i != 0 and keep_gaps:
                    cur_time += chunk_df.iloc[i-1]['gap']/speed_factor
                new_sub_times = []
                number = row['number']
                lines = eval(row['lines']) if isinstance(row['lines'], str) else row['lines']
                for line_index, line in enumerate(lines):
                    # 🔄 Step2: Start speed change and save as OUTPUT_FILE_TEMPLATE
                    temp_file = TEMP_FILE_TEMPLATE.format(f"{number}_{line_index}")
                    output_file = OUTPUT_FILE_TEMPLATE.format(f"{number}_{line_index}")
                    adjust_audio_speed(temp_file, output_file, speed_factor)
                    ad_dur = get_audio_duration(output_file)
                    new_sub_times.append([cur_time, cur_time+ad_dur])
                    cur_time += ad_dur
                # 🔄 Step3: Find corresponding main DataFrame index and update new_sub_times
                main_df_idx = tasks_df[tasks_df['number'] == row['number']].index[0]
                tasks_df.at[main_df_idx, 'new_sub_times'] = new_sub_times
                # 🎯 Step4: Choose emoji based on speed_factor and accept comparison
                emoji = "⚡" if speed_factor <= accept else "⚠️"
                rprint(f"[cyan]{emoji} Processed chunk {chunk_start} to {index} with speed factor {speed_factor}[/cyan]")
            # 🔄 Step5: Check if the last row exceeds the range
            if cur_time > chunk_end_time:
                time_diff = cur_time - chunk_end_time
                if time_diff <= 0.6:  # If exceeding time is within 0.6 seconds, truncate the last audio
                    rprint(f"[yellow]⚠️ Chunk {chunk_start} to {index} exceeds by {time_diff:.3f}s, truncating last audio[/yellow]")
                    # Get the last audio file
                    last_number = tasks_df.iloc[index]['number']
                    last_lines = eval(tasks_df.iloc[index]['lines']) if isinstance(tasks_df.iloc[index]['lines'], str) else tasks_df.iloc[index]['lines']
                    last_line_index = len(last_lines) - 1
                    last_file = OUTPUT_FILE_TEMPLATE.format(f"{last_number}_{last_line_index}")
                    
                    # Calculate the duration to keep
                    audio = AudioSegment.from_wav(last_file)
                    original_duration = len(audio) / 1000  # Convert to seconds
                    new_duration = original_duration - time_diff
                    trimmed_audio = audio[:(new_duration * 1000)]  # pydub uses milliseconds
                    trimmed_audio.export(last_file, format="wav")
                    
                    # Update the last timestamp
                    last_times = tasks_df.at[index, 'new_sub_times']
                    last_times[-1][1] = chunk_end_time
                    tasks_df.at[index, 'new_sub_times'] = last_times
                else:
                    raise Exception(f"Chunk {chunk_start} to {index} exceeds the chunk end time {chunk_end_time:.2f} seconds with current time {cur_time:.2f} seconds")
            chunk_start = index+1
    
    rprint("[bold green]✅ Audio chunks processing completed![/bold green]")
    return tasks_df

def gen_audio() -> None:
    """Main function: Generate audio and process timeline"""
    rprint("[bold magenta]🚀 Starting audio generation process...[/bold magenta]")
    
    # 🎯 Step1: Create necessary directories
    os.makedirs(_AUDIO_TMP_DIR, exist_ok=True)
    os.makedirs(_AUDIO_SEGS_DIR, exist_ok=True)
    
    # 📝 Step2: Load task file
    tasks_df = pd.read_excel(_8_1_AUDIO_TASK)
    rprint("[green]📊 Loaded task file successfully[/green]")
    
    # 🔊 Step3: Generate TTS audio
    tasks_df = generate_tts_audio(tasks_df)
    
    # 🔄 Step4: Merge audio chunks
    tasks_df = merge_chunks(tasks_df)
    
    # 💾 Step5: Save results
    tasks_df.to_excel(_8_1_AUDIO_TASK, index=False)
    rprint("[bold green]🎉 Audio generation completed successfully![/bold green]")

if __name__ == "__main__":
    gen_audio()


================================================
FILE: core/_11_merge_audio.py
================================================
import os
import pandas as pd
import subprocess
from pydub import AudioSegment
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn
from rich.console import Console
from core.utils import *
from core.utils.models import *
console = Console()

DUB_VOCAL_FILE = 'output/dub.mp3'

DUB_SUB_FILE = 'output/dub.srt'
OUTPUT_FILE_TEMPLATE = f"{_AUDIO_SEGS_DIR}/{{}}.wav"

def load_and_flatten_data(excel_file):
    """Load and flatten Excel data"""
    df = pd.read_excel(excel_file)
    lines = [eval(line) if isinstance(line, str) else line for line in df['lines'].tolist()]
    lines = [item for sublist in lines for item in sublist]
    
    new_sub_times = [eval(time) if isinstance(time, str) else time for time in df['new_sub_times'].tolist()]
    new_sub_times = [item for sublist in new_sub_times for item in sublist]
    
    return df, lines, new_sub_times

def get_audio_files(df):
    """Generate a list of audio file paths"""
    audios = []
    for index, row in df.iterrows():
        number = row['number']
        line_count = len(eval(row['lines']) if isinstance(row['lines'], str) else row['lines'])
        for line_index in range(line_count):
            temp_file = OUTPUT_FILE_TEMPLATE.format(f"{number}_{line_index}")
            audios.append(temp_file)
    return audios

def process_audio_segment(audio_file):
    """Process a single audio segment with MP3 compression"""
    temp_file = f"{audio_file}_temp.mp3"
    ffmpeg_cmd = [
        'ffmpeg', '-y',
        '-i', audio_file,
        '-ar', '16000',
        '-ac', '1',
        '-b:a', '64k',
        temp_file
    ]
    subprocess.run(ffmpeg_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    audio_segment = AudioSegment.from_mp3(temp_file)
    os.remove(temp_file)
    return audio_segment

def merge_audio_segments(audios, new_sub_times, sample_rate):
    merged_audio = AudioSegment.silent(duration=0, frame_rate=sample_rate)
    
    with Progress(SpinnerColumn(), TextColumn("[progress.description]{task.description}"), BarColumn(), TaskProgressColumn()) as progress:
        merge_task = progress.add_task("🎵 Merging audio segments...", total=len(audios))
        
        for i, (audio_file, time_range) in enumerate(zip(audios, new_sub_times)):
            if not os.path.exists(audio_file):
                console.print(f"[bold yellow]⚠️  Warning: File {audio_file} does not exist, skipping...[/bold yellow]")
                progress.advance(merge_task)
                continue
                
            audio_segment = process_audio_segment(audio_file)
            start_time, end_time = time_range
            
            # Add silence segment
            if i > 0:
                prev_end = new_sub_times[i-1][1]
                silence_duration = start_time - prev_end
                if silence_duration > 0:
                    silence = AudioSegment.silent(duration=int(silence_duration * 1000), frame_rate=sample_rate)
                    merged_audio += silence
            elif start_time > 0:
                silence = AudioSegment.silent(duration=int(start_time * 1000), frame_rate=sample_rate)
                merged_audio += silence
                
            merged_audio += audio_segment
            progress.advance(merge_task)
    
    return merged_audio

def create_srt_subtitle():
    df, lines, new_sub_times = load_and_flatten_data(_8_1_AUDIO_TASK)
    
    with open(DUB_SUB_FILE, 'w', encoding='utf-8') as f:
        for i, ((start_time, end_time), line) in enumerate(zip(new_sub_times, lines), 1):
            start_str = f"{int(start_time//3600):02d}:{int((start_time%3600)//60):02d}:{int(start_time%60):02d},{int((start_time*1000)%1000):03d}"
            end_str = f"{int(end_time//3600):02d}:{int((end_time%3600)//60):02d}:{int(end_time%60):02d},{int((end_time*1000)%1000):03d}"
            
            f.write(f"{i}\n")
            f.write(f"{start_str} --> {end_str}\n")
            f.write(f"{line}\n\n")
    
    rprint(f"[bold green]✅ Subtitle file created: {DUB_SUB_FILE}[/bold green]")

def merge_full_audio():
    """Main function: Process the complete audio merging process"""
    console.print("\n[bold cyan]🎬 Starting audio merging process...[/bold cyan]")
    
    with console.status("[bold cyan]📊 Loading data from Excel...[/bold cyan]"):
        df, lines, new_sub_times = load_and_flatten_data(_8_1_AUDIO_TASK)
    console.print("[bold green]✅ Data loaded successfully[/bold green]")
    
    with console.status("[bold cyan]🔍 Getting audio file list...[/bold cyan]"):
        audios = get_audio_files(df)
    console.print(f"[bold green]✅ Found {len(audios)} audio segments[/bold green]")
    
    with console.status("[bold cyan]📝 Generating subtitle file...[/bold cyan]"):
        create_srt_subtitle()
    
    if not os.path.exists(audios[0]):
        console.print(f"[bold red]❌ Error: First audio file {audios[0]} does not exist![/bold red]")
        return
    
    sample_rate = 16000
    console.print(f"[bold green]✅ Sample rate: {sample_rate}Hz[/bold green]")

    console.print("[bold cyan]🔄 Starting audio merge process...[/bold cyan]")
    merged_audio = merge_audio_segments(audios, new_sub_times, sample_rate)
    
    with console.status("[bold cyan]💾 Exporting final audio file...[/bold cyan]"):
        merged_audio = merged_audio.set_frame_rate(16000).set_channels(1)
        merged_audio.export(DUB_VOCAL_FILE, format="mp3", parameters=["-b:a", "64k"])
    console.print(f"[bold green]✅ Audio file successfully merged![/bold green]")
    console.print(f"[bold green]📁 Output file: {DUB_VOCAL_FILE}[/bold green]")

if __name__ == "__main__":
    merge_full_audio()

================================================
FILE: core/_12_dub_to_vid.py
================================================
import platform
import subprocess

import cv2
import numpy as np
from rich.console import Console

from core._1_ytdlp import find_video_files
from core.asr_backend.audio_preprocess import normalize_audio_volume
from core.utils import *
from core.utils.models import *

console = Console()

DUB_VIDEO = "output/output_dub.mp4"
DUB_SUB_FILE = 'output/dub.srt'
DUB_AUDIO = 'output/dub.mp3'

TRANS_FONT_SIZE = 17
TRANS_FONT_NAME = 'Arial'
if platform.system() == 'Linux':
    TRANS_FONT_NAME = 'NotoSansCJK-Regular'
if platform.system() == 'Darwin':
    TRANS_FONT_NAME = 'Arial Unicode MS'

TRANS_FONT_COLOR = '&H00FFFF'
TRANS_OUTLINE_COLOR = '&H000000'
TRANS_OUTLINE_WIDTH = 1 
TRANS_BACK_COLOR = '&H33000000'

def merge_video_audio():
    """Merge video and audio, and reduce video volume"""
    VIDEO_FILE = find_video_files()
    background_file = _BACKGROUND_AUDIO_FILE
    
    if not load_key("burn_subtitles"):
        rprint("[bold yellow]Warning: A 0-second black video will be generated as a placeholder as subtitles are not burned in.[/bold yellow]")

        # Create a black frame
        frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(DUB_VIDEO, fourcc, 1, (1920, 1080))
        out.write(frame)
        out.release()

        rprint("[bold green]Placeholder video has been generated.[/bold green]")
        return

    # Normalize dub audio
    normalized_dub_audio = 'output/normalized_dub.wav'
    normalize_audio_volume(DUB_AUDIO, normalized_dub_audio)
    
    # Merge video and audio with translated subtitles
    video = cv2.VideoCapture(VIDEO_FILE)
    TARGET_WIDTH = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    TARGET_HEIGHT = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    video.release()
    rprint(f"[bold green]Video resolution: {TARGET_WIDTH}x{TARGET_HEIGHT}[/bold green]")
    
    subtitle_filter = (
        f"subtitles={DUB_SUB_FILE}:force_style='FontSize={TRANS_FONT_SIZE},"
        f"FontName={TRANS_FONT_NAME},PrimaryColour={TRANS_FONT_COLOR},"
        f"OutlineColour={TRANS_OUTLINE_COLOR},OutlineWidth={TRANS_OUTLINE_WIDTH},"
        f"BackColour={TRANS_BACK_COLOR},Alignment=2,MarginV=27,BorderStyle=4'"
    )
    
    cmd = [
        'ffmpeg', '-y', '-i', VIDEO_FILE, '-i', background_file, '-i', normalized_dub_audio,
        '-filter_complex',
        f'[0:v]scale={TARGET_WIDTH}:{TARGET_HEIGHT}:force_original_aspect_ratio=decrease,'
        f'pad={TARGET_WIDTH}:{TARGET_HEIGHT}:(ow-iw)/2:(oh-ih)/2,'
        f'{subtitle_filter}[v];'
        f'[1:a][2:a]amix=inputs=2:duration=first:dropout_transition=3[a]'
    ]

    if load_key("ffmpeg_gpu"):
        rprint("[bold green]Using GPU acceleration...[/bold green]")
        cmd.extend(['-map', '[v]', '-map', '[a]', '-c:v', 'h264_nvenc'])
    else:
        cmd.extend(['-map', '[v]', '-map', '[a]'])
    
    cmd.extend(['-c:a', 'aac', '-b:a', '96k', DUB_VIDEO])
    
    subprocess.run(cmd)
    rprint(f"[bold green]Video and audio successfully merged into {DUB_VIDEO}[/bold green]")

if __name__ == '__main__':
    merge_video_audio()


================================================
FILE: core/_1_ytdlp.py
================================================
import os,sys
import glob
import re
import subprocess
from core.utils import *

def sanitize_filename(filename):
    # Remove or replace illegal characters
    filename = re.sub(r'[<>:"/\\|?*]', '', filename)
    # Ensure filename doesn't start or end with a dot or space
    filename = filename.strip('. ')
    # Use default name if filename is empty
    return filename if filename else 'video'

def update_ytdlp():
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "yt-dlp"])
        if 'yt_dlp' in sys.modules:
            del sys.modules['yt_dlp']
        rprint("[green]yt-dlp updated[/green]")
    except subprocess.CalledProcessError as e:
        rprint("[yellow]Warning: Failed to update yt-dlp: {e}[/yellow]")
    from yt_dlp import YoutubeDL
    return YoutubeDL

def download_video_ytdlp(url, save_path='output', resolution='1080'):
    os.makedirs(save_path, exist_ok=True)
    ydl_opts = {
        'format': 'bestvideo+bestaudio/best' if resolution == 'best' else f'bestvideo[height<={resolution}]+bestaudio/best[height<={resolution}]',
        'outtmpl': f'{save_path}/%(title)s.%(ext)s',
        'noplaylist': True,
        'writethumbnail': True,
        'postprocessors': [{'key': 'FFmpegThumbnailsConvertor', 'format': 'jpg'}],
    }

    # Read Youtube Cookie File
    cookies_path = load_key("youtube.cookies_path")
    if os.path.exists(cookies_path):
        ydl_opts["cookiefile"] = str(cookies_path)

    # Get YoutubeDL class after updating
    YoutubeDL = update_ytdlp()
    with YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])
    
    # Check and rename files after download
    for file in os.listdir(save_path):
        if os.path.isfile(os.path.join(save_path, file)):
            filename, ext = os.path.splitext(file)
            new_filename = sanitize_filename(filename)
            if new_filename != filename:
                os.rename(os.path.join(save_path, file), os.path.join(save_path, new_filename + ext))

def find_video_files(save_path='output'):
    video_files = [file for file in glob.glob(save_path + "/*") if os.path.splitext(file)[1][1:].lower() in load_key("allowed_video_formats")]
    # change \\ to /, this happen on windows
    if sys.platform.startswith('win'):
        video_files = [file.replace("\\", "/") for file in video_files]
    video_files = [file for file in video_files if not file.startswith("output/output")]
    if len(video_files) != 1:
        raise ValueError(f"Number of videos found {len(video_files)} is not unique. Please check.")
    return video_files[0]

if __name__ == '__main__':
    # Example usage
    url = input('Please enter the URL of the video you want to download: ')
    resolution = input('Please enter the desired resolution (360/480/720/1080, default 1080): ')
    resolution = int(resolution) if resolution.isdigit() else 1080
    download_video_ytdlp(url, resolution=resolution)
    print(f"🎥 Video has been downloaded to {find_video_files()}")


================================================
FILE: core/_2_asr.py
================================================
from core.utils import *
from core.asr_backend.demucs_vl import demucs_audio
from core.asr_backend.audio_preprocess import process_transcription, convert_video_to_audio, split_audio, save_results, normalize_audio_volume
from core._1_ytdlp import find_video_files
from core.utils.models import *

@check_file_exists(_2_CLEANED_CHUNKS)
def transcribe():
    # 1. video to audio
    video_file = find_video_files()
    convert_video_to_audio(video_file)

    # 2. Demucs vocal separation:
    if load_key("demucs"):
        demucs_audio()
        vocal_audio = normalize_audio_volume(_VOCAL_AUDIO_FILE, _VOCAL_AUDIO_FILE, format="mp3")
    else:
        vocal_audio = _RAW_AUDIO_FILE

    # 3. Extract audio
    segments = split_audio(_RAW_AUDIO_FILE)
    
    # 4. Transcribe audio by clips
    all_results = []
    runtime = load_key("whisper.runtime")
    if runtime == "local":
        from core.asr_backend.whisperX_local import transcribe_audio as ts
        rprint("[cyan]🎤 Transcribing audio with local model...[/cyan]")
    elif runtime == "cloud":
        from core.asr_backend.whisperX_302 import transcribe_audio_302 as ts
        rprint("[cyan]🎤 Transcribing audio with 302 API...[/cyan]")
    elif runtime == "elevenlabs":
        from core.asr_backend.elevenlabs_asr import transcribe_audio_elevenlabs as ts
        rprint("[cyan]🎤 Transcribing audio with ElevenLabs API...[/cyan]")

    for start, end in segments:
        result = ts(_RAW_AUDIO_FILE, vocal_audio, start, end)
        all_results.append(result)
    
    # 5. Combine results
    combined_result = {'segments': []}
    for result in all_results:
        combined_result['segments'].extend(result['segments'])
    
    # 6. Process df
    df = process_transcription(combined_result)
    save_results(df)
        
if __name__ == "__main__":
    transcribe()

================================================
FILE: core/_3_1_split_nlp.py
================================================
from core.spacy_utils import *
from core.utils.models import _3_1_SPLIT_BY_NLP
from core.utils import check_file_exists

@check_file_exists(_3_1_SPLIT_BY_NLP)
def split_by_spacy():
    nlp = init_nlp()
    split_by_mark(nlp)
    split_by_comma_main(nlp)
    split_sentences_main(nlp)
    split_long_by_root_main(nlp)
    return

if __name__ == '__main__':
    split_by_spacy()

================================================
FILE: core/_3_2_split_meaning.py
================================================
import concurrent.futures
from difflib import SequenceMatcher
import math
from core.prompts import get_split_prompt
from core.spacy_utils.load_nlp_model import init_nlp
from core.utils import *
from rich.console import Console
from rich.table import Table
from core.utils.models import _3_1_SPLIT_BY_NLP, _3_2_SPLIT_BY_MEANING
console = Console()

def tokenize_sentence(sentence, nlp):
    doc = nlp(sentence)
    return [token.text for token in doc]

def find_split_positions(original, modified):
    split_positions = []
    parts = modified.split('[br]')
    start = 0
    whisper_language = load_key("whisper.language")
    language = load_key("whisper.detected_language") if whisper_language == 'auto' else whisper_language
    joiner = get_joiner(language)

    for i in range(len(parts) - 1):
        max_similarity = 0
        best_split = None

        for j in range(start, len(original)):
            original_left = original[start:j]
            modified_left = joiner.join(parts[i].split())

            left_similarity = SequenceMatcher(None, original_left, modified_left).ratio()

            if left_similarity > max_similarity:
                max_similarity = left_similarity
                best_split = j

        if max_similarity < 0.9:
            console.print(f"[yellow]Warning: low similarity found at the best split point: {max_similarity}[/yellow]")
        if best_split is not None:
            split_positions.append(best_split)
            start = best_split
        else:
            console.print(f"[yellow]Warning: Unable to find a suitable split point for the {i+1}th part.[/yellow]")

    return split_positions

def split_sentence(sentence, num_parts, word_limit=20, index=-1, retry_attempt=0):
    """Split a long sentence using GPT and return the result as a string."""
    split_prompt = get_split_prompt(sentence, num_parts, word_limit)
    def valid_split(response_data):
        choice = response_data["choice"]
        if f'split{choice}' not in response_data:
            return {"status": "error", "message": "Missing required key: `split`"}
        if "[br]" not in response_data[f"split{choice}"]:
            return {"status": "error", "message": "Split failed, no [br] found"}
        return {"status": "success", "message": "Split completed"}
    
    response_data = ask_gpt(split_prompt + " " * retry_attempt, resp_type='json', valid_def=valid_split, log_title='split_by_meaning')
    choice = response_data["choice"]
    best_split = response_data[f"split{choice}"]
    split_points = find_split_positions(sentence, best_split)
    # split the sentence based on the split points
    for i, split_point in enumerate(split_points):
        if i == 0:
            best_split = sentence[:split_point] + '\n' + sentence[split_point:]
        else:
            parts = best_split.split('\n')
            last_part = parts[-1]
            parts[-1] = last_part[:split_point - split_points[i-1]] + '\n' + last_part[split_point - split_points[i-1]:]
            best_split = '\n'.join(parts)
    if index != -1:
        console.print(f'[green]✅ Sentence {index} has been successfully split[/green]')
    table = Table(title="")
    table.add_column("Type", style="cyan")
    table.add_column("Sentence")
    table.add_row("Original", sentence, style="yellow")
    table.add_row("Split", best_split.replace('\n', ' ||'), style="yellow")
    console.print(table)
    
    return best_split

def parallel_split_sentences(sentences, max_length, max_workers, nlp, retry_attempt=0):
    """Split sentences in parallel using a thread pool."""
    new_sentences = [None] * len(sentences)
    futures = []

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        for index, sentence in enumerate(sentences):
            # Use tokenizer to split the sentence
            tokens = tokenize_sentence(sentence, nlp)
            # print("Tokenization result:", tokens)
            num_parts = math.ceil(len(tokens) / max_length)
            if len(tokens) > max_length:
                future = executor.submit(split_sentence, sentence, num_parts, max_length, index=index, retry_attempt=retry_attempt)
                futures.append((future, index, num_parts, sentence))
            else:
                new_sentences[index] = [sentence]

        for future, index, num_parts, sentence in futures:
            split_result = future.result()
            if split_result:
                split_lines = split_result.strip().split('\n')
                new_sentences[index] = [line.strip() for line in split_lines]
            else:
                new_sentences[index] = [sentence]

    return [sentence for sublist in new_sentences for sentence in sublist]

@check_file_exists(_3_2_SPLIT_BY_MEANING)
def split_sentences_by_meaning():
    """The main function to split sentences by meaning."""
    # read input sentences
    with open(_3_1_SPLIT_BY_NLP, 'r', encoding='utf-8') as f:
        sentences = [line.strip() for line in f.readlines()]

    nlp = init_nlp()
    # 🔄 process sentences multiple times to ensure all are split
    for retry_attempt in range(3):
        sentences = parallel_split_sentences(sentences, max_length=load_key("max_split_length"), max_workers=load_key("max_workers"), nlp=nlp, retry_attempt=retry_attempt)

    # 💾 save results
    with open(_3_2_SPLIT_BY_MEANING, 'w', encoding='utf-8') as f:
        f.write('\n'.join(sentences))
    console.print('[green]✅ All sentences have been successfully split![/green]')

if __name__ == '__main__':
    # print(split_sentence('Which makes no sense to the... average guy who always pushes the character creation slider all the way to the right.', 2, 22))
    split_sentences_by_meaning()

================================================
FILE: core/_4_1_summarize.py
================================================
import json
from core.prompts import get_summary_prompt
import pandas as pd
from core.utils import *
from core.utils.models import _3_2_SPLIT_BY_MEANING, _4_1_TERMINOLOGY

CUSTOM_TERMS_PATH = 'custom_terms.xlsx'

def combine_chunks():
    """Combine the text chunks identified by whisper into a single long text"""
    with open(_3_2_SPLIT_BY_MEANING, 'r', encoding='utf-8') as file:
        sentences = file.readlines()
    cleaned_sentences = [line.strip() for line in sentences]
    combined_text = ' '.join(cleaned_sentences)
    return combined_text[:load_key('summary_length')]  #! Return only the first x characters

def search_things_to_note_in_prompt(sentence):
    """Search for terms to note in the given sentence"""
    with open(_4_1_TERMINOLOGY, 'r', encoding='utf-8') as file:
        things_to_note = json.load(file)
    things_to_note_list = [term['src'] for term in things_to_note['terms'] if term['src'].lower() in sentence.lower()]
    if things_to_note_list:
        prompt = '\n'.join(
            f'{i+1}. "{term["src"]}": "{term["tgt"]}",'
            f' meaning: {term["note"]}'
            for i, term in enumerate(things_to_note['terms'])
            if term['src'] in things_to_note_list
        )
        return prompt
    else:
        return None

def get_summary():
    src_content = combine_chunks()
    custom_terms = pd.read_excel(CUSTOM_TERMS_PATH)
    custom_terms_json = {
        "terms": 
            [
                {
                    "src": str(row.iloc[0]),
                    "tgt": str(row.iloc[1]), 
                    "note": str(row.iloc[2])
                }
                for _, row in custom_terms.iterrows()
            ]
    }
    if len(custom_terms) > 0:
        rprint(f"📖 Custom Terms Loaded: {len(custom_terms)} terms")
        rprint("📝 Terms Content:", json.dumps(custom_terms_json, indent=2, ensure_ascii=False))
    summary_prompt = get_summary_prompt(src_content, custom_terms_json)
    rprint("📝 Summarizing and extracting terminology ...")
    
    def valid_summary(response_data):
        required_keys = {'src', 'tgt', 'note'}
        if 'terms' not in response_data:
            return {"status": "error", "message": "Invalid response format"}
        for term in response_data['terms']:
            if not all(key in term for key in required_keys):
                return {"status": "error", "message": "Invalid response format"}   
        return {"status": "success", "message": "Summary completed"}

    summary = ask_gpt(summary_prompt, resp_type='json', valid_def=valid_summary, log_title='summary')
    summary['terms'].extend(custom_terms_json['terms'])
    
    with open(_4_1_TERMINOLOGY, 'w', encoding='utf-8') as f:
        json.dump(summary, f, ensure_ascii=False, indent=4)

    rprint(f'💾 Summary log saved to → `{_4_1_TERMINOLOGY}`')

if __name__ == '__main__':
    get_summary()

================================================
FILE: core/_4_2_translate.py
================================================
import pandas as pd
import json
import concurrent.futures
from core.translate_lines import translate_lines
from core._4_1_summarize import search_things_to_note_in_prompt
from core._8_1_audio_task import check_len_then_trim
from core._6_gen_sub import align_timestamp
from core.utils import *
from rich.console import Console
from rich.progress import Progress, SpinnerColumn, TextColumn
from difflib import SequenceMatcher
from core.utils.models import *
console = Console()

# Function to split text into chunks
def split_chunks_by_chars(chunk_size, max_i): 
    """Split text into chunks based on character count, return a list of multi-line text chunks"""
    with open(_3_2_SPLIT_BY_MEANING, "r", encoding="utf-8") as file:
        sentences = file.read().strip().split('\n')

    chunks = []
    chunk = ''
    sentence_count = 0
    for sentence in sentences:
        if len(chunk) + len(sentence + '\n') > chunk_size or sentence_count == max_i:
            chunks.append(chunk.strip())
            chunk = sentence + '\n'
            sentence_count = 1
        else:
            chunk += sentence + '\n'
            sentence_count += 1
    chunks.append(chunk.strip())
    return chunks

# Get context from surrounding chunks
def get_previous_content(chunks, chunk_index):
    return None if chunk_index == 0 else chunks[chunk_index - 1].split('\n')[-3:] # Get last 3 lines
def get_after_content(chunks, chunk_index):
    return None if chunk_index == len(chunks) - 1 else chunks[chunk_index + 1].split('\n')[:2] # Get first 2 lines

# 🔍 Translate a single chunk
def translate_chunk(chunk, chunks, theme_prompt, i):
    things_to_note_prompt = search_things_to_note_in_prompt(chunk)
    previous_content_prompt = get_previous_content(chunks, i)
    after_content_prompt = get_after_content(chunks, i)
    translation, english_result = translate_lines(chunk, previous_content_prompt, after_content_prompt, things_to_note_prompt, theme_prompt, i)
    return i, english_result, translation

# Add similarity calculation function
def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

# 🚀 Main function to translate all chunks
@check_file_exists(_4_2_TRANSLATION)
def translate_all():
    console.print("[bold green]Start Translating All...[/bold green]")
    chunks = split_chunks_by_chars(chunk_size=600, max_i=10)
    with open(_4_1_TERMINOLOGY, 'r', encoding='utf-8') as file:
        theme_prompt = json.load(file).get('theme')

    # 🔄 Use concurrent execution for translation
    with Progress(SpinnerColumn(), TextColumn("[progress.description]{task.description}"), transient=True) as progress:
        task = progress.add_task("[cyan]Translating chunks...", total=len(chunks))
        with concurrent.futures.ThreadPoolExecutor(max_workers=load_key("max_workers")) as executor:
            futures = []
            for i, chunk in enumerate(chunks):
                future = executor.submit(translate_chunk, chunk, chunks, theme_prompt, i)
                futures.append(future)
            results = []
            for future in concurrent.futures.as_completed(futures):
                results.append(future.result())
                progress.update(task, advance=1)

    results.sort(key=lambda x: x[0])  # Sort results based on original order
    
    # 💾 Save results to lists and Excel file
    src_text, trans_text = [], []
    for i, chunk in enumerate(chunks):
        chunk_lines = chunk.split('\n')
        src_text.extend(chunk_lines)
        
        # Calculate similarity between current chunk and translation results
        chunk_text = ''.join(chunk_lines).lower()
        matching_results = [(r, similar(''.join(r[1].split('\n')).lower(), chunk_text)) 
                          for r in results]
        best_match = max(matching_results, key=lambda x: x[1])
        
        # Check similarity and handle exceptions
        if best_match[1] < 0.9:
            console.print(f"[yellow]Warning: No matching translation found for chunk {i}[/yellow]")
            raise ValueError(f"Translation matching failed (chunk {i})")
        elif best_match[1] < 1.0:
            console.print(f"[yellow]Warning: Similar match found (chunk {i}, similarity: {best_match[1]:.3f})[/yellow]")
            
        trans_text.extend(best_match[0][2].split('\n'))
    
    # Trim long translation text
    df_text = pd.read_excel(_2_CLEANED_CHUNKS)
    df_text['text'] = df_text['text'].str.strip('"').str.strip()
    df_translate = pd.DataFrame({'Source': src_text, 'Translation': trans_text})
    subtitle_output_configs = [('trans_subs_for_audio.srt', ['Translation'])]
    df_time = align_timestamp(df_text, df_translate, subtitle_output_configs, output_dir=None, for_display=False)
    console.print(df_time)
    # apply check_len_then_trim to df_time['Translation'], only when duration > MIN_TRIM_DURATION.
    df_time['Translation'] = df_time.apply(lambda x: check_len_then_trim(x['Translation'], x['duration']) if x['duration'] > load_key("min_trim_duration") else x['Translation'], axis=1)
    console.print(df_time)
    
    df_time.to_excel(_4_2_TRANSLATION, index=False)
    console.print("[bold green]✅ Translation completed and results saved.[/bold green]")

if __name__ == '__main__':
    translate_all()

================================================
FILE: core/_5_split_sub.py
================================================
import pandas as pd
from typing import List, Tuple
import concurrent.futures

from core._3_2_split_meaning import split_sentence
from core.prompts import get_align_prompt
from rich.panel import Panel
from rich.console import Console
from rich.table import Table
from core.utils import *
from core.utils.models import *
console = Console()

# ! You can modify your own weights here
# Chinese and Japanese 2.5 characters, Korean 2 characters, Thai 1.5 characters, full-width symbols 2 characters, other English-based and half-width symbols 1 character
def calc_len(text: str) -> float:
    text = str(text) # force convert
    def char_weight(char):
        code = ord(char)
        if 0x4E00 <= code <= 0x9FFF or 0x3040 <= code <= 0x30FF:  # Chinese and Japanese
            return 1.75
        elif 0xAC00 <= code <= 0xD7A3 or 0x1100 <= code <= 0x11FF:  # Korean
            return 1.5
        elif 0x0E00 <= code <= 0x0E7F:  # Thai
            return 1
        elif 0xFF01 <= code <= 0xFF5E:  # full-width symbols
            return 1.75
        else:  # other characters (e.g. English and half-width symbols)
            return 1

    return sum(char_weight(char) for char in text)

def align_subs(src_sub: str, tr_sub: str, src_part: str) -> Tuple[List[str], List[str], str]:
    align_prompt = get_align_prompt(src_sub, tr_sub, src_part)
    
    def valid_align(response_data):
        if 'align' not in response_data:
            return {"status": "error", "message": "Missing required key: `align`"}
        if len(response_data['align']) < 2:
            return {"status": "error", "message": "Align does not contain more than 1 part as expected!"}
        return {"status": "success", "message": "Align completed"}
    parsed = ask_gpt(align_prompt, resp_type='json', valid_def=valid_align, log_title='align_subs')
    align_data = parsed['align']
    src_parts = src_part.split('\n')
    tr_parts = [item[f'target_part_{i+1}'].strip() for i, item in enumerate(align_data)]
    
    whisper_language = load_key("whisper.language")
    language = load_key("whisper.detected_language") if whisper_language == 'auto' else whisper_language
    joiner = get_joiner(language)
    tr_remerged = joiner.join(tr_parts)
    
    table = Table(title="🔗 Aligned parts")
    table.add_column("Language", style="cyan")
    table.add_column("Parts", style="magenta")
    table.add_row("SRC_LANG", "\n".join(src_parts))
    table.add_row("TARGET_LANG", "\n".join(tr_parts))
    console.print(table)
    
    return src_parts, tr_parts, tr_remerged

def split_align_subs(src_lines: List[str], tr_lines: List[str]):
    subtitle_set = load_key("subtitle")
    MAX_SUB_LENGTH = subtitle_set["max_length"]
    TARGET_SUB_MULTIPLIER = subtitle_set["target_multiplier"]
    remerged_tr_lines = tr_lines.copy()
    
    to_split = []
    for i, (src, tr) in enumerate(zip(src_lines, tr_lines)):
        src, tr = str(src), str(tr)
        if len(src) > MAX_SUB_LENGTH or calc_len(tr) * TARGET_SUB_MULTIPLIER > MAX_SUB_LENGTH:
            to_split.append(i)
            table = Table(title=f"📏 Line {i} needs to be split")
            table.add_column("Type", style="cyan")
            table.add_column("Content", style="magenta")
            table.add_row("Source Line", src)
            table.add_row("Target Line", tr)
            console.print(table)
    
    @except_handler("Error in split_align_subs")
    def process(i):
        split_src = split_sentence(src_lines[i], num_parts=2).strip()
        src_parts, tr_parts, tr_remerged = align_subs(src_lines[i], tr_lines[i], split_src)
        src_lines[i] = src_parts
        tr_lines[i] = tr_parts
        remerged_tr_lines[i] = tr_remerged
    
    with concurrent.futures.ThreadPoolExecutor(max_workers=load_key("max_workers")) as executor:
        executor.map(process, to_split)
    
    # Flatten `src_lines` and `tr_lines`
    src_lines = [item for sublist in src_lines for item in (sublist if isinstance(sublist, list) else [sublist])]
    tr_lines = [item for sublist in tr_lines for item in (sublist if isinstance(sublist, list) else [sublist])]
    
    return src_lines, tr_lines, remerged_tr_lines

def split_for_sub_main():
    console.print("[bold green]🚀 Start splitting subtitles...[/bold green]")
    
    df = pd.read_excel(_4_2_TRANSLATION)
    src = df['Source'].tolist()
    trans = df['Translation'].tolist()
    
    subtitle_set = load_key("subtitle")
    MAX_SUB_LENGTH = subtitle_set["max_length"]
    TARGET_SUB_MULTIPLIER = subtitle_set["target_multiplier"]
    
    for attempt in range(3):  # 多次切割
        console.print(Panel(f"🔄 Split attempt {attempt + 1}", expand=False))
        split_src, split_trans, remerged = split_align_subs(src.copy(), trans)
        
        # 检查是否所有字幕都符合长度要求
        if all(len(src) <= MAX_SUB_LENGTH for src in split_src) and \
           all(calc_len(tr) * TARGET_SUB_MULTIPLIER <= MAX_SUB_LENGTH for tr in split_trans):
            break
        
        # 更新源数据继续下一轮分割
        src, trans = split_src, split_trans

    # 确保二者有相同的长度，防止报错
    if len(src) > len(remerged):
        remerged += [None] * (len(src) - len(remerged))
    elif len(remerged) > len(src):
        src += [None] * (len(remerged) - len(src))
    
    pd.DataFrame({'Source': split_src, 'Translation': split_trans}).to_excel(_5_SPLIT_SUB, index=False)
    pd.DataFrame({'Source': src, 'Translation': remerged}).to_excel(_5_REMERGED, index=False)

if __name__ == '__main__':
    split_for_sub_main()


================================================
FILE: core/_6_gen_sub.py
================================================
import pandas as pd
import os
import re
from rich.panel import Panel
from rich.console import Console
import autocorrect_py as autocorrect
from core.utils import *
from core.utils.models import *
console = Console()

SUBTITLE_OUTPUT_CONFIGS = [ 
    ('src.srt', ['Source']),
    ('trans.srt', ['Translation']),
    ('src_trans.srt', ['Source', 'Translation']),
    ('trans_src.srt', ['Translation', 'Source'])
]

AUDIO_SUBTITLE_OUTPUT_CONFIGS = [
    ('src_subs_for_audio.srt', ['Source']),
    ('trans_subs_for_audio.srt', ['Translation'])
]

def convert_to_srt_format(start_time, end_time):
    """Convert time (in seconds) to the format: hours:minutes:seconds,milliseconds"""
    def seconds_to_hmsm(seconds):
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        seconds = seconds % 60
        milliseconds = int(seconds * 1000) % 1000
        return f"{hours:02d}:{minutes:02d}:{int(seconds):02d},{milliseconds:03d}"

    start_srt = seconds_to_hmsm(start_time)
    end_srt = seconds_to_hmsm(end_time)
    return f"{start_srt} --> {end_srt}"

def remove_punctuation(text):
    text = re.sub(r'\s+', ' ', text)
    text = re.sub(r'[^\w\s]', '', text)
    return text.strip()

def show_difference(str1, str2):
    """Show the difference positions between two strings"""
    min_len = min(len(str1), len(str2))
    diff_positions = []
    
    for i in range(min_len):
        if str1[i] != str2[i]:
            diff_positions.append(i)
    
    if len(str1) != len(str2):
        diff_positions.extend(range(min_len, max(len(str1), len(str2))))
    
    print("Difference positions:")
    print(f"Expected sentence: {str1}")
    print(f"Actual match: {str2}")
    print("Position markers: " + "".join("^" if i in diff_positions else " " for i in range(max(len(str1), len(str2)))))
    print(f"Difference indices: {diff_positions}")

def get_sentence_timestamps(df_words, df_sentences):
    time_stamp_list = []
    
    # Build complete string and position mapping
    full_words_str = ''
    position_to_word_idx = {}
    
    for idx, word in enumerate(df_words['text']):
        clean_word = remove_punctuation(word.lower())
        start_pos = len(full_words_str)
        full_words_str += clean_word
        for pos in range(start_pos, len(full_words_str)):
            position_to_word_idx[pos] = idx
    
    current_pos = 0
    for idx, sentence in df_sentences['Source'].items():
        clean_sentence = remove_punctuation(sentence.lower()).replace(" ", "")
        sentence_len = len(clean_sentence)
        
        match_found = False
        while current_pos <= len(full_words_str) - sentence_len:
            if full_words_str[current_pos:current_pos+sentence_len] == clean_sentence:
                start_word_idx = position_to_word_idx[current_pos]
                end_word_idx = position_to_word_idx[current_pos + sentence_len - 1]
                
                time_stamp_list.append((
                    float(df_words['start'][start_word_idx]),
                    float(df_words['end'][end_word_idx])
                ))
                
                current_pos += sentence_len
                match_found = True
                break
            current_pos += 1
            
        if not match_found:
            print(f"\n⚠️ Warning: No exact match found for sentence: {sentence}")
            show_difference(clean_sentence, 
                          full_words_str[current_pos:current_pos+len(clean_sentence)])
            print("\nOriginal sentence:", df_sentences['Source'][idx])
            raise ValueError("❎ No match found for sentence.")
    
    return time_stamp_list

def align_timestamp(df_text, df_translate, subtitle_output_configs: list, output_dir: str, for_display: bool = True):
    """Align timestamps and add a new timestamp column to df_translate"""
    df_trans_time = df_translate.copy()

    # Assign an ID to each word in df_text['text'] and create a new DataFrame
    words = df_text['text'].str.split(expand=True).stack().reset_index(level=1, drop=True).reset_index()
    words.columns = ['id', 'word']
    words['id'] = words['id'].astype(int)

    # Process timestamps ⏰
    time_stamp_list = get_sentence_timestamps(df_text, df_translate)
    df_trans_time['timestamp'] = time_stamp_list
    df_trans_time['duration'] = df_trans_time['timestamp'].apply(lambda x: x[1] - x[0])

    # Remove gaps 🕳️
    for i in range(len(df_trans_time)-1):
        delta_time = df_trans_time.loc[i+1, 'timestamp'][0] - df_trans_time.loc[i, 'timestamp'][1]
        if 0 < delta_time < 1:
            df_trans_time.at[i, 'timestamp'] = (df_trans_time.loc[i, 'timestamp'][0], df_trans_time.loc[i+1, 'timestamp'][0])

    # Convert start and end timestamps to SRT format
    df_trans_time['timestamp'] = df_trans_time['timestamp'].apply(lambda x: convert_to_srt_format(x[0], x[1]))

    # Polish subtitles: replace punctuation in Translation if for_display
    if for_display:
        df_trans_time['Translation'] = df_trans_time['Translation'].apply(lambda x: re.sub(r'[，。]', ' ', x).strip())

    # Output subtitles 📜
    def generate_subtitle_string(df, columns):
        return ''.join([f"{i+1}\n{row['timestamp']}\n{row[columns[0]].strip()}\n{row[columns[1]].strip() if len(columns) > 1 else ''}\n\n" for i, row in df.iterrows()]).strip()

    if output_dir:
        os.makedirs(output_dir, exist_ok=True)
        for filename, columns in subtitle_output_configs:
            subtitle_str = generate_subtitle_string(df_trans_time, columns)
            with open(os.path.join(output_dir, filename), 'w', encoding='utf-8') as f:
                f.write(subtitle_str)
    
    return df_trans_time

# ✨ Beautify the translation
def clean_translation(x):
    if pd.isna(x):
        return ''
    cleaned = str(x).strip('。').strip('，')
    return autocorrect.format(cleaned)

def align_timestamp_main():
    df_text = pd.read_excel(_2_CLEANED_CHUNKS)
    df_text['text'] = df_text['text'].str.strip('"').str.strip()
    df_translate = pd.read_excel(_5_SPLIT_SUB)
    df_translate['Translation'] = df_translate['Translation'].apply(clean_translation)
    
    align_timestamp(df_text, df_translate, SUBTITLE_OUTPUT_CONFIGS, _OUTPUT_DIR)
    console.print(Panel("[bold green]🎉📝 Subtitles generation completed! Please check in the `output` folder 👀[/bold green]"))

    # for audio
    df_translate_for_audio = pd.read_excel(_5_REMERGED) # use remerged file to avoid unmatched lines when dubbing
    df_translate_for_audio['Translation'] = df_translate_for_audio['Translation'].apply(clean_translation)
    
    align_timestamp(df_text, df_translate_for_audio, AUDIO_SUBTITLE_OUTPUT_CONFIGS, _AUDIO_DIR)
    console.print(Panel(f"[bold green]🎉📝 Audio subtitles generation completed! Please check in the `{_AUDIO_DIR}` folder 👀[/bold green]"))
    

if __name__ == '__main__':
    align_timestamp_main()

================================================
FILE: core/_7_sub_into_vid.py
================================================
import os, subprocess, time
from core._1_ytdlp import find_video_files
import cv2
import numpy as np
import platform
from core.utils import *

SRC_FONT_SIZE = 15
TRANS_FONT_SIZE = 17
FONT_NAME = 'Arial'
TRANS_FONT_NAME = 'Arial'

# Linux need to install google noto fonts: apt-get install fonts-noto
if platform.system() == 'Linux':
    FONT_NAME = 'NotoSansCJK-Regular'
    TRANS_FONT_NAME = 'NotoSansCJK-Regular'
# Mac OS has different font names
elif platform.system() == 'Darwin':
    FONT_NAME = 'Arial Unicode MS'
    TRANS_FONT_NAME = 'Arial Unicode MS'

SRC_FONT_COLOR = '&HFFFFFF'
SRC_OUTLINE_COLOR = '&H000000'
SRC_OUTLINE_WIDTH = 1
SRC_SHADOW_COLOR = '&H80000000'
TRANS_FONT_COLOR = '&H00FFFF'
TRANS_OUTLINE_COLOR = '&H000000'
TRANS_OUTLINE_WIDTH = 1 
TRANS_BACK_COLOR = '&H33000000'

OUTPUT_DIR = "output"
OUTPUT_VIDEO = f"{OUTPUT_DIR}/output_sub.mp4"
SRC_SRT = f"{OUTPUT_DIR}/src.srt"
TRANS_SRT = f"{OUTPUT_DIR}/trans.srt"
    
def check_gpu_available():
    try:
        result = subprocess.run(['ffmpeg', '-encoders'], capture_output=True, text=True)
        return 'h264_nvenc' in result.stdout
    except:
        return False

def merge_subtitles_to_video():
    video_file = find_video_files()
    os.makedirs(os.path.dirname(OUTPUT_VIDEO), exist_ok=True)

    # Check resolution
    if not load_key("burn_subtitles"):
        rprint("[bold yellow]Warning: A 0-second black video will be generated as a placeholder as subtitles are not burned in.[/bold yellow]")

        # Create a black frame
        frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
        fourcc = cv2.VideoWriter_fourcc(*'mp4v')
        out = cv2.VideoWriter(OUTPUT_VIDEO, fourcc, 1, (1920, 1080))
        out.write(frame)
        out.release()

        rprint("[bold green]Placeholder video has been generated.[/bold green]")
        return

    if not os.path.exists(SRC_SRT) or not os.path.exists(TRANS_SRT):
        rprint("Subtitle files not found in the 'output' directory.")
        exit(1)

    video = cv2.VideoCapture(video_file)
    TARGET_WIDTH = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
    TARGET_HEIGHT = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
    video.release()
    rprint(f"[bold green]Video resolution: {TARGET_WIDTH}x{TARGET_HEIGHT}[/bold green]")
    ffmpeg_cmd = [
        'ffmpeg', '-i', video_file,
        '-vf', (
            f"scale={TARGET_WIDTH}:{TARGET_HEIGHT}:force_original_aspect_ratio=decrease,"
            f"pad={TARGET_WIDTH}:{TARGET_HEIGHT}:(ow-iw)/2:(oh-ih)/2,"
            f"subtitles={SRC_SRT}:force_style='FontSize={SRC_FONT_SIZE},FontName={FONT_NAME}," 
            f"PrimaryColour={SRC_FONT_COLOR},OutlineColour={SRC_OUTLINE_COLOR},OutlineWidth={SRC_OUTLINE_WIDTH},"
            f"ShadowColour={SRC_SHADOW_COLOR},BorderStyle=1',"
            f"subtitles={TRANS_SRT}:force_style='FontSize={TRANS_FONT_SIZE},FontName={TRANS_FONT_NAME},"
            f"PrimaryColour={TRANS_FONT_COLOR},OutlineColour={TRANS_OUTLINE_COLOR},OutlineWidth={TRANS_OUTLINE_WIDTH},"
            f"BackColour={TRANS_BACK_COLOR},Alignment=2,MarginV=27,BorderStyle=4'"
        ).encode('utf-8'),
    ]

    ffmpeg_gpu = load_key("ffmpeg_gpu")
    if ffmpeg_gpu:
        rprint("[bold green]will use GPU acceleration.[/bold green]")
        ffmpeg_cmd.extend(['-c:v', 'h264_nvenc'])
    ffmpeg_cmd.extend(['-y', OUTPUT_VIDEO])

    rprint("🎬 Start merging subtitles to video...")
    start_time = time.time()
    process = subprocess.Popen(ffmpeg_cmd)

    try:
        process.wait()
        if process.returncode == 0:
            rprint(f"\n✅ Done! Time taken: {time.time() - start_time:.2f} seconds")
        else:
            rprint("\n❌ FFmpeg execution error")
    except Exception as e:
        rprint(f"\n❌ Error occurred: {e}")
        if process.poll() is None:
            process.kill()

if __name__ == "__main__":
    merge_subtitles_to_video()

================================================
FILE: core/_8_1_audio_task.py
================================================
import datetime
import re
import pandas as pd
from rich.console import Console
from rich.panel import Panel
from core.prompts import get_subtitle_trim_prompt
from core.tts_backend.estimate_duration import init_estimator, estimate_duration
from core.utils import *
from core.utils.models import *

console = Console()
speed_factor = load_key("speed_factor")

TRANS_SUBS_FOR_AUDIO_FILE = 'output/audio/trans_subs_for_audio.srt'
SRC_SUBS_FOR_AUDIO_FILE = 'output/audio/src_subs_for_audio.srt'
ESTIMATOR = None

def check_len_then_trim(text, duration):
    global ESTIMATOR
    if ESTIMATOR is None:
        ESTIMATOR = init_estimator()
    estimated_duration = estimate_duration(text, ESTIMATOR) / speed_factor['max']
    
    console.print(f"Subtitle text: {text}, "
                  f"[bold green]Estimated reading duration: {estimated_duration:.2f} seconds[/bold green]")

    if estimated_duration > duration:
        rprint(Panel(f"Estimated reading duration {estimated_duration:.2f} seconds exceeds given duration {duration:.2f} seconds, shortening...", title="Processing", border_style="yellow"))
        original_text = text
        prompt = get_subtitle_trim_prompt(text, duration)
        def valid_trim(response):
            if 'result' not in response:
                return {'status': 'error', 'message': 'No result in response'}
            return {'status': 'success', 'message': ''}
        try:    
            response = ask_gpt(prompt, resp_type='json', log_title='sub_trim', valid_def=valid_trim)
            shortened_text = response['result']
        except Exception:
            rprint("[bold red]🚫 AI refused to answer due to sensitivity, so manually remove punctuation[/bold red]")
            shortened_text = re.sub(r'[,.!?;:，。！？；：]', ' ', text).strip()
        rprint(Panel(f"Subtitle before shortening: {original_text}\nSubtitle after shortening: {shortened_text}", title="Subtitle Shortening Result", border_style="green"))
        return shortened_text
    else:
        return text

def time_diff_seconds(t1, t2, base_date):
    """Calculate the difference in seconds between two time objects"""
    dt1 = datetime.datetime.combine(base_date, t1)
    dt2 = datetime.datetime.combine(base_date, t2)
    return (dt2 - dt1).total_seconds()

def process_srt():
    """Process srt file, generate audio tasks"""
    
    with open(TRANS_SUBS_FOR_AUDIO_FILE, 'r', encoding='utf-8') as file:
        content = file.read()
    
    with open(SRC_SUBS_FOR_AUDIO_FILE, 'r', encoding='utf-8') as src_file:
        src_content = src_file.read()
    
    subtitles = []
    src_subtitles = {}
    
    for block in src_content.strip().split('\n\n'):
        lines = [line.strip() for line in block.split('\n') if line.strip()]
        if len(lines) < 3:
            continue
        
        number = int(lines[0])
        src_text = ' '.join(lines[2:])
        src_subtitles[number] = src_text
    
    for block in content.strip().split('\n\n'):
        lines = [line.strip() for line in block.split('\n') if line.strip()]
        if len(lines) < 3:
            continue
        
        try:
            number = int(lines[0])
            start_time, end_time = lines[1].split(' --> ')
            start_time = datetime.datetime.strptime(start_time, '%H:%M:%S,%f').time()
            end_time = datetime.datetime.strptime(end_time, '%H:%M:%S,%f').time()
            duration = time_diff_seconds(start_time, end_time, datetime.date.today())
            text = ' '.join(lines[2:])
            # Remove content within parentheses (including English and Chinese parentheses)
            text = re.sub(r'\([^)]*\)', '', text).strip()
            text = re.sub(r'（[^）]*）', '', text).strip()
            # Remove '-' character, can continue to add illegal characters that cause errors
            text = text.replace('-', '')

            # Add the original text from src_subs_for_audio.srt
            origin = src_subtitles.get(number, '')

        except ValueError as e:
            rprint(Panel(f"Unable to parse subtitle block '{block}', error: {str(e)}, skipping this subtitle block.", title="Error", border_style="red"))
            continue
        
        subtitles.append({'number': number, 'start_time': start_time, 'end_time': end_time, 'duration': duration, 'text': text, 'origin': origin})
    
    df = pd.DataFrame(subtitles)
    
    i = 0
    MIN_SUB_DUR = load_key("min_subtitle_duration")
    while i < len(df):
        today = datetime.date.today()
        if df.loc[i, 'duration'] < MIN_SUB_DUR:
            if i < len(df) - 1 and time_diff_seconds(df.loc[i, 'start_time'],df.loc[i+1, 'start_time'],today) < MIN_SUB_DUR:
                rprint(f"[bold yellow]Merging subtitles {i+1} and {i+2}[/bold yellow]")
                df.loc[i, 'text'] += ' ' + df.loc[i+1, 'text']
                df.loc[i, 'origin'] += ' ' + df.loc[i+1, 'origin']
                df.loc[i, 'end_time'] = df.loc[i+1, 'end_time']
                df.loc[i, 'duration'] = time_diff_seconds(df.loc[i, 'start_time'],df.loc[i, 'end_time'],today)
                df = df.drop(i+1).reset_index(drop=True)
            else:
                if i < len(df) - 1:  # Not the last audio
                    rprint(f"[bold blue]Extending subtitle {i+1} duration to {MIN_SUB_DUR} seconds[/bold blue]")
                    df.loc[i, 'end_time'] = (datetime.datetime.combine(today, df.loc[i, 'start_time']) + 
                                            datetime.timedelta(seconds=MIN_SUB_DUR)).time()
                    df.loc[i, 'duration'] = MIN_SUB_DUR
                else:
                    rprint(f"[bold red]The last subtitle {i+1} duration is less than {MIN_SUB_DUR} seconds, but not extending[/bold red]")
                i += 1
        else:
            i += 1
    
    df['start_time'] = df['start_time'].apply(lambda x: x.strftime('%H:%M:%S.%f')[:-3])
    df['end_time'] = df['end_time'].apply(lambda x: x.strftime('%H:%M:%S.%f')[:-3])

    ##! No longer perform secondary trim
    # check and trim subtitle length, for twice to ensure the subtitle length is within the limit, 允许tolerance
    # df['text'] = df.apply(lambda x: check_len_then_trim(x['text'], x['duration']+x['tolerance']), axis=1)

    return df

@check_file_exists(_8_1_AUDIO_TASK)
def gen_audio_task_main():
    df = process_srt()
    console.print(df)
    df.to_excel(_8_1_AUDIO_TASK, index=False)
    rprint(Panel(f"Successfully generated {_8_1_AUDIO_TASK}", title="Success", border_style="green"))

if __name__ == '__main__':
    gen_audio_task_main()

================================================
FILE: core/_8_2_dub_chunks.py
================================================
import datetime
import re
import pandas as pd
from core._8_1_audio_task import time_diff_seconds
from core.asr_backend.audio_preprocess import get_audio_duration
from core.tts_backend.estimate_duration import init_estimator, estimate_duration
from core.utils import *
from core.utils.models import *

SRC_SRT = "output/src.srt"
TRANS_SRT = "output/trans.srt"
MAX_MERGE_COUNT = 5
ESTIMATOR = None

def calc_if_too_fast(est_dur, tol_dur, duration, tolerance):
    accept = load_key("speed_factor.accept") # Maximum acceptable speed factor
    if est_dur / accept > tol_dur:  # Even max speed factor cannot adapt
        return 2
    elif est_dur > tol_dur:  # Speed adjustment needed within acceptable range
        return 1
    elif est_dur < duration - tolerance:  # Speaking speed too slow
        return -1
    else:  # Normal speaking speed
        return 0

def merge_rows(df, start_idx, merge_count):
    """Merge multiple rows and calculate cumulative values"""
    merged = {
        'est_dur': df.iloc[start_idx]['est_dur'],
        'tol_dur': df.iloc[start_idx]['tol_dur'],
        'duration': df.iloc[start_idx]['duration']
    }
    
    while merge_count < MAX_MERGE_COUNT and (start_idx + merge_count) < len(df):
        next_row = df.iloc[start_idx + merge_count]
        merged['est_dur'] += next_row['est_dur']
        merged['tol_dur'] += next_row['tol_dur']
        merged['duration'] += next_row['duration']
        
        speed_flag = calc_if_too_fast(
            merged['est_dur'],
            merged['tol_dur'],
            merged['duration'],
            df.iloc[start_idx + merge_count]['tolerance']
        )
        
        if speed_flag <= 0 or merge_count == 2:
            df.at[start_idx + merge_count, 'cut_off'] = 1
            return merge_count + 1
        
        merge_count += 1
    
    # If no suitable merge point is found
    if merge_count >= MAX_MERGE_COUNT or (start_idx + merge_count) >= len(df):
        df.at[start_idx + merge_count - 1, 'cut_off'] = 1
    return merge_count

def analyze_subtitle_timing_and_speed(df):
    rprint("[🔍 Analyzing] Calculating subtitle timing and speed...")
    global ESTIMATOR
    if ESTIMATOR is None:
        ESTIMATOR = init_estimator()
    TOLERANCE = load_key("tolerance")
    whole_dur = get_audio_duration(_RAW_AUDIO_FILE)
    df['gap'] = 0.0  # Initialize gap column
    for i in range(len(df) - 1):
        current_end = datetime.datetime.strptime(df.loc[i, 'end_time'], '%H:%M:%S.%f').time()
        next_start = datetime.datetime.strptime(df.loc[i + 1, 'start_time'], '%H:%M:%S.%f').time()
        df.loc[i, 'gap'] = time_diff_seconds(current_end, next_start, datetime.date.today())
    
    # Set the gap for the last line
    last_end = datetime.datetime.strptime(df.iloc[-1]['end_time'], '%H:%M:%S.%f').time()
    last_end_seconds = (last_end.hour * 3600 + last_end.minute * 60 + 
                       last_end.second + last_end.microsecond / 1000000)
    df.iloc[-1, df.columns.get_loc('gap')] = whole_dur - last_end_seconds
    
    df['tolerance'] = df['gap'].apply(lambda x: TOLERANCE if x > TOLERANCE else x)
    df['tol_dur'] = df['duration'] + df['tolerance']
    df['est_dur'] = df.apply(lambda x: estimate_duration(x['text'], ESTIMATOR), axis=1)

    ## Calculate speed indicators
    accept = load_key("speed_factor.accept") # Maximum acceptable speed factor
    def calc_if_too_fast(row):
        est_dur = row['est_dur']
        tol_dur = row['tol_dur']
        duration = row['duration']
        tolerance = row['tolerance']
        
        if est_dur / accept > tol_dur:  # Even max speed factor cannot adapt
            return 2
        elif est_dur > tol_dur:  # Speed adjustment needed within acceptable range
            return 1
        elif est_dur < duration - tolerance:  # Speaking speed too slow
            return -1
        else:  # Normal speaking speed
            return 0
    
    df['if_too_fast'] = df.apply(calc_if_too_fast, axis=1)
    return df

def process_cutoffs(df):
    rprint("[✂️ Processing] Generating cutoff points...")
    df['cut_off'] = 0  # Initialize cut_off column
    df.loc[df['gap'] >= load_key("tolerance"), 'cut_off'] = 1  # Set to 1 when gap is greater than TOLERANCE
    idx = 0
    while idx < len(df):
        # Process marked split points
        if df.iloc[idx]['cut_off'] == 1:
            if df.iloc[idx]['if_too_fast'] == 2:
                rprint(f"[⚠️ Warning] Line {idx} is too fast and cannot be fixed by speed adjustment")
            idx += 1
            continue

        # Process the last line
        if idx + 1 >= len(df):
            df.at[idx, 'cut_off'] = 1
            break

        # Process normal or slow lines
        if df.iloc[idx]['if_too_fast'] <= 0:
            if df.iloc[idx + 1]['if_too_fast'] <= 0:
                df.at[idx, 'cut_off'] = 1
                idx += 1
            else:
                idx += merge_rows(df, idx, 1)
        # Process fast lines
        else:
            idx += merge_rows(df, idx, 1)
    
    return df

def gen_dub_chunks():
    rprint("[🎬 Starting] Generating dubbing chunks...")
    df = pd.read_excel(_8_1_AUDIO_TASK)
    
    rprint("[📊 Processing] Analyzing timing and speed...")
    df = analyze_subtitle_timing_and_speed(df)
    
    rprint("[✂️ Processing] Processing cutoffs...")
    df = process_cutoffs(df)

    rprint("[📝 Reading] Loading transcript files...")
    content = open(TRANS_SRT, "r", encoding="utf-8").read()
    ori_content = open(SRC_SRT, "r", encoding="utf-8").read()
    
    # Process subtitle content
    content_lines = []
    ori_content_lines = []
    
    # Process translated subtitles
    for block in content.strip().split('\n\n'):
        lines = [line.strip() for line in block.split('\n') if line.strip()]
        if len(lines) >= 3:
            text = ' '.join(lines[2:])
            text = re.sub(r'\([^)]*\)|（[^）]*）', '', text).strip().replace('-', '')
            content_lines.append(text)
            
    # Process source subtitles (same structure)
    for block in ori_content.strip().split('\n\n'):
        lines = [line.strip() for line in block.split('\n') if line.strip()]
        if len(lines) >= 3:
            text = ' '.join(lines[2:])
            text = re.sub(r'\([^)]*\)|（[^）]*）', '', text).strip().replace('-', '')
            ori_content_lines.append(text)

    # Match processing
    df['lines'] = None
    df['src_lines'] = None
    last_idx = 0

    def clean_text(text):
        """clean space and punctuation"""
        if not text or not isinstance(text, str):
            return ''
        return re.sub(r'[^\w\s]|[\s]', '', text)

    for idx, row in df.iterrows():
        target = clean_text(row['text'])
        matches = []
        current = ''
        match_indices = []  # Store indices for matching lines
        
        for i in range(last_idx, len(content_lines)):
            line = content_lines[i]
            cleaned_line = clean_text

Download .txt

gitextract_wyfw8khg/

├── .cursorrules
├── .gitignore
├── .streamlit/
│   └── config.toml
├── Dockerfile
├── LICENSE
├── OneKeyStart.bat
├── README.md
├── VideoLingo_colab.ipynb
├── batch/
│   ├── OneKeyBatch.bat
│   ├── README.md
│   ├── README.zh.md
│   └── utils/
│       ├── batch_processor.py
│       ├── settings_check.py
│       └── video_processor.py
├── config.yaml
├── core/
│   ├── _10_gen_audio.py
│   ├── _11_merge_audio.py
│   ├── _12_dub_to_vid.py
│   ├── _1_ytdlp.py
│   ├── _2_asr.py
│   ├── _3_1_split_nlp.py
│   ├── _3_2_split_meaning.py
│   ├── _4_1_summarize.py
│   ├── _4_2_translate.py
│   ├── _5_split_sub.py
│   ├── _6_gen_sub.py
│   ├── _7_sub_into_vid.py
│   ├── _8_1_audio_task.py
│   ├── _8_2_dub_chunks.py
│   ├── _9_refer_audio.py
│   ├── __init__.py
│   ├── asr_backend/
│   │   ├── __init__.py
│   │   ├── audio_preprocess.py
│   │   ├── demucs_vl.py
│   │   ├── elevenlabs_asr.py
│   │   ├── whisperX_302.py
│   │   └── whisperX_local.py
│   ├── prompts.py
│   ├── spacy_utils/
│   │   ├── __init__.py
│   │   ├── load_nlp_model.py
│   │   ├── split_by_comma.py
│   │   ├── split_by_connector.py
│   │   ├── split_by_mark.py
│   │   └── split_long_by_root.py
│   ├── st_utils/
│   │   ├── __init__.py
│   │   ├── download_video_section.py
│   │   ├── imports_and_utils.py
│   │   └── sidebar_setting.py
│   ├── translate_lines.py
│   ├── tts_backend/
│   │   ├── _302_f5tts.py
│   │   ├── azure_tts.py
│   │   ├── custom_tts.py
│   │   ├── edge_tts.py
│   │   ├── estimate_duration.py
│   │   ├── fish_tts.py
│   │   ├── gpt_sovits_tts.py
│   │   ├── openai_tts.py
│   │   ├── sf_cosyvoice2.py
│   │   ├── sf_fishtts.py
│   │   └── tts_main.py
│   └── utils/
│       ├── __init__.py
│       ├── ask_gpt.py
│       ├── config_utils.py
│       ├── decorator.py
│       ├── delete_retry_dubbing.py
│       ├── models.py
│       ├── onekeycleanup.py
│       └── pypi_autochoose.py
├── custom_terms.xlsx
├── docs/
│   ├── .gitignore
│   ├── components/
│   │   ├── landing/
│   │   │   ├── comments.tsx
│   │   │   ├── faq.tsx
│   │   │   ├── features.tsx
│   │   │   ├── github-stats.tsx
│   │   │   ├── hero.tsx
│   │   │   └── index.tsx
│   │   └── ui/
│   │       ├── accordion.tsx
│   │       ├── badge.tsx
│   │       ├── button.tsx
│   │       ├── card.tsx
│   │       ├── hero-video-dialog.tsx
│   │       ├── rainbow-button.tsx
│   │       └── tooltip.tsx
│   ├── components.json
│   ├── lib/
│   │   └── utils.ts
│   ├── middleware.js
│   ├── next-env.d.ts
│   ├── next.config.js
│   ├── package.json
│   ├── pages/
│   │   ├── _app.mdx
│   │   ├── _meta.en-US.json
│   │   ├── _meta.ja.json
│   │   ├── _meta.zh-CN.json
│   │   ├── docs/
│   │   │   ├── _meta.en-US.json
│   │   │   ├── _meta.ja.json
│   │   │   ├── _meta.zh-CN.json
│   │   │   ├── docker.en-US.md
│   │   │   ├── docker.zh-CN.md
│   │   │   ├── introduction.en-US.md
│   │   │   ├── introduction.zh-CN.md
│   │   │   ├── start.en-US.md
│   │   │   ├── start.zh-CN.md
│   │   │   ├── tech.en-US.md
│   │   │   └── tech.zh-CN.md
│   │   ├── globals.css
│   │   ├── index.en-US.mdx
│   │   ├── index.ja.mdx
│   │   └── index.zh-CN.mdx
│   ├── postcss.config.js
│   ├── public/
│   │   └── site.webmanifest
│   ├── tailwind.config.js
│   ├── theme.config.jsx
│   └── tsconfig.json
├── install.py
├── launch.py
├── requirements.txt
├── setup.py
├── st.py
└── translations/
    ├── README.es.md
    ├── README.fr.md
    ├── README.ja.md
    ├── README.ru.md
    ├── README.zh-TW.md
    ├── README.zh.md
    ├── en.json
    ├── es.json
    ├── fr.json
    ├── ja.json
    ├── ru.json
    ├── translations.py
    ├── zh-CN.json
    └── zh-HK.json

Download .txt

SYMBOL INDEX (205 symbols across 66 files)

FILE: batch/utils/batch_processor.py
  function record_and_update_config (line 14) | def record_and_update_config(source_language, target_language):
  function process_batch (line 25) | def process_batch():

FILE: batch/utils/settings_check.py
  function check_settings (line 13) | def check_settings():

FILE: batch/utils/video_processor.py
  function process_video (line 19) | def process_video(file, dubbing=False, is_retry=False):
  function prepare_output_folder (line 74) | def prepare_output_folder(output_folder):
  function process_input_file (line 79) | def process_input_file(file):
  function split_sentences (line 90) | def split_sentences():
  function summarize_and_translate (line 94) | def summarize_and_translate():
  function process_and_align_subtitles (line 98) | def process_and_align_subtitles():
  function gen_audio_tasks (line 102) | def gen_audio_tasks():

FILE: core/_10_gen_audio.py
  function parse_df_srt_time (line 24) | def parse_df_srt_time(time_str: str) -> float:
  function adjust_audio_speed (line 30) | def adjust_audio_speed(input_file: str, output_file: str, speed_factor: ...
  function process_row (line 65) | def process_row(row: pd.Series, tasks_df: pd.DataFrame) -> Tuple[int, fl...
  function generate_tts_audio (line 76) | def generate_tts_audio(tasks_df: pd.DataFrame) -> pd.DataFrame:
  function process_chunk (line 118) | def process_chunk(chunk_df: pd.DataFrame, accept: float, min_speed: floa...
  function merge_chunks (line 141) | def merge_chunks(tasks_df: pd.DataFrame) -> pd.DataFrame:
  function gen_audio (line 209) | def gen_audio() -> None:

FILE: core/_11_merge_audio.py
  function load_and_flatten_data (line 16) | def load_and_flatten_data(excel_file):
  function get_audio_files (line 27) | def get_audio_files(df):
  function process_audio_segment (line 38) | def process_audio_segment(audio_file):
  function merge_audio_segments (line 54) | def merge_audio_segments(audios, new_sub_times, sample_rate):
  function create_srt_subtitle (line 85) | def create_srt_subtitle():
  function merge_full_audio (line 99) | def merge_full_audio():

FILE: core/_12_dub_to_vid.py
  function merge_video_audio (line 31) | def merge_video_audio():

FILE: core/_1_ytdlp.py
  function sanitize_filename (line 7) | def sanitize_filename(filename):
  function update_ytdlp (line 15) | def update_ytdlp():
  function download_video_ytdlp (line 26) | def download_video_ytdlp(url, save_path='output', resolution='1080'):
  function find_video_files (line 54) | def find_video_files(save_path='output'):

FILE: core/_2_asr.py
  function transcribe (line 8) | def transcribe():

FILE: core/_3_1_split_nlp.py
  function split_by_spacy (line 6) | def split_by_spacy():

FILE: core/_3_2_split_meaning.py
  function tokenize_sentence (line 12) | def tokenize_sentence(sentence, nlp):
  function find_split_positions (line 16) | def find_split_positions(original, modified):
  function split_sentence (line 48) | def split_sentence(sentence, num_parts, word_limit=20, index=-1, retry_a...
  function parallel_split_sentences (line 83) | def parallel_split_sentences(sentences, max_length, max_workers, nlp, re...
  function split_sentences_by_meaning (line 111) | def split_sentences_by_meaning():

FILE: core/_4_1_summarize.py
  function combine_chunks (line 9) | def combine_chunks():
  function search_things_to_note_in_prompt (line 17) | def search_things_to_note_in_prompt(sentence):
  function get_summary (line 33) | def get_summary():

FILE: core/_4_2_translate.py
  function split_chunks_by_chars (line 16) | def split_chunks_by_chars(chunk_size, max_i):
  function get_previous_content (line 36) | def get_previous_content(chunks, chunk_index):
  function get_after_content (line 38) | def get_after_content(chunks, chunk_index):
  function translate_chunk (line 42) | def translate_chunk(chunk, chunks, theme_prompt, i):
  function similar (line 50) | def similar(a, b):
  function translate_all (line 55) | def translate_all():

FILE: core/_5_split_sub.py
  function calc_len (line 16) | def calc_len(text: str) -> float:
  function align_subs (line 33) | def align_subs(src_sub: str, tr_sub: str, src_part: str) -> Tuple[List[s...
  function split_align_subs (line 61) | def split_align_subs(src_lines: List[str], tr_lines: List[str]):
  function split_for_sub_main (line 96) | def split_for_sub_main():

FILE: core/_6_gen_sub.py
  function convert_to_srt_format (line 23) | def convert_to_srt_format(start_time, end_time):
  function remove_punctuation (line 36) | def remove_punctuation(text):
  function show_difference (line 41) | def show_difference(str1, str2):
  function get_sentence_timestamps (line 59) | def get_sentence_timestamps(df_words, df_sentences):
  function align_timestamp (line 103) | def align_timestamp(df_text, df_translate, subtitle_output_configs: list...
  function clean_translation (line 144) | def clean_translation(x):
  function align_timestamp_main (line 150) | def align_timestamp_main():

FILE: core/_7_sub_into_vid.py
  function check_gpu_available (line 36) | def check_gpu_available():
  function merge_subtitles_to_video (line 43) | def merge_subtitles_to_video():

FILE: core/_8_1_audio_task.py
  function check_len_then_trim (line 18) | def check_len_then_trim(text, duration):
  function time_diff_seconds (line 46) | def time_diff_seconds(t1, t2, base_date):
  function process_srt (line 52) | def process_srt():
  function gen_audio_task_main (line 136) | def gen_audio_task_main():

FILE: core/_8_2_dub_chunks.py
  function calc_if_too_fast (line 15) | def calc_if_too_fast(est_dur, tol_dur, duration, tolerance):
  function merge_rows (line 26) | def merge_rows(df, start_idx, merge_count):
  function analyze_subtitle_timing_and_speed (line 58) | def analyze_subtitle_timing_and_speed(df):
  function process_cutoffs (line 101) | def process_cutoffs(df):
  function gen_dub_chunks (line 132) | def gen_dub_chunks():

FILE: core/_9_refer_audio.py
  function time_to_samples (line 13) | def time_to_samples(time_str, sr):
  function extract_audio (line 20) | def extract_audio(audio_data, sr, start_time, end_time, out_file):
  function extract_refer_audio_main (line 26) | def extract_refer_audio_main():

FILE: core/asr_backend/audio_preprocess.py
  function _ffmpeg_has_encoder (line 12) | def _ffmpeg_has_encoder(encoder_name: str) -> bool:
  function normalize_audio_volume (line 22) | def normalize_audio_volume(audio_path, output_path, target_db = -20.0, f...
  function convert_video_to_audio (line 30) | def convert_video_to_audio(video_file: str):
  function get_audio_duration (line 55) | def get_audio_duration(audio_file: str) -> float:
  function split_audio (line 71) | def split_audio(audio_file: str, target_len: float = 30*60, win: float =...
  function process_transcription (line 109) | def process_transcription(result: Dict) -> pd.DataFrame:
  function save_results (line 160) | def save_results(df: pd.DataFrame):
  function save_language (line 180) | def save_language(language: str):

FILE: core/asr_backend/demucs_vl.py
  class PreloadedSeparator (line 14) | class PreloadedSeparator(Separator):
    method __init__ (line 15) | def __init__(self, model: BagOfModels, shifts: int = 1, overlap: float...
  function demucs_audio (line 22) | def demucs_audio():

FILE: core/asr_backend/elevenlabs_asr.py
  function elev2whisper (line 33) | def elev2whisper(elev_json, word_level_timestamp = False):
  function transcribe_audio_elevenlabs (line 67) | def transcribe_audio_elevenlabs(raw_audio_path, vocal_audio_path, start ...

FILE: core/asr_backend/whisperX_302.py
  function transcribe_audio_302 (line 13) | def transcribe_audio_302(raw_audio_path: str, vocal_audio_path: str, sta...

FILE: core/asr_backend/whisperX_local.py
  function _patched_torch_load (line 22) | def _patched_torch_load(*args, **kwargs):
  function check_hf_mirror (line 38) | def check_hf_mirror():
  function transcribe_audio (line 62) | def transcribe_audio(raw_audio_file, vocal_audio_file, start, end):

FILE: core/prompts.py
  function get_split_prompt (line 6) | def get_split_prompt(sentence, num_parts = 2, word_limit = 20):
  function get_summary_prompt (line 53) | def get_summary_prompt(source_content, custom_terms_json=None):
  function generate_shared_prompt (line 128) | def generate_shared_prompt(previous_content_prompt, after_content_prompt...
  function get_prompt_faithfulness (line 144) | def get_prompt_faithfulness(lines, shared_prompt):
  function get_prompt_expressiveness (line 190) | def get_prompt_expressiveness(faithfulness_result, lines, shared_prompt):
  function get_align_prompt (line 252) | def get_align_prompt(src_sub, tr_sub, src_part):
  function get_subtitle_trim_prompt (line 302) | def get_subtitle_trim_prompt(text, duration):
  function get_correct_text_prompt (line 343) | def get_correct_text_prompt(text):

FILE: core/spacy_utils/load_nlp_model.py
  function get_spacy_model (line 7) | def get_spacy_model(language: str):
  function init_nlp (line 14) | def init_nlp():

FILE: core/spacy_utils/split_by_comma.py
  function is_valid_phrase (line 9) | def is_valid_phrase(phrase):
  function analyze_comma (line 15) | def analyze_comma(start, doc, token):
  function split_by_comma (line 30) | def split_by_comma(text, nlp):
  function split_by_comma_main (line 47) | def split_by_comma_main(nlp):

FILE: core/spacy_utils/split_by_connector.py
  function analyze_connectors (line 8) | def analyze_connectors(doc, token):
  function split_by_connectors (line 84) | def split_by_connectors(text, context_words=5, nlp=None):
  function split_sentences_main (line 127) | def split_sentences_main(nlp):

FILE: core/spacy_utils/split_by_mark.py
  function split_by_mark (line 10) | def split_by_mark(nlp):

FILE: core/spacy_utils/split_long_by_root.py
  function split_long_sentence (line 10) | def split_long_sentence(doc):
  function split_extremely_long_sentence (line 43) | def split_extremely_long_sentence(doc):
  function split_long_by_root_main (line 64) | def split_long_by_root_main(nlp):

FILE: core/st_utils/download_video_section.py
  function download_video_section (line 14) | def download_video_section():
  function convert_audio_to_video (line 67) | def convert_audio_to_video(audio_file: str) -> str:

FILE: core/st_utils/imports_and_utils.py
  function download_subtitle_zip_button (line 8) | def download_subtitle_zip_button(text: str):

FILE: core/st_utils/sidebar_setting.py
  function config_input (line 6) | def config_input(label, key, help=None):
  function page_setting (line 13) | def page_setting():
  function check_api (line 155) | def check_api():

FILE: core/translate_lines.py
  function valid_translate_result (line 9) | def valid_translate_result(result: dict, required_keys: list, required_s...
  function translate_lines (line 21) | def translate_lines(lines, previous_content_prompt, after_cotent_prompt,...

FILE: core/tts_backend/_302_f5tts.py
  function upload_file_to_302 (line 13) | def upload_file_to_302(file_path):
  function _f5_tts (line 29) | def _f5_tts(text: str, refer_url: str, save_path: str) -> bool:
  function _merge_audio (line 53) | def _merge_audio(files, output: str) -> bool:
  function _get_ref_audio (line 78) | def _get_ref_audio(task_df, min_duration=8, max_duration=14.5) -> str:
  function f5_tts_for_videolingo (line 120) | def f5_tts_for_videolingo(text: str, save_as: str, number: int, task_df):

FILE: core/tts_backend/azure_tts.py
  function azure_tts (line 4) | def azure_tts(text: str, save_path: str) -> None:

FILE: core/tts_backend/custom_tts.py
  function custom_tts (line 3) | def custom_tts(text, save_path):

FILE: core/tts_backend/edge_tts.py
  function edge_tts (line 15) | def edge_tts(text, save_path):

FILE: core/tts_backend/estimate_duration.py
  class AdvancedSyllableEstimator (line 7) | class AdvancedSyllableEstimator:
    method __init__ (line 8) | def __init__(self):
    method estimate_duration (line 20) | def estimate_duration(self, text: str, lang: Optional[str] = None) -> ...
    method count_syllables (line 24) | def count_syllables(self, text: str, lang: Optional[str] = None) -> int:
    method _count_english_syllables (line 49) | def _count_english_syllables(self, text: str) -> int:
    method _detect_language (line 59) | def _detect_language(self, text: str) -> str:
    method process_mixed_text (line 64) | def process_mixed_text(self, text: str) -> dict:
  function init_estimator (line 106) | def init_estimator():
  function estimate_duration (line 109) | def estimate_duration(text: str, estimator: AdvancedSyllableEstimator):

FILE: core/tts_backend/fish_tts.py
  function fish_tts (line 6) | def fish_tts(text: str, save_as: str) -> bool:

FILE: core/tts_backend/gpt_sovits_tts.py
  function check_lang (line 9) | def check_lang(text_lang, prompt_lang):
  function gpt_sovits_tts (line 27) | def gpt_sovits_tts(text, text_lang, save_path, ref_audio_path, prompt_la...
  function gpt_sovits_tts_for_videolingo (line 56) | def gpt_sovits_tts_for_videolingo(text, save_as, number, task_df):
  function find_and_check_config_path (line 109) | def find_and_check_config_path(dubbing_character):
  function start_gpt_sovits_server (line 125) | def start_gpt_sovits_server():

FILE: core/tts_backend/openai_tts.py
  function openai_tts (line 11) | def openai_tts(text, save_path):

FILE: core/tts_backend/sf_cosyvoice2.py
  function wav_to_base64 (line 6) | def wav_to_base64(wav_file_path):
  function cosyvoice_tts_for_videolingo (line 13) | def cosyvoice_tts_for_videolingo(text, save_as, number, task_df):

FILE: core/tts_backend/sf_fishtts.py
  function siliconflow_fish_tts (line 23) | def siliconflow_fish_tts(text, save_path, mode="preset", voice_id=None, ...
  function create_custom_voice (line 66) | def create_custom_voice(audio_path, text, custom_name=None):
  function merge_audio (line 101) | def merge_audio(files, output):
  function get_ref_audio (line 122) | def get_ref_audio(task_df):
  function siliconflow_fish_tts_for_videolingo (line 180) | def siliconflow_fish_tts_for_videolingo(text, save_as, number, task_df):

FILE: core/tts_backend/tts_main.py
  function clean_text_for_tts (line 18) | def clean_text_for_tts(text):
  function tts_main (line 25) | def tts_main(text, save_as, number, task_df):

FILE: core/utils/ask_gpt.py
  function _save_cache (line 17) | def _save_cache(model, prompt, resp_content, resp_type, resp, message=No...
  function _load_cache (line 29) | def _load_cache(prompt, resp_type, log_title):
  function ask_gpt (line 44) | def ask_gpt(prompt, resp_type=None, valid_def=None, log_title="default"):

FILE: core/utils/config_utils.py
  function load_key (line 14) | def load_key(key):
  function update_key (line 28) | def update_key(key, new_value):
  function get_joiner (line 50) | def get_joiner(language):

FILE: core/utils/decorator.py
  function except_handler (line 10) | def except_handler(error_msg, retry=0, delay=1, default_return=None):
  function check_file_exists (line 34) | def check_file_exists(file_path):
  function test_function (line 47) | def test_function():

FILE: core/utils/delete_retry_dubbing.py
  function delete_dubbing_files (line 4) | def delete_dubbing_files():

FILE: core/utils/onekeycleanup.py
  function cleanup (line 6) | def cleanup(history_dir="history"):
  function move_file (line 42) | def move_file(src, dst):
  function sanitize_filename (line 72) | def sanitize_filename(filename):

FILE: core/utils/pypi_autochoose.py
  function get_optimal_thread_count (line 22) | def get_optimal_thread_count():
  function test_mirror_speed (line 29) | def test_mirror_speed(name, url):
  function set_pip_mirror (line 42) | def set_pip_mirror(url):
  function get_current_pip_mirror (line 52) | def get_current_pip_mirror():
  function main (line 60) | def main():

FILE: docs/components/landing/comments.tsx
  type Comment (line 3) | type Comment = {
  type Props (line 9) | type Props = {
  function Comments (line 14) | function Comments({ items, title }: Props) {

FILE: docs/components/landing/faq.tsx
  type FAQItem (line 3) | interface FAQItem {
  type FAQProps (line 8) | interface FAQProps {
  function FAQ (line 13) | function FAQ({ items, title }: FAQProps) {

FILE: docs/components/landing/features.tsx
  type Feature (line 5) | type Feature = {
  type FeaturesProps (line 12) | type FeaturesProps = {
  function Features (line 18) | function Features({ items, title }: FeaturesProps) {

FILE: docs/components/landing/github-stats.tsx
  function GitHubStats (line 5) | function GitHubStats({ stars, recentStargazers }) {

FILE: docs/components/landing/hero.tsx
  type HeroProps (line 6) | interface HeroProps {
  function Hero (line 12) | function Hero({ title, description, videoSrc }: HeroProps) {

FILE: docs/components/landing/index.tsx
  function Landing (line 18) | function Landing({ data }) {

FILE: docs/components/ui/badge.tsx
  type BadgeProps (line 26) | interface BadgeProps
  function Badge (line 30) | function Badge({ className, variant, ...props }: BadgeProps) {

FILE: docs/components/ui/button.tsx
  type ButtonProps (line 37) | interface ButtonProps

FILE: docs/components/ui/hero-video-dialog.tsx
  type AnimationStyle (line 9) | type AnimationStyle =
  type HeroVideoProps (line 19) | interface HeroVideoProps {
  function HeroVideoDialog (line 70) | function HeroVideoDialog({

FILE: docs/components/ui/rainbow-button.tsx
  type RainbowButtonProps (line 5) | interface RainbowButtonProps
  function RainbowButton (line 8) | function RainbowButton({ children, ...props }: RainbowButtonProps) {

FILE: docs/lib/utils.ts
  function cn (line 4) | function cn(...inputs: ClassValue[]) {

FILE: docs/theme.config.jsx
  method useNextSeoProps (line 40) | useNextSeoProps() {

FILE: install.py
  function install_package (line 15) | def install_package(*packages):
  function check_nvidia_gpu (line 18) | def check_nvidia_gpu():
  function check_ffmpeg (line 44) | def check_ffmpeg():
  function _detect_cuda_version_from_smi (line 96) | def _detect_cuda_version_from_smi():
  function _detect_cuda_index (line 111) | def _detect_cuda_index():
  function main (line 143) | def main():

FILE: launch.py
  function log (line 11) | def log(msg):
  function check_package (line 16) | def check_package(name, import_name=None):
  function main (line 24) | def main():

FILE: st.py
  function text_processing_section (line 16) | def text_processing_section():
  function process_text (line 45) | def process_text():
  function audio_processing_section (line 65) | def audio_processing_section():
  function process_audio (line 92) | def process_audio():
  function main (line 108) | def main():

FILE: translations/translations.py
  function load_translations (line 14) | def load_translations(language="en"):
  function translate (line 19) | def translate(key):

Download .json

Condensed preview — 132 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (587K chars).

[
  {
    "path": ".cursorrules",
    "chars": 101,
    "preview": "2. 使用\n# ------------\n# comment\n# ------------ \n进行大块的注释\n3. 避免使用复杂的函数内注释，以及函数变量中不要有类型定义\n4. 使用英文注释和print"
  },
  {
    "path": ".gitignore",
    "chars": 2943,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": ".streamlit/config.toml",
    "chars": 29,
    "preview": "[server]\nmaxUploadSize = 4096"
  },
  {
    "path": "Dockerfile",
    "chars": 2125,
    "preview": "ARG CUDA_VERSION=12.4.1\nFROM nvidia/cuda:${CUDA_VERSION}-devel-ubuntu20.04\n\n# Set environment variables\nENV DEBIAN_FRONT"
  },
  {
    "path": "LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "OneKeyStart.bat",
    "chars": 242,
    "preview": "@echo off\nchcp 65001 >nul 2>&1\ncall conda activate videolingo 2>nul\nset PYTHONWARNINGS=ignore\npython \"%~dp0launch.py\"\nif"
  },
  {
    "path": "README.md",
    "chars": 7050,
    "preview": "<div align=\"center\">\n\n<img src=\"/docs/logo.png\" alt=\"VideoLingo Logo\" height=\"140\">\n\n# Connect the World, Frame by Frame"
  },
  {
    "path": "VideoLingo_colab.ipynb",
    "chars": 85535,
    "preview": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"markdown\",\n      \"metadata\": {\n        \"id\": \"RkeSbYF2HoM_\"\n      },\n      \"sou"
  },
  {
    "path": "batch/OneKeyBatch.bat",
    "chars": 131,
    "preview": "@echo off\ncd /D \"%~dp0\"\ncd ..\n\ncall conda activate videolingo\n\n@rem 运行批处理脚本\ncall python batch\\utils\\batch_processor.py\n\n"
  },
  {
    "path": "batch/README.md",
    "chars": 1864,
    "preview": "# VideoLingo Batch Mode\n\n[English](./README.md) | [简体中文](./README.zh.md)\n\nBefore utilizing the batch mode, ensure you ha"
  },
  {
    "path": "batch/README.zh.md",
    "chars": 1085,
    "preview": "# VideoLingo Batch Mode\n\n[English](./README.md) | [简体中文](./README.zh.md)\n\n在使用批处理模式前，请确保你已经使用过 Streamlit 模式并正确设置了 `config"
  },
  {
    "path": "batch/utils/batch_processor.py",
    "chars": 4485,
    "preview": "import os\nimport gc\nfrom batch.utils.settings_check import check_settings\nfrom batch.utils.video_processor import proces"
  },
  {
    "path": "batch/utils/settings_check.py",
    "chars": 1899,
    "preview": "import os\nimport pandas as pd\nfrom rich.console import Console\nfrom rich.panel import Panel\n\n# Constants\nSETTINGS_FILE ="
  },
  {
    "path": "batch/utils/video_processor.py",
    "chars": 3732,
    "preview": "import os\nfrom core.st_utils.imports_and_utils import *\nfrom core.utils.onekeycleanup import cleanup\nfrom core.utils imp"
  },
  {
    "path": "config.yaml",
    "chars": 4642,
    "preview": "# * Settings marked with * are advanced settings that won't appear in the Streamlit page and can only be modified manual"
  },
  {
    "path": "core/_10_gen_audio.py",
    "chars": 11588,
    "preview": "import os\nimport time\nimport shutil\nimport subprocess\nfrom typing import Tuple\n\nimport pandas as pd\nfrom pydub import Au"
  },
  {
    "path": "core/_11_merge_audio.py",
    "chars": 5674,
    "preview": "import os\nimport pandas as pd\nimport subprocess\nfrom pydub import AudioSegment\nfrom rich.progress import Progress, Spinn"
  },
  {
    "path": "core/_12_dub_to_vid.py",
    "chars": 3108,
    "preview": "import platform\nimport subprocess\n\nimport cv2\nimport numpy as np\nfrom rich.console import Console\n\nfrom core._1_ytdlp im"
  },
  {
    "path": "core/_1_ytdlp.py",
    "chars": 2996,
    "preview": "import os,sys\nimport glob\nimport re\nimport subprocess\nfrom core.utils import *\n\ndef sanitize_filename(filename):\n    # R"
  },
  {
    "path": "core/_2_asr.py",
    "chars": 1834,
    "preview": "from core.utils import *\nfrom core.asr_backend.demucs_vl import demucs_audio\nfrom core.asr_backend.audio_preprocess impo"
  },
  {
    "path": "core/_3_1_split_nlp.py",
    "chars": 376,
    "preview": "from core.spacy_utils import *\nfrom core.utils.models import _3_1_SPLIT_BY_NLP\nfrom core.utils import check_file_exists\n"
  },
  {
    "path": "core/_3_2_split_meaning.py",
    "chars": 5728,
    "preview": "import concurrent.futures\nfrom difflib import SequenceMatcher\nimport math\nfrom core.prompts import get_split_prompt\nfrom"
  },
  {
    "path": "core/_4_1_summarize.py",
    "chars": 2873,
    "preview": "import json\nfrom core.prompts import get_summary_prompt\nimport pandas as pd\nfrom core.utils import *\nfrom core.utils.mod"
  },
  {
    "path": "core/_4_2_translate.py",
    "chars": 5246,
    "preview": "import pandas as pd\nimport json\nimport concurrent.futures\nfrom core.translate_lines import translate_lines\nfrom core._4_"
  },
  {
    "path": "core/_5_split_sub.py",
    "chars": 5473,
    "preview": "import pandas as pd\nfrom typing import List, Tuple\nimport concurrent.futures\n\nfrom core._3_2_split_meaning import split_"
  },
  {
    "path": "core/_6_gen_sub.py",
    "chars": 6885,
    "preview": "import pandas as pd\nimport os\nimport re\nfrom rich.panel import Panel\nfrom rich.console import Console\nimport autocorrect"
  },
  {
    "path": "core/_7_sub_into_vid.py",
    "chars": 3865,
    "preview": "import os, subprocess, time\nfrom core._1_ytdlp import find_video_files\nimport cv2\nimport numpy as np\nimport platform\nfro"
  },
  {
    "path": "core/_8_1_audio_task.py",
    "chars": 6533,
    "preview": "import datetime\nimport re\nimport pandas as pd\nfrom rich.console import Console\nfrom rich.panel import Panel\nfrom core.pr"
  },
  {
    "path": "core/_8_2_dub_chunks.py",
    "chars": 7712,
    "preview": "import datetime\nimport re\nimport pandas as pd\nfrom core._8_1_audio_task import time_diff_seconds\nfrom core.asr_backend.a"
  },
  {
    "path": "core/_9_refer_audio.py",
    "chars": 2065,
    "preview": "import os\nfrom rich.panel import Panel\nfrom rich.console import Console\nfrom rich.progress import Progress, SpinnerColum"
  },
  {
    "path": "core/__init__.py",
    "chars": 1012,
    "preview": "# use try-except to avoid error when installing\ntry:\n    from . import (\n        _1_ytdlp,\n        _2_asr,\n        _3_1_"
  },
  {
    "path": "core/asr_backend/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "core/asr_backend/audio_preprocess.py",
    "chars": 8100,
    "preview": "import os, subprocess\nimport pandas as pd\nfrom typing import Dict, List, Tuple\nfrom pydub import AudioSegment\nfrom core."
  },
  {
    "path": "core/asr_backend/demucs_vl.py",
    "chars": 2262,
    "preview": "import os\nimport torch\nfrom rich.console import Console\nfrom rich import print as rprint\nfrom demucs.pretrained import g"
  },
  {
    "path": "core/asr_backend/elevenlabs_asr.py",
    "chars": 5283,
    "preview": "import os\nimport json\nimport time\nimport requests\nimport tempfile\nimport librosa\nimport soundfile as sf\nfrom rich import"
  },
  {
    "path": "core/asr_backend/whisperX_302.py",
    "chars": 2440,
    "preview": "import os\nimport io\nimport json\nimport time\nimport requests\nimport librosa\nimport soundfile as sf\nfrom rich import print"
  },
  {
    "path": "core/asr_backend/whisperX_local.py",
    "chars": 6823,
    "preview": "import os\nimport warnings\nimport time\nimport subprocess\nimport torch\nimport functools\n\nwarnings.filterwarnings(\"ignore\")"
  },
  {
    "path": "core/prompts.py",
    "chars": 13271,
    "preview": "import json\nfrom core.utils import *\n\n## ================================================================\n# @ step4_spli"
  },
  {
    "path": "core/spacy_utils/__init__.py",
    "chars": 372,
    "preview": "from .split_by_comma import split_by_comma_main\nfrom .split_by_connector import split_sentences_main\nfrom .split_by_mark"
  },
  {
    "path": "core/spacy_utils/load_nlp_model.py",
    "chars": 1307,
    "preview": "import spacy\nfrom spacy.cli import download\nfrom core.utils import rprint, load_key, except_handler\n\nSPACY_MODEL_MAP = l"
  },
  {
    "path": "core/spacy_utils/split_by_comma.py",
    "chars": 2719,
    "preview": "import itertools\nimport os\nimport warnings\nfrom core.utils import *\nfrom core.spacy_utils.load_nlp_model import init_nlp"
  },
  {
    "path": "core/spacy_utils/split_by_connector.py",
    "chars": 6063,
    "preview": "import os\nimport warnings\nfrom core.spacy_utils.load_nlp_model import init_nlp, SPLIT_BY_COMMA_FILE, SPLIT_BY_CONNECTOR_"
  },
  {
    "path": "core/spacy_utils/split_by_mark.py",
    "chars": 2458,
    "preview": "import os\nimport pandas as pd\nimport warnings\nfrom core.spacy_utils.load_nlp_model import init_nlp, SPLIT_BY_MARK_FILE\nf"
  },
  {
    "path": "core/spacy_utils/split_long_by_root.py",
    "chars": 4483,
    "preview": "import os\nimport string\nimport warnings\nfrom core.spacy_utils.load_nlp_model import init_nlp, SPLIT_BY_CONNECTOR_FILE\nfr"
  },
  {
    "path": "core/st_utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "core/st_utils/download_video_section.py",
    "chars": 3298,
    "preview": "import os\nimport re\nimport shutil\nimport subprocess\nfrom time import sleep\n\nimport streamlit as st\nfrom core._1_ytdlp im"
  },
  {
    "path": "core/st_utils/imports_and_utils.py",
    "chars": 3356,
    "preview": "import os\nimport streamlit as st\nimport io, zipfile\nfrom core.st_utils.download_video_section import download_video_sect"
  },
  {
    "path": "core/st_utils/sidebar_setting.py",
    "chars": 8120,
    "preview": "import streamlit as st\nfrom translations.translations import translate as t\nfrom translations.translations import DISPLA"
  },
  {
    "path": "core/translate_lines.py",
    "chars": 5485,
    "preview": "from core.prompts import generate_shared_prompt, get_prompt_faithfulness, get_prompt_expressiveness\nfrom rich.panel impo"
  },
  {
    "path": "core/tts_backend/_302_f5tts.py",
    "chars": 5347,
    "preview": "import http.client\nimport json\nimport os\nimport requests\nfrom pydub import AudioSegment\nfrom core.asr_backend.audio_prep"
  },
  {
    "path": "core/tts_backend/azure_tts.py",
    "chars": 799,
    "preview": "import requests\nfrom core.utils import load_key\n\ndef azure_tts(text: str, save_path: str) -> None:\n    url = \"https://ap"
  },
  {
    "path": "core/tts_backend/custom_tts.py",
    "chars": 948,
    "preview": "from pathlib import Path\n\ndef custom_tts(text, save_path):\n    \"\"\"\n    Custom TTS (Text-to-Speech) interface\n    \n    Ar"
  },
  {
    "path": "core/tts_backend/edge_tts.py",
    "chars": 970,
    "preview": "from pathlib import Path\nimport edge_tts\nfrom core.utils import *\nimport subprocess\n\n# Available voices can be listed us"
  },
  {
    "path": "core/tts_backend/estimate_duration.py",
    "chars": 6201,
    "preview": "import syllables\nfrom pypinyin import pinyin, Style\nfrom g2p_en import G2p\nfrom typing import Optional\nimport re\n\nclass "
  },
  {
    "path": "core/tts_backend/fish_tts.py",
    "chars": 1261,
    "preview": "import requests\nfrom core.utils import *\nimport json\n\n@except_handler(\"Failed to generate audio using 302.ai Fish TTS\", "
  },
  {
    "path": "core/tts_backend/gpt_sovits_tts.py",
    "chars": 7853,
    "preview": "from pathlib import Path\nimport requests\nimport os, sys\nimport subprocess\nimport socket\nimport time\nfrom core.utils impo"
  },
  {
    "path": "core/tts_backend/openai_tts.py",
    "chars": 1437,
    "preview": "from pathlib import Path\nimport requests\nimport json\nfrom core.utils import load_key, except_handler\n\nBASE_URL = \"https:"
  },
  {
    "path": "core/tts_backend/sf_cosyvoice2.py",
    "chars": 1780,
    "preview": "from openai import OpenAI\nfrom pathlib import Path\nimport base64\nfrom core.utils import *\n\ndef wav_to_base64(wav_file_pa"
  },
  {
    "path": "core/tts_backend/sf_fishtts.py",
    "chars": 9808,
    "preview": "import os\nimport time\nimport uuid\nimport base64\nimport hashlib\nimport requests\nfrom pathlib import Path\nfrom pydub impor"
  },
  {
    "path": "core/tts_backend/tts_main.py",
    "chars": 3753,
    "preview": "import os\nimport re\nfrom pydub import AudioSegment\n\nfrom core.asr_backend.audio_preprocess import get_audio_duration\nfro"
  },
  {
    "path": "core/utils/__init__.py",
    "chars": 387,
    "preview": "# use try-except to avoid error when installing\ntry:\n    from .ask_gpt import ask_gpt\n    from .decorator import except_"
  },
  {
    "path": "core/utils/ask_gpt.py",
    "chars": 3342,
    "preview": "import os\nimport json\nfrom threading import Lock\nimport json_repair\nfrom openai import OpenAI\nfrom core.utils.config_uti"
  },
  {
    "path": "core/utils/config_utils.py",
    "chars": 1648,
    "preview": "from ruamel.yaml import YAML\nimport threading\n\nCONFIG_PATH = 'config.yaml'\nlock = threading.Lock()\n\nyaml = YAML()\nyaml.p"
  },
  {
    "path": "core/utils/decorator.py",
    "chars": 1570,
    "preview": "import functools\nimport time\nimport os\nfrom rich import print as rprint\n\n# ------------------------------\n# retry decora"
  },
  {
    "path": "core/utils/delete_retry_dubbing.py",
    "chars": 948,
    "preview": "import os\nimport shutil\n\ndef delete_dubbing_files():\n    files_to_delete = [\n        os.path.join(\"output\", \"dub.wav\"),\n"
  },
  {
    "path": "core/utils/models.py",
    "chars": 1439,
    "preview": "# ------------------------------------------\n# 定义中间产出文件\n# ------------------------------------------\n\n_2_CLEANED_CHUNKS "
  },
  {
    "path": "core/utils/onekeycleanup.py",
    "chars": 2673,
    "preview": "import os\nimport glob\nfrom core._1_ytdlp import find_video_files\nimport shutil\n\ndef cleanup(history_dir=\"history\"):\n    "
  },
  {
    "path": "core/utils/pypi_autochoose.py",
    "chars": 3754,
    "preview": "import subprocess\nimport time\nimport requests\nimport os\nimport concurrent.futures\nfrom rich.console import Console\nfrom "
  },
  {
    "path": "docs/.gitignore",
    "chars": 234,
    "preview": ".DS_Store\n.next/\nnode_modules/\n*.log\ndist/\n.turbo/\nout/\n# Theme styles\npackages/nextra-theme-*/style.css\n\n# Stork relate"
  },
  {
    "path": "docs/components/landing/comments.tsx",
    "chars": 960,
    "preview": "import { Card, CardContent } from '@/components/ui/card'\n\ntype Comment = {\n\tcontent: string\n\tauthor: string\n\ttitle: stri"
  },
  {
    "path": "docs/components/landing/faq.tsx",
    "chars": 932,
    "preview": "import { Accordion, AccordionContent, AccordionItem, AccordionTrigger } from '@/components/ui/accordion'\n\nexport interfa"
  },
  {
    "path": "docs/components/landing/features.tsx",
    "chars": 1393,
    "preview": "import { Card, CardContent, CardHeader, CardTitle } from '@/components/ui/card'\nimport { CheckCircle, ArrowRight } from "
  },
  {
    "path": "docs/components/landing/github-stats.tsx",
    "chars": 2564,
    "preview": "import { Tooltip, TooltipContent, TooltipProvider, TooltipTrigger } from '@/components/ui/tooltip'\nimport { Button } fro"
  },
  {
    "path": "docs/components/landing/hero.tsx",
    "chars": 1430,
    "preview": "import Link from 'next/link'\nimport { Button } from '@/components/ui/button'\nimport HeroVideoDialog from \"@/components/u"
  },
  {
    "path": "docs/components/landing/index.tsx",
    "chars": 982,
    "preview": "import Hero from '@/components/landing/hero'\nimport Features from '@/components/landing/features'\nimport Comments from '"
  },
  {
    "path": "docs/components/ui/accordion.tsx",
    "chars": 2023,
    "preview": "// @ts-nocheck\nimport * as React from \"react\"\nimport * as AccordionPrimitive from \"@radix-ui/react-accordion\"\nimport { C"
  },
  {
    "path": "docs/components/ui/badge.tsx",
    "chars": 1140,
    "preview": "import * as React from \"react\"\nimport { cva, type VariantProps } from \"class-variance-authority\"\n\nimport { cn } from \"@/"
  },
  {
    "path": "docs/components/ui/button.tsx",
    "chars": 1799,
    "preview": "import * as React from \"react\"\nimport { Slot } from \"@radix-ui/react-slot\"\nimport { cva, type VariantProps } from \"class"
  },
  {
    "path": "docs/components/ui/card.tsx",
    "chars": 1847,
    "preview": "import * as React from \"react\"\n\nimport { cn } from \"@/lib/utils\"\n\nconst Card = React.forwardRef<\n  HTMLDivElement,\n  Rea"
  },
  {
    "path": "docs/components/ui/hero-video-dialog.tsx",
    "chars": 4628,
    "preview": "\"use client\";\n\nimport { useState } from \"react\";\nimport { AnimatePresence, motion } from \"framer-motion\";\nimport { Play,"
  },
  {
    "path": "docs/components/ui/rainbow-button.tsx",
    "chars": 1700,
    "preview": "import React from \"react\";\n\nimport { cn } from \"@/lib/utils\";\n\ninterface RainbowButtonProps\n  extends React.ButtonHTMLAt"
  },
  {
    "path": "docs/components/ui/tooltip.tsx",
    "chars": 1145,
    "preview": "import * as React from \"react\"\nimport * as TooltipPrimitive from \"@radix-ui/react-tooltip\"\n\nimport { cn } from \"@/lib/ut"
  },
  {
    "path": "docs/components.json",
    "chars": 412,
    "preview": "{\n  \"$schema\": \"https://ui.shadcn.com/schema.json\",\n  \"style\": \"default\",\n  \"rsc\": false,\n  \"tsx\": true,\n  \"tailwind\": {"
  },
  {
    "path": "docs/lib/utils.ts",
    "chars": 166,
    "preview": "import { type ClassValue, clsx } from \"clsx\"\nimport { twMerge } from \"tailwind-merge\"\n\nexport function cn(...inputs: Cla"
  },
  {
    "path": "docs/middleware.js",
    "chars": 56,
    "preview": "export { locales as middleware } from 'nextra/locales'\n\n"
  },
  {
    "path": "docs/next-env.d.ts",
    "chars": 230,
    "preview": "/// <reference types=\"next\" />\n/// <reference types=\"next/image-types/global\" />\n\n// NOTE: This file should not be edite"
  },
  {
    "path": "docs/next.config.js",
    "chars": 462,
    "preview": "const withNextra = require('nextra')({\n    theme: 'nextra-theme-docs',\n    themeConfig: './theme.config.jsx',\n})\n\nconst "
  },
  {
    "path": "docs/package.json",
    "chars": 961,
    "preview": "{\n  \"name\": \"docs\",\n  \"version\": \"1.0.0\",\n  \"description\": \"\",\n  \"main\": \"index.js\",\n  \"scripts\": {\n    \"dev\": \"next\",\n "
  },
  {
    "path": "docs/pages/_app.mdx",
    "chars": 2198,
    "preview": "import './globals.css'\nimport { GoogleAnalytics } from '@next/third-parties/google'\n\nexport default function App({ Compo"
  },
  {
    "path": "docs/pages/_meta.en-US.json",
    "chars": 223,
    "preview": "{\n    \"index\": {\n      \"type\": \"page\",\n      \"title\": \"VideoLingo\",\n      \"display\": \"hidden\",\n      \"theme\": {\n        "
  },
  {
    "path": "docs/pages/_meta.ja.json",
    "chars": 221,
    "preview": "{\n    \"index\": {\n      \"type\": \"page\",\n      \"title\": \"VideoLingo\",\n      \"display\": \"hidden\",\n      \"theme\": {\n        "
  },
  {
    "path": "docs/pages/_meta.zh-CN.json",
    "chars": 216,
    "preview": "{\n    \"index\": {\n      \"type\": \"page\",\n      \"title\": \"VideoLingo\",\n      \"display\": \"hidden\",\n      \"theme\": {\n        "
  },
  {
    "path": "docs/pages/docs/_meta.en-US.json",
    "chars": 212,
    "preview": "{ \n    \"introduction\": {\n      \"title\": \"Introduction\"\n    },\n    \"start\": {\n      \"title\": \"Start\"\n    },\n    \"tech\": {"
  },
  {
    "path": "docs/pages/docs/_meta.ja.json",
    "chars": 139,
    "preview": "{\n    \"introduction\": {\n      \"title\": \"紹介\"\n    },\n    \"start\": {\n      \"title\": \"スタート\"\n    },\n    \"tech\": {\n      \"titl"
  },
  {
    "path": "docs/pages/docs/_meta.zh-CN.json",
    "chars": 190,
    "preview": "{\n    \"introduction\": {\n      \"title\": \"介绍\"\n    },\n    \"start\": {\n      \"title\": \"使用文档\"\n    },\n    \"tech\": {\n      \"titl"
  },
  {
    "path": "docs/pages/docs/docker.en-US.md",
    "chars": 2724,
    "preview": "# Docker Installation\n\nVideoLingo provides a Dockerfile that you can use to build the current VideoLingo package. Here a"
  },
  {
    "path": "docs/pages/docs/docker.zh-CN.md",
    "chars": 1516,
    "preview": "# Docker安装\n\nVideoLingo 提供了Dockerfile,可自行使用Dockerfile打包目前VideoLingo。以下是打包和运行的详细说明:\n\n## 系统要求\n\n- CUDA版本 > 12.4\n- NVIDIA Dri"
  },
  {
    "path": "docs/pages/docs/introduction.en-US.md",
    "chars": 5920,
    "preview": "# VideoLingo: Connecting the World, Frame by Frame\n\n## 🌟 Overview ([Try VideoLingo Now!](https://videolingo.io))\n\nVideoL"
  },
  {
    "path": "docs/pages/docs/introduction.zh-CN.md",
    "chars": 3562,
    "preview": "# VideoLingo: 连接世界的每一帧\n\n**QQ 群：875297969**\n\n## 🌟 简介（[在线体验！](https://videolingo.io)）\n\nVideoLingo 是一站式视频翻译本地化配音工具，能够一键生成 N"
  },
  {
    "path": "docs/pages/docs/start.en-US.md",
    "chars": 12715,
    "preview": "# 🚀 Getting Started\n\n## 📋 API Configuration\nVideoLingo requires an LLM and TTS(optional). For the best quality, use clau"
  },
  {
    "path": "docs/pages/docs/start.zh-CN.md",
    "chars": 8441,
    "preview": "# 🚀 开始使用\n\n## 📋 API 配置指南\n本项目需使用大模型和 TTS。追求最佳质量请使用 claude-3-5-sonnet-20240620 与 Azure TTS。也可以选择完全本地化体验，使用 Ollama 作为大模型，Edg"
  },
  {
    "path": "docs/pages/docs/tech.en-US.md",
    "chars": 16187,
    "preview": "## Videolingo Video Translation System Technical Documentation\n\nVideolingo is a highly integrated video translation syst"
  },
  {
    "path": "docs/pages/docs/tech.zh-CN.md",
    "chars": 8040,
    "preview": "## Videolingo 视频翻译系统技术文档\n\nVideolingo 是一个高度集成的视频翻译系统，能够自动执行一系列复杂的操作，包括视频下载、音频提取、语音识别、文本处理、翻译、字幕生成、文本到语音合成以及音视频合成。该系统利用 AI"
  },
  {
    "path": "docs/pages/globals.css",
    "chars": 2067,
    "preview": "@tailwind base;\n@tailwind components;\n@tailwind utilities;\n\n@layer base {\n\t:root {\n\t\t--background: 0 0% 100%;\n\t\t--foregr"
  },
  {
    "path": "docs/pages/index.en-US.mdx",
    "chars": 4711,
    "preview": "---\ntitle: VideoLingo\n---\n\nimport Landing from '@/components/landing'\n\nexport const getStaticProps = ({ params }) => {\n "
  },
  {
    "path": "docs/pages/index.ja.mdx",
    "chars": 3562,
    "preview": "---\ntitle: VideoLingo\n---\n\nimport Landing from '@/components/landing'\n\nexport const getStaticProps = ({ params }) => {\n "
  },
  {
    "path": "docs/pages/index.zh-CN.mdx",
    "chars": 3326,
    "preview": "---\ntitle: VideoLingo\n---\n\nimport Landing from '@/components/landing'\n\nexport const getStaticProps = async ({ params }) "
  },
  {
    "path": "docs/postcss.config.js",
    "chars": 82,
    "preview": "module.exports = {\n  plugins: {\n    tailwindcss: {},\n    autoprefixer: {},\n  },\n}\n"
  },
  {
    "path": "docs/public/site.webmanifest",
    "chars": 436,
    "preview": "{\n  \"name\": \"MyWebSite\",\n  \"short_name\": \"MySite\",\n  \"icons\": [\n    {\n      \"src\": \"/web-app-manifest-192x192.png\",\n    "
  },
  {
    "path": "docs/tailwind.config.js",
    "chars": 2755,
    "preview": "/** @type {import('tailwindcss').Config} */\nmodule.exports = {\n  darkMode: [\"class\"],\n  content: [\n    './pages/**/*.{js"
  },
  {
    "path": "docs/theme.config.jsx",
    "chars": 1369,
    "preview": "import { useRouter } from 'next/router'\nimport { useConfig } from 'nextra-theme-docs'\n\nconst title = 'VideoLingo'\n\nexpor"
  },
  {
    "path": "docs/tsconfig.json",
    "chars": 887,
    "preview": "{\n    \"compilerOptions\": {\n      \"target\": \"es5\",\n      \"lib\": [\"dom\", \"dom.iterable\", \"esnext\"],\n      \"allowJs\": true,"
  },
  {
    "path": "install.py",
    "chars": 11237,
    "preview": "import os, sys\nimport platform\nimport subprocess\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\n\nascii_logo"
  },
  {
    "path": "launch.py",
    "chars": 2949,
    "preview": "\"\"\"VideoLingo Enhanced Launcher - Pre-flight checks + logging.\"\"\"\nimport subprocess, sys, os, shutil, socket\nfrom pathli"
  },
  {
    "path": "requirements.txt",
    "chars": 676,
    "preview": "librosa==0.11.0\npytorch-lightning==2.6.1\nlightning==2.6.1\ntransformers>=4.48.0\nmoviepy==1.0.3\nnumpy>=2.0.2\nopenai>=1.55."
  },
  {
    "path": "setup.py",
    "chars": 316,
    "preview": "from setuptools import setup, find_packages\n\nNAME = 'VideoLingo'\nVERSION = '3.0.0'\n\nwith open('requirements.txt', encodi"
  },
  {
    "path": "st.py",
    "chars": 5248,
    "preview": "import streamlit as st\nimport os, sys\nfrom core.st_utils.imports_and_utils import *\nfrom core import *\n\n# SET PATH\ncurre"
  },
  {
    "path": "translations/README.es.md",
    "chars": 7632,
    "preview": "<div align=\"center\">\n\n<img src=\"/docs/logo.png\" alt=\"VideoLingo Logo\" height=\"140\">\n\n# Conectando el Mundo, Cuadro por C"
  },
  {
    "path": "translations/README.fr.md",
    "chars": 7871,
    "preview": "<div align=\"center\">\n\n<img src=\"/docs/logo.png\" alt=\"VideoLingo Logo\" height=\"140\">\n\n# Connecter le Monde, Image par Ima"
  },
  {
    "path": "translations/README.ja.md",
    "chars": 5180,
    "preview": "<div align=\"center\">\n\n<img src=\"/docs/logo.png\" alt=\"VideoLingo Logo\" height=\"140\">\n\n# フレームごとに世界をつなぐ\n\n<a href=\"https://t"
  },
  {
    "path": "translations/README.ru.md",
    "chars": 5727,
    "preview": "<div align=\"center\">\n\n<img src=\"/docs/logo.png\" alt=\"VideoLingo Logo\" height=\"140\">\n\n# Объединяя Мир, Кадр за Кадром\n\n<a"
  },
  {
    "path": "translations/README.zh-TW.md",
    "chars": 4634,
    "preview": "<div align=\"center\">\n\n<img src=\"/docs/logo.png\" alt=\"VideoLingo Logo\" height=\"140\">\n\n# 連結世界，逐格前行\n\n<a href=\"https://trend"
  },
  {
    "path": "translations/README.zh.md",
    "chars": 4689,
    "preview": "<div align=\"center\">\n\n<img src=\"/docs/logo.png\" alt=\"VideoLingo Logo\" height=\"140\">\n\n# 连接世界每一帧\n\n<a href=\"https://trendsh"
  },
  {
    "path": "translations/en.json",
    "chars": 8925,
    "preview": "{\n    \"a. Download or Upload Video\": \"a. Download or Upload Video\",\n    \"Delete and Reselect\": \"Delete and Reselect\",\n  "
  },
  {
    "path": "translations/es.json",
    "chars": 9540,
    "preview": "{\n    \"a. Download or Upload Video\": \"a. Descargar o subir video\",\n    \"Delete and Reselect\": \"Eliminar y volver a selec"
  },
  {
    "path": "translations/fr.json",
    "chars": 9699,
    "preview": "{\n    \"a. Download or Upload Video\": \"a. Télécharger ou importer une vidéo\",\n    \"Delete and Reselect\": \"Supprimer et re"
  },
  {
    "path": "translations/ja.json",
    "chars": 7315,
    "preview": "{\n    \"a. Download or Upload Video\": \"a. 動画のダウンロードまたはアップロード\",\n    \"Delete and Reselect\": \"削除して再選択\",\n    \"Enter YouTube l"
  },
  {
    "path": "translations/ru.json",
    "chars": 9170,
    "preview": "{\n    \"a. Download or Upload Video\": \"a. Скачать или загрузить видео\",\n    \"Delete and Reselect\": \"Удалить и выбрать зан"
  },
  {
    "path": "translations/translations.py",
    "chars": 889,
    "preview": "import json\n\nDISPLAY_LANGUAGES = {\n    \"🇬🇧 English\": \"en\",\n    \"🇨🇳 简体中文\": \"zh-CN\",\n    \"🇭🇰 繁体中文\": \"zh-HK\",\n    \"🇯🇵 日本語\":"
  },
  {
    "path": "translations/zh-CN.json",
    "chars": 6846,
    "preview": "{\n    \"a. Download or Upload Video\": \"a. 下载或上传视频\",\n    \"Delete and Reselect\": \"删除并重新选择\",\n    \"Enter YouTube link:\": \"输入Y"
  },
  {
    "path": "translations/zh-HK.json",
    "chars": 6850,
    "preview": "{\n    \"a. Download or Upload Video\": \"a. 下載或上傳影片\",\n    \"Delete and Reselect\": \"刪除並重新選擇\",\n    \"Enter YouTube link:\": \"輸入Y"
  }
]

// ... and 1 more files (download for full content)

About this extraction

This page contains the full source code of the Huanshere/VideoLingo GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 132 files (541.1 KB), approximately 165.2k tokens, and a symbol index with 205 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo