Repository: linyqh/NarratoAI
Branch: main
Commit: a6f2e0d815c4
Files: 118
Total size: 844.1 KB
Directory structure:
gitextract_1wpinv12/
├── .dockerignore
├── .github/
│   ├── pull_request_template.md
│   ├── release-drafter.yml
│   └── workflows/
│       ├── auto-release-generator.yml
│       ├── codeReview.yml
│       └── discord-release-notification.yml
├── .gitignore
├── Dockerfile
├── LICENSE
├── Makefile
├── README-en.md
├── README.md
├── app/
│   ├── __init__.py
│   ├── config/
│   │   ├── __init__.py
│   │   ├── audio_config.py
│   │   ├── config.py
│   │   └── ffmpeg_config.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── const.py
│   │   ├── exception.py
│   │   └── schema.py
│   ├── services/
│   │   ├── SDE/
│   │   │   └── short_drama_explanation.py
│   │   ├── SDP/
│   │   │   ├── generate_script_short.py
│   │   │   └── utils/
│   │   │       ├── short_schema.py
│   │   │       ├── step1_subtitle_analyzer_openai.py
│   │   │       ├── step5_merge_script.py
│   │   │       └── utils.py
│   │   ├── __init__.py
│   │   ├── audio_merger.py
│   │   ├── audio_normalizer.py
│   │   ├── clip_video.py
│   │   ├── generate_narration_script.py
│   │   ├── generate_video.py
│   │   ├── llm/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── config_validator.py
│   │   │   ├── exceptions.py
│   │   │   ├── litellm_provider.py
│   │   │   ├── manager.py
│   │   │   ├── migration_adapter.py
│   │   │   ├── providers/
│   │   │   │   └── __init__.py
│   │   │   ├── test_litellm_integration.py
│   │   │   ├── test_llm_service.py
│   │   │   ├── unified_service.py
│   │   │   └── validators.py
│   │   ├── llm.py
│   │   ├── material.py
│   │   ├── merger_video.py
│   │   ├── prompts/
│   │   │   ├── __init__.py
│   │   │   ├── base.py
│   │   │   ├── documentary/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── frame_analysis.py
│   │   │   │   └── narration_generation.py
│   │   │   ├── exceptions.py
│   │   │   ├── manager.py
│   │   │   ├── registry.py
│   │   │   ├── short_drama_editing/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── plot_extraction.py
│   │   │   │   └── subtitle_analysis.py
│   │   │   ├── short_drama_narration/
│   │   │   │   ├── __init__.py
│   │   │   │   ├── plot_analysis.py
│   │   │   │   └── script_generation.py
│   │   │   ├── template.py
│   │   │   └── validators.py
│   │   ├── script_service.py
│   │   ├── state.py
│   │   ├── subtitle.py
│   │   ├── subtitle_merger.py
│   │   ├── subtitle_text.py
│   │   ├── task.py
│   │   ├── update_script.py
│   │   ├── upload_validation.py
│   │   ├── video.py
│   │   ├── video_service.py
│   │   ├── voice.py
│   │   └── youtube_service.py
│   └── utils/
│       ├── check_script.py
│       ├── ffmpeg_utils.py
│       ├── gemini_analyzer.py
│       ├── gemini_openai_analyzer.py
│       ├── qwenvl_analyzer.py
│       ├── script_generator.py
│       ├── utils.py
│       └── video_processor.py
├── config.example.toml
├── docker-compose.yml
├── docker-deploy.sh
├── docker-entrypoint.sh
├── docs/
│   └── voice-list.txt
├── project_version
├── requirements.txt
├── resource/
│   ├── fonts/
│   │   └── fonts_in_here.txt
│   ├── public/
│   │   └── index.html
│   ├── scripts/
│   │   └── script_in_here.txt
│   ├── songs/
│   │   └── song_in_here.txt
│   ├── srt/
│   │   └── srt_in_here.txt
│   └── videos/
│       └── video_in_here.txt
├── webui/
│   ├── __init__.py
│   ├── components/
│   │   ├── __init__.py
│   │   ├── audio_settings.py
│   │   ├── basic_settings.py
│   │   ├── ffmpeg_diagnostics.py
│   │   ├── script_settings.py
│   │   ├── subtitle_settings.py
│   │   ├── system_settings.py
│   │   └── video_settings.py
│   ├── config/
│   │   └── settings.py
│   ├── i18n/
│   │   ├── __init__.py
│   │   ├── en.json
│   │   └── zh.json
│   ├── tools/
│   │   ├── base.py
│   │   ├── generate_script_docu.py
│   │   ├── generate_script_short.py
│   │   └── generate_short_summary.py
│   └── utils/
│       ├── cache.py
│       ├── file_utils.py
│       └── vision_analyzer.py
└── webui.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .dockerignore
================================================
# Git
.git/
.gitignore
.gitattributes
.svn/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# Virtual environments
.env
.env.*
.venv
venv/
ENV/
env.bak/
venv.bak/
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# Operating system
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db
# Logs and database files
*.log
*.db
logs/
# Temporary files
*.tmp
*.temp
temp/
tmp/
# Storage directories (content generated at runtime)
storage/temp/
storage/tasks/
storage/demo.py
# Cache directories
.cache/
.pytest_cache/
# Docs (keep only what is needed)
docs/
*.md
!README.md
# Docker files (avoid recursive copying)
Dockerfile.*
docker-compose.*.yml
# Config file (the example config is used instead)
config.toml
# Large files under resource/
resource/videos/
resource/songs/
# Test files
tests/
test_*
*_test.py
# Other unnecessary files
*.bak
*.orig
*.rej
================================================
FILE: .github/pull_request_template.md
================================================
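## PR Type
Please select one appropriate label (exactly one is required):
- [ ] Breaking change (breaking)
- [ ] Security fix (security)
- [ ] New feature (feature)
- [ ] Bug fix (bug)
- [ ] Code refactoring (refactor)
- [ ] Dependency upgrade (upgrade)
- [ ] Documentation update (docs)
- [ ] Translation (lang-all)
- [ ] Internal improvement (internal)
## Description
<!-- Describe this change clearly. Why is it needed? What problem does it solve? -->
## Related Issue
<!-- Link any related issue. For example: Fixes #123 -->
## Changes
<!-- Describe in detail what was changed -->
- xxx
- xxx
- xxx
## Testing
<!-- Describe how you tested your changes -->
- [ ] Unit tests
- [ ] Integration tests
- [ ] Manual testing
## Screenshots (if applicable)
<!-- For UI changes, please attach screenshots -->
## Checklist
- [ ] My code follows the project's code style
- [ ] I have added the necessary tests
- [ ] I have updated the relevant documentation
- [ ] My changes do not introduce new warnings
- [ ] The PR title clearly describes the change
## Additional Notes
<!-- Any other relevant information -->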
================================================
FILE: .github/release-drafter.yml
================================================
name-template: 'v$RESOLVED_VERSION'
tag-template: 'v$RESOLVED_VERSION'
categories:
  - title: '🚀 新功能'
    labels:
      - 'feature'
      - 'enhancement'
  - title: '🐛 Bug 修复'
    labels:
      - 'fix'
      - 'bug'
  - title: '🧰 维护'
    labels:
      - 'chore'
      - 'maintenance'
  - title: '📚 文档'
    labels:
      - 'docs'
      - 'documentation'
change-template: '- $TITLE @$AUTHOR (#$NUMBER)'
version-resolver:
  major:
    labels:
      - 'major'
      - 'breaking'
  minor:
    labels:
      - 'minor'
      - 'feature'
  patch:
    labels:
      - 'patch'
      - 'fix'
      - 'bug'
      - 'maintenance'
  default: patch
template: |
  ## 更新内容
  $CHANGES
  ## 贡献者
  $CONTRIBUTORS
================================================
FILE: .github/workflows/auto-release-generator.yml
================================================
name: Auto Release Generator
on:
  push:
    branches:
      - main
    paths:
      - 'project_version'  # keep this path exact; do not use wildcards
jobs:
  check-version-and-release:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # required to create releases
      pull-requests: write  # additional permission that may be needed
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # fetch the full history so version changes can be checked
      - name: Debug Environment
        run: |
          echo "Working directory contents:"
          ls -la
          echo "Contents of project_version:"
          cat project_version || echo "File not found"
      - name: Check if version changed
        id: check-version
        run: |
          # Read the current version
          if [ -f "project_version" ]; then
            CURRENT_VERSION=$(cat project_version)
            echo "Current version: $CURRENT_VERSION"
            # Read the version from the previous commit
            git fetch origin main
            if git show HEAD~1:project_version &>/dev/null; then
              PREVIOUS_VERSION=$(git show HEAD~1:project_version)
              echo "Previous version from commit: $PREVIOUS_VERSION"
              if [[ "$CURRENT_VERSION" != "$PREVIOUS_VERSION" ]]; then
                echo "Version changed from $PREVIOUS_VERSION to $CURRENT_VERSION"
                echo "version_changed=true" >> $GITHUB_OUTPUT
                echo "current_version=$CURRENT_VERSION" >> $GITHUB_OUTPUT
              else
                echo "Version unchanged"
                echo "version_changed=false" >> $GITHUB_OUTPUT
              fi
            else
              echo "Cannot find previous version, assuming first release"
              echo "version_changed=true" >> $GITHUB_OUTPUT
              echo "current_version=$CURRENT_VERSION" >> $GITHUB_OUTPUT
            fi
          else
            echo "project_version file not found"
            echo "version_changed=false" >> $GITHUB_OUTPUT
          fi
      - name: Set up Python
        if: steps.check-version.outputs.version_changed == 'true'
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install OpenAI SDK
        if: steps.check-version.outputs.version_changed == 'true'
        run: pip install openai
      - name: Get commits since last release
        if: steps.check-version.outputs.version_changed == 'true'
        id: get-commits
        run: |
          # Take the 13 most recent commits directly
          echo "Getting last 13 commits"
          COMMITS=$(git log -13 --pretty=format:"%s")
          echo "Commits to be included in release notes:"
          echo "$COMMITS"
          echo "commits<<EOF" >> $GITHUB_OUTPUT
          echo "$COMMITS" >> $GITHUB_OUTPUT
          echo "EOF" >> $GITHUB_OUTPUT
      - name: Generate release notes with AI
        if: steps.check-version.outputs.version_changed == 'true'
        id: generate-notes
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          OPENAI_BASE_URL: https://api.siliconflow.cn/v1
          CURRENT_VERSION: ${{ steps.check-version.outputs.current_version }}
        run: |
          cat > generate_release_notes.py << 'EOF'
          import os
          import sys
          from openai import OpenAI
          # Set up the OpenAI client
          client = OpenAI(
              api_key=os.environ.get("OPENAI_API_KEY"),
              base_url=os.environ.get("OPENAI_BASE_URL")
          )
          # Read the commit messages (stdin) and the version number
          commits = sys.stdin.read()
          version = os.environ.get("CURRENT_VERSION")
          # Call the API to generate the release notes
          try:
              response = client.chat.completions.create(
                  model="deepseek-ai/DeepSeek-V3",
                  messages=[
                      {"role": "system", "content": "你是一个专业的软件发布说明生成助手。请根据提供的git提交信息,生成一个结构化的发布说明,包括新功能、改进、修复的bug等类别。使用中文回复。"},
                      {"role": "user", "content": f"请根据以下git提交信息,生成一个版本{version}的发布说明,内容详细且完整,相似的提交信息不要重复出现: \n\n{commits}"}
                  ],
                  temperature=0.7,
              )
              release_notes = response.choices[0].message.content
              print(f"commits: \n{commits}")
              print(f"Release notes generated by the LLM: \n{release_notes}")
          except Exception as e:
              print(f"Error calling OpenAI API: {e}")
              release_notes = f"# 版本 {version} 发布\n\n## 更新内容\n\n"
              # Fallback: list the commit messages as plain bullet points
              for line in commits.strip().split("\n"):
                  if line:
                      release_notes += f"- {line}\n"
          # Print the generated release notes
          print(release_notes)
          # Save them to the GitHub step output
          with open(os.environ.get("GITHUB_OUTPUT"), "a") as f:
              f.write("release_notes<<RELEASE_NOTES_EOF\n")
              f.write(release_notes)
              f.write("\nRELEASE_NOTES_EOF\n")
          EOF
          python generate_release_notes.py < <(echo "${{ steps.get-commits.outputs.commits }}")
      - name: Debug release notes
        if: steps.check-version.outputs.version_changed == 'true'
        run: |
          echo "Generated release notes:"
          echo "${{ steps.generate-notes.outputs.release_notes }}"
      - name: Create GitHub Release
        if: steps.check-version.outputs.version_changed == 'true'
        uses: softprops/action-gh-release@v1
        with:
          tag_name: v${{ steps.check-version.outputs.current_version }}
          name: v${{ steps.check-version.outputs.current_version }}
          body: ${{ steps.generate-notes.outputs.release_notes }}
          draft: false
          prerelease: false
          token: ${{ secrets.GIT_TOKEN }}
================================================
FILE: .github/workflows/codeReview.yml
================================================
name: Code Review
permissions:
  contents: read
  pull-requests: write
on:
  # Triggered when a pull request is opened or reopened
  pull_request:
    types: [opened, reopened]
  workflow_dispatch:
jobs:
  codeReview:
    runs-on: ubuntu-latest
    steps:
      - name: GPT code logic review
        uses: anc95/ChatGPT-CodeReview@main
        env:
          GITHUB_TOKEN: ${{ secrets.GIT_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          OPENAI_API_ENDPOINT: https://api.siliconflow.cn/v1
          MODEL: deepseek-ai/DeepSeek-V3
          LANGUAGE: Chinese
================================================
FILE: .github/workflows/discord-release-notification.yml
================================================
name: Discord Release Notification
on:
  release:
    types: [published]
jobs:
  notify-discord:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - name: Install dependencies
        run: pip install openai discord-webhook requests
      - name: Enhance release notes and send to Discord
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          OPENAI_BASE_URL: https://api.siliconflow.cn/v1
          DISCORD_WEBHOOK_URL: ${{ secrets.DISCORD_WEBHOOK_URL }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          cat > send_discord_notification.py << 'EOF'
          import os
          import sys
          import json
          from openai import OpenAI
          import requests
          from datetime import datetime
          from discord_webhook import DiscordWebhook, DiscordEmbed
          # Set up the OpenAI client
          client = OpenAI(
              api_key=os.environ.get("OPENAI_API_KEY"),
              base_url=os.environ.get("OPENAI_BASE_URL")
          )
          # Read the GitHub release settings
          github_token = os.environ.get("GITHUB_TOKEN")
          repo = os.environ.get("GITHUB_REPOSITORY")
          # Fetch the latest release directly from the GitHub API
          headers = {"Authorization": f"token {github_token}"}
          response = requests.get(f"https://api.github.com/repos/{repo}/releases/latest", headers=headers)
          if response.status_code != 200:
              print(f"Error fetching release info: {response.status_code}")
              print(response.text)
              sys.exit(1)
          release_info = response.json()
          # Extract the fields we need
          release_notes = release_info.get("body", "无发布说明")
          version = release_info.get("tag_name", "未知版本")
          # Parse the publish date defensively
          published_at = release_info.get("published_at")
          if published_at:
              try:
                  release_date = datetime.strptime(published_at, "%Y-%m-%dT%H:%M:%SZ").strftime("%Y年%m月%d日")
              except ValueError:
                  release_date = "未知日期"
          else:
              release_date = "未知日期"
          # Polish the release notes with the LLM
          try:
              response = client.chat.completions.create(
                  model="deepseek-ai/DeepSeek-V3",
                  messages=[
                      {"role": "system", "content": "你是一个专业的软件发布公告优化助手。请优化以下发布说明,使其更加生动、专业,并明确区分新功能、优化内容、修复内容和移除内容等类别。保持原有信息的完整性,同时增强可读性和专业性。使用中文回复。\n\n重要:Discord不支持复杂的Markdown格式,因此请使用简单的格式化:\n1. 使用**粗体**和*斜体*而不是Markdown标题\n2. 使用简单的列表符号(•)而不是Markdown列表\n3. 避免使用#、##等标题格式\n4. 不要使用表格、代码块等复杂格式\n5. 确保段落之间有空行\n6. 使用简单的分隔符(如 ------)来分隔不同部分"},
                      {"role": "user", "content": f"请优化以下版本{version}的发布说明,使其更适合在Discord社区发布。请记住Discord不支持复杂的Markdown格式,所以使用简单的格式化方式:\n\n{release_notes}"}
                  ],
                  temperature=0.7,
              )
              enhanced_notes = response.choices[0].message.content
              print(f"Release notes polished by the LLM: \n{enhanced_notes}")
          except Exception as e:
              print(f"Error calling OpenAI API: {e}")
              enhanced_notes = release_notes  # fall back to the original notes if the API call fails
          # Create the Discord message
          webhook_url = os.environ.get("DISCORD_WEBHOOK_URL")
          if not webhook_url:
              print("Error: DISCORD_WEBHOOK_URL not set")
              sys.exit(1)
          webhook = DiscordWebhook(url=webhook_url)
          # Create the embed
          embed = DiscordEmbed(
              title=f"🚀 NarratoAI {version} 发布公告",
              description=f"发布日期: {release_date}",
              color="5865F2"  # Discord blue
          )
          # Split the release notes so they stay within Discord's field limit
          # (a Discord embed field value is limited to 1024 characters)
          MAX_FIELD_LENGTH = 1024
          # Short content is added directly
          if enhanced_notes and len(enhanced_notes) <= MAX_FIELD_LENGTH:
              embed.add_embed_field(name="📋 更新内容", value=enhanced_notes)
          elif enhanced_notes:
              # Try to split the content by paragraphs or obvious separators
              sections = []
              # Check for recognizable sections such as new features, improvements, and fixes
              if "**新增功能**" in enhanced_notes or "**新功能**" in enhanced_notes:
                  # Split on whichever heading is actually present
                  feature_marker = "**新增功能**" if "**新增功能**" in enhanced_notes else "**新功能**"
                  parts = enhanced_notes.split(feature_marker, 1)
                  if len(parts) > 1:
                      intro = parts[0].strip()
                      if intro:
                          sections.append(("📋 更新概述", intro))
                      rest = feature_marker + parts[1]
                      # Find where the feature section ends
                      feature_end = -1
                      for marker in ["**优化内容**", "**性能优化**", "**修复内容**", "**bug修复**", "**问题修复**"]:
                          pos = rest.lower().find(marker.lower())
                          if pos != -1 and (feature_end == -1 or pos < feature_end):
                              feature_end = pos
                      if feature_end != -1:
                          sections.append(("✨ 新增功能", rest[:feature_end].strip()))
                          rest = rest[feature_end:]
                      else:
                          sections.append(("✨ 新增功能", rest.strip()))
                          rest = ""
                      # Split what remains into improvements and fixes
                      if rest:
                          optimize_end = -1
                          for marker in ["**修复内容**", "**bug修复**", "**问题修复**"]:
                              pos = rest.lower().find(marker.lower())
                              if pos != -1 and (optimize_end == -1 or pos < optimize_end):
                                  optimize_end = pos
                          if optimize_end != -1:
                              sections.append(("⚡ 优化内容", rest[:optimize_end].strip()))
                              sections.append(("🔧 修复内容", rest[optimize_end:].strip()))
                          else:
                              sections.append(("⚡ 优化内容", rest.strip()))
              else:
                  # No obvious structure: split by length
                  chunks = [enhanced_notes[i:i+MAX_FIELD_LENGTH] for i in range(0, len(enhanced_notes), MAX_FIELD_LENGTH)]
                  for i, chunk in enumerate(chunks):
                      if i == 0:
                          sections.append(("📋 更新内容", chunk))
                      else:
                          sections.append((f"📋 更新内容(续{i})", chunk))
              # Add every section to the embed
              for name, content in sections:
                  if not content:
                      continue  # skip sections that ended up empty after splitting
                  if len(content) > MAX_FIELD_LENGTH:
                      # A single section is still too long: split it further
                      sub_chunks = [content[i:i+MAX_FIELD_LENGTH] for i in range(0, len(content), MAX_FIELD_LENGTH)]
                      for i, chunk in enumerate(sub_chunks):
                          if i == 0:
                              embed.add_embed_field(name=name, value=chunk)
                          else:
                              embed.add_embed_field(name=f"{name}(续{i})", value=chunk)
                  else:
                      embed.add_embed_field(name=name, value=content)
          else:
              embed.add_embed_field(name="📋 更新内容", value="无详细更新内容")
          # Add the download link
          html_url = release_info.get("html_url", "")
          if html_url:
              embed.add_embed_field(name="📥 下载链接", value=html_url, inline=False)
          # Set the footer
          embed.set_footer(text=f"NarratoAI 团队 • {release_date}")
          embed.set_timestamp()
          # Attach the embed to the webhook
          webhook.add_embed(embed)
          # Send the message
          response = webhook.execute()
          if response:
              print(f"Discord notification sent with status code: {response.status_code}")
          else:
              print("Failed to send Discord notification")
          EOF
          # Run the script
          python send_discord_notification.py
================================================
FILE: .gitignore
================================================
.DS_Store
/config.toml
/storage/
/.idea/
/app/services/__pycache__
/app/__pycache__/
/app/config/__pycache__/
/app/models/__pycache__/
/app/utils/__pycache__/
/*/__pycache__/*
.vscode
/**/.streamlit
__pycache__
logs/
node_modules
# VuePress default temporary files directory
/sites/docs/.vuepress/.temp
# VuePress default cache directory
/sites/docs/.vuepress/.cache
# VuePress default directory for generated static build files
/sites/docs/.vuepress/dist
# Model directory
/models/
./models/*
resource/scripts/*.json
resource/videos/*.mp4
resource/songs/*.mp3
resource/songs/*.flac
resource/fonts/*.ttc
resource/fonts/*.ttf
resource/fonts/*.otf
resource/srt/*.srt
app/models/faster-whisper-large-v2/*
app/models/faster-whisper-large-v3/*
app/models/bert/*
bug清单.md
task.md
.claude/*
.serena/*
# OpenSpec: ignore active change proposals, but keep archives and specs
openspec/*
AGENTS.md
CLAUDE.md
tests/*
================================================
FILE: Dockerfile
================================================
# Multi-stage build: build stage
FROM python:3.12-slim-bookworm AS builder
# Build arguments
ARG DEBIAN_FRONTEND=noninteractive
# Working directory
WORKDIR /build
# Install build dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    git \
    git-lfs \
    pkg-config \
    && rm -rf /var/lib/apt/lists/*
# Upgrade pip and create the virtual environment
RUN python -m pip install --upgrade pip setuptools wheel && \
    python -m venv /opt/venv
# Activate the virtual environment
ENV PATH="/opt/venv/bin:$PATH"
# Copy requirements.txt and install the Python dependencies from a PyPI mirror
COPY requirements.txt .
RUN pip install --no-cache-dir -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
# Runtime stage
FROM python:3.12-slim-bookworm
# Runtime arguments
ARG DEBIAN_FRONTEND=noninteractive
# Working directory
WORKDIR /NarratoAI
# Copy the virtual environment from the build stage
COPY --from=builder /opt/venv /opt/venv
# Environment variables
ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONPATH="/NarratoAI" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONIOENCODING=utf-8 \
    LANG=C.UTF-8 \
    LC_ALL=C.UTF-8
# Install all dependencies, create the user, and configure the system in one step to reduce layers
RUN apt-get update && apt-get install -y --no-install-recommends \
    imagemagick \
    ffmpeg \
    wget \
    curl \
    git-lfs \
    ca-certificates \
    dos2unix \
    && sed -i 's/<policy domain="path" rights="none" pattern="@\*"/<policy domain="path" rights="read|write" pattern="@\*"/' /etc/ImageMagick-6/policy.xml || true \
    && git lfs install \
    && groupadd -r narratoai && useradd -r -g narratoai -d /NarratoAI -s /bin/bash narratoai \
    && rm -rf /var/lib/apt/lists/*
# Copy the entrypoint script and fix its line endings
COPY --chown=narratoai:narratoai docker-entrypoint.sh /usr/local/bin/
RUN dos2unix /usr/local/bin/docker-entrypoint.sh && chmod +x /usr/local/bin/docker-entrypoint.sh
# Copy the rest of the application code
COPY --chown=narratoai:narratoai . .
# Create directories, copy the config, and set permissions
RUN mkdir -p storage/temp storage/tasks storage/json storage/narration_scripts storage/drama_analysis && \
    if [ ! -f config.toml ]; then cp config.example.toml config.toml; fi && \
    chown -R narratoai:narratoai /NarratoAI && \
    chmod -R 755 /NarratoAI
# Switch to the non-root user
USER narratoai
# Expose the port
EXPOSE 8501
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8501/_stcore/health || exit 1
# Entrypoint
ENTRYPOINT ["/usr/local/bin/docker-entrypoint.sh"]
CMD ["webui"]
================================================
FILE: LICENSE
================================================
Modified MIT License - Non-Commercial Use Only
Copyright (c) 2024 linyq
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to use,
copy, modify, merge, publish, and distribute the Software, subject to the
following conditions:
1. The Software is provided **for personal, educational, or research purposes only**.
2. Commercial use of the Software, including but not limited to incorporating
it into paid products, services, or platforms, is strictly prohibited
without prior written permission from the copyright holder.
3. The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
-------------------------------------------------------
中文说明(仅供参考,以英文条款为准):
修改后的 MIT 协议 - 仅限非商业用途
版权所有 (c) 2024 linyq
任何人均可免费获取本软件及相关文档的副本,并享有以下权利:
- 使用、复制、修改、合并、发布、分发本软件;
但必须遵守以下条件:
1. 本软件仅限用于 **个人、学习或科研用途**。
2. 严禁任何商业用途,包括但不限于将本软件用于收费产品、商业服务、
SaaS 平台或任何形式的营利行为。若需商业授权,必须事先获得版权所有者书面许可。
3. 上述版权声明及许可条款必须包含在本软件的所有副本或重要部分中。
免责声明:
本软件按“现状”提供,不附带任何明示或暗示的担保,包括但不限于
适销性、特定用途适用性和非侵权担保。作者或版权持有人在任何情况下
不对因使用本软件或与本软件有关的行为造成的任何损害承担责任。
================================================
FILE: Makefile
================================================
# NarratoAI Docker Makefile
.PHONY: help build up down restart logs shell ps clean config deploy
# Default target
.DEFAULT_GOAL := help
# Variables
SERVICE_NAME := narratoai-webui
# Colors
GREEN := \033[32m
YELLOW := \033[33m
BLUE := \033[34m
RESET := \033[0m
help: ## Show this help message
	@echo "$(GREEN)NarratoAI Docker management commands$(RESET)"
	@echo ""
	@echo "$(YELLOW)Available commands:$(RESET)"
	@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf " $(BLUE)%-15s$(RESET) %s\n", $$1, $$2}' $(MAKEFILE_LIST)
deploy: ## One-click deployment
	@echo "$(GREEN)Running one-click deployment...$(RESET)"
	./docker-deploy.sh
build: ## Build the Docker image
	@echo "$(GREEN)Building the Docker image...$(RESET)"
	docker-compose build
up: ## Start the service
	@echo "$(GREEN)Starting the service...$(RESET)"
	docker-compose up -d
	@echo "$(GREEN)URL: http://localhost:8501$(RESET)"
down: ## Stop the service
	@echo "$(YELLOW)Stopping the service...$(RESET)"
	docker-compose down
restart: ## Restart the service
	@echo "$(YELLOW)Restarting the service...$(RESET)"
	docker-compose restart
logs: ## Follow the logs
	docker-compose logs -f
shell: ## Open a shell in the container
	docker-compose exec $(SERVICE_NAME) bash
ps: ## Show service status
	docker-compose ps
clean: ## Clean up unused resources
	@echo "$(YELLOW)Cleaning up unused resources...$(RESET)"
	docker system prune -f
config: ## Check the config file
	@if [ -f "config.toml" ]; then \
		echo "$(GREEN)config.toml exists$(RESET)"; \
	else \
		echo "$(YELLOW)Copying the example config...$(RESET)"; \
		cp config.example.toml config.toml; \
	fi
================================================
FILE: README-en.md
================================================
<div align="center">
<h1 align="center" style="font-size: 2cm;"> NarratoAI 😎📽️ </h1>
<h3 align="center">An all-in-one AI-powered tool for film commentary and automated video editing.🎬🎞️ </h3>
<h3>📖 English | <a href="README.md">简体中文</a> | <a href="README-ja.md">日本語</a> </h3>
<div align="center">
[//]: # ( <a href="https://trendshift.io/repositories/8731" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8731" alt="harry0703%2FNarratoAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>)
</div>
<br>
NarratoAI is an automated video narration tool that provides an all-in-one solution for script writing, automated video editing, voice-over, and subtitle generation, powered by LLMs for efficient content creation.
<br>
> **🔥 Highly recommended: [Speclip](https://speclip.com), a new paradigm for VibeCut and a true editing agent! [👉 Click to download for free](https://speclip.com)**
[](https://github.com/linyqh/NarratoAI)
[](https://github.com/linyqh/NarratoAI/blob/main/LICENSE)
[](https://github.com/linyqh/NarratoAI/issues)
[](https://github.com/linyqh/NarratoAI/stargazers)
<a href="https://discord.com/invite/V2pbAqqQNb" target="_blank">💬 Join the open source community to get project updates and the latest news.</a>
<h2><a href="https://p9mf6rjv3c.feishu.cn/wiki/SP8swLLZki5WRWkhuFvc2CyInDg?from=from_copylink" target="_blank">🎉🎉🎉 Official Documentation 🎉🎉🎉</a> </h2>
<h3>Home</h3>

<h3>Video Review Interface</h3>

</div>
## Latest News
- 2025.05.11 Released version 0.6.0, which supports **short drama commentary** and streamlines the editing process
- 2025.03.06 Released version 0.5.2, which supports the DeepSeek R1 and DeepSeek V3 models for short drama mixing
- 2024.12.16 Released version 0.3.9, which supports the Alibaba Qwen2-VL model for video understanding and adds short drama mixing
- 2024.11.24 Opened the Discord community: https://discord.com/invite/V2pbAqqQNb
- 2024.11.11 Migrated the open source community; welcome to join! [Join the official community](https://github.com/linyqh/NarratoAI/wiki)
- 2024.11.10 Published the official documentation; see the [Official Documentation](https://p9mf6rjv3c.feishu.cn/wiki/SP8swLLZki5WRWkhuFvc2CyInDg) for details
- 2024.11.10 Released version v0.3.5 with an optimized video editing process
## Major Benefits 🎉
DeepSeek models are now fully supported! Register to receive 20 million free tokens (worth 14 yuan of platform quota); editing a 10-minute video costs only 0.1 yuan!
🔥 How to claim:
1️⃣ Click the link to register: https://cloud.siliconflow.cn/i/pyOKqFCV
2️⃣ Log in with your phone number and **be sure to enter the invitation code: pyOKqFCV**
3️⃣ Receive the 14 yuan quota and try cost-effective AI editing right away!
💡 Low cost, high creativity:
A SiliconFlow API key can be integrated with one click, doubling intelligent editing efficiency!
(Note: the invitation code is the only proof for claiming the benefit; the quota is credited automatically after registration)
Act now and unlock your AI productivity with "pyOKqFCV"!
😊 Update Steps:
Integration package: run the one-click update script update.bat
Source build: use git pull to fetch the latest code
## Announcement 📢
_**Note⚠️: Recently, someone has been impersonating the author on X (Twitter) to issue tokens on the pump.fun platform! This is a scam!!! Do not be deceived! NarratoAI currently does no official promotion on X (Twitter); please be cautious.**_
Below is a screenshot of this person's X (Twitter) homepage:
<img src="https://github.com/user-attachments/assets/c492ab99-52cd-4ba2-8695-1bd2073ecf12" alt="Screenshot_20250109_114131_Samsung Internet" style="width:30%; height:auto;">
## Future Plans 🥳
- [x] Windows Integration Pack Release
- [x] Optimized the story generation process and improved the generation effect
- [x] Released version 0.3.5 integration package
- [x] Support Alibaba Qwen2-VL large model for video understanding
- [x] Support short drama mixing
- [x] One-click merge materials
- [x] One-click transcription
- [x] One-click clear cache
- [ ] Support exporting to Jianying drafts
- [X] Support short drama commentary
- [ ] Character face matching
- [ ] Support automatic matching based on voiceover, script, and video materials
- [ ] Support more TTS engines
- [ ] ...
## System Requirements 📦
- Recommended minimum: a CPU with 4 or more cores and 8 GB or more of RAM; a GPU is not required
- Windows 10/11 or macOS 11.0 or later
- [Python 3.12+](https://www.python.org/downloads/)
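The requirements above can be checked with a short script before installing. This is only a sketch under assumptions: the helper name `check_environment` is hypothetical and not part of NarratoAI, and it covers just the Python version and CPU core count, because the standard library has no portable way to read installed memory.

```python
import os
import sys

# Thresholds taken from the requirements listed above.
MIN_PYTHON = (3, 12)
MIN_CPU_CORES = 4


def check_environment(python_version=None, cpu_cores=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only; not part of NarratoAI.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 0  # os.cpu_count() may return None

    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            f"Python {python_version[0]}.{python_version[1]} found, "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ recommended"
        )
    if cpu_cores < MIN_CPU_CORES:
        problems.append(
            f"{cpu_cores} CPU cores found, {MIN_CPU_CORES}+ recommended"
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run with no arguments, the function inspects the current interpreter and machine; passing explicit values makes it easy to test.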
## Feedback & Suggestions 📢
👏 1. Submit an [issue](https://github.com/linyqh/NarratoAI/issues) or a [pull request](https://github.com/linyqh/NarratoAI/pulls)
💬 2. [Join the open source community discussion group](https://github.com/linyqh/NarratoAI/wiki)
📷 3. Follow the WeChat official account [NarratoAI助手] for the latest news
## Reference Projects 📚
- https://github.com/FujiwaraChoki/MoneyPrinter
- https://github.com/harry0703/MoneyPrinterTurbo
This project was refactored based on the above projects with the addition of video narration features. Thanks to the original authors for their open-source spirit 🥳🥳🥳
## Buy the Author a Cup of Coffee ☕️
<div style="display: flex; justify-content: space-between;">
<img src="https://github.com/user-attachments/assets/5038ccfb-addf-4db1-9966-99415989fd0c" alt="Image 1" style="width: 350px; height: 350px; margin: auto;"/>
<img src="https://github.com/user-attachments/assets/07d4fd58-02f0-425c-8b59-2ab94b4f09f8" alt="Image 2" style="width: 350px; height: 350px; margin: auto;"/>
</div>
## License 📝
See the [`LICENSE`](LICENSE) file
## Star History
[](https://star-history.com/#linyqh/NarratoAI&Date)
================================================
FILE: README.md
================================================
<div align="center">
<h1 align="center" style="font-size: 2cm;"> NarratoAI 😎📽️ </h1>
<h3 align="center">An all-in-one AI film commentary and automated video editing tool🎬🎞️ </h3>
<h3>📖 <a href="README-en.md">English</a> | 简体中文 </h3>
<div align="center">
[//]: # ( <a href="https://trendshift.io/repositories/8731" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8731" alt="harry0703%2FNarratoAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>)
</div>
<br>
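NarratoAI is an automated film commentary tool that uses LLMs to provide a one-stop workflow for script writing, automated video editing, voice-over, and subtitle generation, enabling efficient content creation.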
<br>
> **🔥 Highly recommended: [Speclip](https://speclip.com), a new paradigm for VibeCut and a true editing agent**
[](https://github.com/linyqh/NarratoAI)
[](https://github.com/linyqh/NarratoAI/blob/main/LICENSE)
[](https://github.com/linyqh/NarratoAI/issues)
[](https://github.com/linyqh/NarratoAI/stargazers)
<a href="https://discord.com/invite/V2pbAqqQNb" target="_blank">💬 Join the Discord open source community for project updates and the latest news.</a>
<h2><a href="https://p9mf6rjv3c.feishu.cn/wiki/SP8swLLZki5WRWkhuFvc2CyInDg?from=from_copylink" target="_blank">🎉🎉🎉 Official Documentation 🎉🎉🎉</a> </h2>
<h3>Home</h3>

</div>
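## License
This project is for learning and research use only and may not be used commercially. For a commercial license, please contact the author.
## Latest News
- 2025.11.20 Released version 0.7.5, adding [IndexTTS2](https://github.com/index-tts/index-tts) voice cloning support
- 2025.10.15 Released version 0.7.3, using [LiteLLM](https://github.com/BerriAI/litellm) to manage model providers
- 2025.09.10 Released version 0.7.2, adding Tencent Cloud TTS
- 2025.08.18 Released version 0.7.1, supporting **voice cloning** and the latest large models
- 2025.05.11 Released version 0.6.0, supporting **short drama commentary** and a streamlined editing process
- 2025.03.06 Released version 0.5.2, supporting the DeepSeek R1 and DeepSeek V3 models for short drama mixing
- 2024.12.16 Released version 0.3.9, supporting the Alibaba Qwen2-VL model for video understanding and adding short drama mixing
- 2024.11.24 Opened the Discord community: https://discord.com/invite/V2pbAqqQNb
- 2024.11.11 Migrated the open source community; welcome to join! [Join the official community](https://github.com/linyqh/NarratoAI/wiki)
- 2024.11.10 Published the official documentation; see the [Official Documentation](https://p9mf6rjv3c.feishu.cn/wiki/SP8swLLZki5WRWkhuFvc2CyInDg) for details
- 2024.11.10 Released version v0.3.5 with an optimized video editing process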
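## Major Benefits 🎉
> 1️⃣
> **Developer-exclusive benefit: an all-in-one AI platform with trial credit on sign-up!**
>
> Tired of integrating many different AI models? We recommend 302.AI, an enterprise-grade AI resource hub. With a single integration you can call hundreds of AI models covering language, image, audio, and video, with pay-as-you-go pricing that greatly reduces development costs.
>
> Register through my referral link below to **receive 1 USD of free trial credit** and start your AI development journey with ease.
>
> **Register now:** [https://share.302.ai/I9P6mP](https://share.302.ai/I9P6mP)
---
> 2️⃣
> SiliconFlow is now fully supported! Register to receive 20 million free tokens (worth 16 yuan of platform quota); editing a 10-minute video costs only 0.1 yuan!
>
> 🔥 How to claim:
> 1️⃣ Click the link to register: https://cloud.siliconflow.cn/i/MI9PgHwB
> 2️⃣ Log in with your phone number and **be sure to enter the invitation code: MI9PgHwB**
> 3️⃣ Receive the 16 yuan quota and try cost-effective AI editing right away
>
> 💡 Low cost, high creativity:
> A SiliconFlow API key can be integrated with one click, doubling intelligent editing efficiency!
> (Note: the invitation code is the only proof for claiming the benefit; the quota is credited automatically after registration)
>
> Act now and unlock your AI productivity with "MI9PgHwB"!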
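## ⚠️ Beware of Scams 📢
_**1. NarratoAI is completely free software. Recently, people on social media (Douyin, Bilibili, etc.) have been found renaming NarratoAI and selling it. Some screenshots are shown below; please stay alert and do not be deceived.**_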
---
<div style="display: flex; flex-wrap: wrap; justify-content: space-around; align-items: flex-start; gap: 10px;">
<img src="https://github.com/user-attachments/assets/9cc0e5e4-bd5b-4655-b5ef-7d9085cdbc50" alt="诈骗截图 1" style="width: 23%; max-width: 250px; height: auto; border: 1px solid #ddd; border-radius: 5px; box-shadow: 2px 2px 8px rgba(0,0,0,0.1);">
<img src="https://github.com/user-attachments/assets/464b877c-b061-4856-8260-a0ef6fad7e52" alt="诈骗截图 2" style="width: 23%; max-width: 250px; height: auto; border: 1px solid #ddd; border-radius: 5px; box-shadow: 2px 2px 8px rgba(0,0,0,0.1);">
<img src="https://github.com/user-attachments/assets/9d7a6ea9-4bca-42b5-a61e-7e464037930f" alt="诈骗截图 3" style="width: 23%; max-width: 250px; height: auto; border: 1px solid #ddd; border-radius: 5px; box-shadow: 2px 2px 8px rgba(0,0,0,0.1);">
<img src="https://github.com/user-attachments/assets/09eeb94d-c670-4d7d-ba19-c0468bed3291" alt="诈骗截图 4" style="width: 23%; max-width: 250px; height: auto; border: 1px solid #ddd; border-radius: 5px; box-shadow: 2px 2px 8px rgba(0,0,0,0.1);">
</div>
---
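## Future Plans 🥳
- [x] Windows integration package release
- [x] Optimized the story generation process and improved generation quality
- [x] Released the 0.3.5 integration package
- [x] Support the Alibaba Qwen2-VL large model for video understanding
- [x] Support short drama mixing
- [x] One-click material merging
- [x] One-click transcription
- [x] One-click cache clearing
- [ ] Support exporting to Jianying drafts
- [X] Support short drama commentary
- [ ] Main character face matching
- [ ] Support automatic matching based on voiceover, script, and video materials
- [ ] Support more TTS engines
- [ ] ...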
## Quick Start 🚀
### Option 1: Docker deployment on macOS (recommended for macOS)
```bash
# 1. Clone the project
git clone https://github.com/linyqh/NarratoAI.git
cd NarratoAI
# 2. Deploy with one command
docker compose up -d
# 3. Access the app
# Open http://localhost:8501 in your browser
```
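After `docker compose up -d`, the container may need a moment before the web UI answers. The following is a minimal sketch, not part of the project, that polls the address given above until it responds. The helper name `wait_for_ui` and its `timeout` and `interval` parameters are hypothetical and chosen for illustration only.

```python
import time
import urllib.request


def wait_for_ui(url: str = "http://localhost:8501",
                timeout: float = 60.0,
                interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds elapse.

    Hypothetical helper: the URL default comes from the README, everything
    else is an assumption made for this example.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # urllib.error.URLError and socket timeouts are OSError subclasses:
            # the service is probably still starting, so retry.
            pass
        time.sleep(interval)
    return False
```

Calling `wait_for_ui()` with no arguments waits up to one minute for `http://localhost:8501`; a `False` result suggests checking the container logs.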
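### Option 2: All-in-one package (recommended for Windows)
> *Follow the WeChat official account **NarratoAI 助手** and use the menu at the bottom right to get the download link*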
### Option 3: Run locally
```bash
# 1. Clone the project
git clone https://github.com/linyqh/NarratoAI.git
cd NarratoAI
# 2. Install dependencies
pip install -r requirements.txt
# 3. Copy the configuration file
cp config.example.toml config.toml
# 4. Edit config.toml and set your API keys
# 5. Start the app
streamlit run webui.py --server.maxUploadSize=2048
# 6. Access the app
# Open http://localhost:8501 in your browser
```
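The configuration-file copy in the local setup can also be done from Python. The sketch below mirrors what `load_config` in `app/config/config.py` appears to do: remove a directory that shadows `config.toml`, then copy the example file when no config exists. The function name `ensure_config` and its `root` parameter are hypothetical; this is an illustration under those assumptions, not the project's own API.

```python
import shutil
from pathlib import Path


def ensure_config(root: Path) -> Path:
    """Create config.toml from config.example.toml when it is missing.

    Hypothetical helper modelled on the behaviour of load_config().
    An existing config.toml is never overwritten.
    """
    config_file = root / "config.toml"
    example_file = root / "config.example.toml"
    if config_file.is_dir():
        # The project's loader also removes a directory with this name
        # (it would otherwise raise IsADirectoryError on read).
        shutil.rmtree(config_file)
    if not config_file.is_file() and example_file.is_file():
        shutil.copyfile(example_file, config_file)
    return config_file
```

Because an existing file is left alone, the helper is safe to call on every start; API keys still have to be filled in by hand afterwards.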
## System Requirements 📦
- Recommended minimum: a CPU with 4 or more cores and 8 GB or more of RAM; a GPU is not required
- Windows 10/11, or macOS 11.0 or later
- [Python 3.12+](https://www.python.org/downloads/)
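Part of this list can be checked before installation. The sketch below tests the Python version and CPU core count with the standard library only. Memory is not checked, because the standard library has no portable call for it. The function name `check_requirements` and its parameters are hypothetical, and the thresholds simply restate the recommendations above.

```python
import os
import sys
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 12)   # from the requirement "Python 3.12+"
MIN_CPU_CORES = 4      # from the recommended minimum of 4 cores


def check_requirements(python_version: Optional[Tuple[int, int]] = None,
                       cpu_cores: Optional[int] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    Both arguments default to the values of the running interpreter and
    machine; they are parameters only so the check can be exercised in tests.
    """
    version = tuple(python_version) if python_version else tuple(sys.version_info[:2])
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 0)
    problems = []
    if version < MIN_PYTHON:
        problems.append(f"Python {version[0]}.{version[1]} found, 3.12+ required")
    if cores < MIN_CPU_CORES:
        problems.append(f"{cores} CPU core(s) found, {MIN_CPU_CORES}+ recommended")
    return problems
```

A caller could print each returned message and exit early; treating the CPU count as a warning instead of a hard failure is a reasonable choice, since the README calls it a recommendation.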
## Feedback 📢
👏 1. Submit an [issue](https://github.com/linyqh/NarratoAI/issues) or a [pull request](https://github.com/linyqh/NarratoAI/pulls)
💬 2. [Join the open-source community chat group](https://github.com/linyqh/NarratoAI/wiki)
📷 3. Follow the WeChat official account "NarratoAI助手" for the latest news
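## Reference Projects 📚
- https://github.com/FujiwaraChoki/MoneyPrinter
- https://github.com/harry0703/MoneyPrinterTurbo
This project was refactored from the projects above and adds film and TV narration features. Thanks to their authors for the open-source spirit 🥳🥳🥳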
## Buy the Author a Coffee ☕️
<div style="display: flex; justify-content: space-between;">
<img src="https://github.com/user-attachments/assets/5038ccfb-addf-4db1-9966-99415989fd0c" alt="Image 1" style="width: 350px; height: 350px; margin: auto;"/>
<img src="https://github.com/user-attachments/assets/07d4fd58-02f0-425c-8b59-2ab94b4f09f8" alt="Image 2" style="width: 350px; height: 350px; margin: auto;"/>
</div>
## Sponsors
[](https://dartnode.com "Powered by DartNode - Free VPS for Open Source")
## License 📝
See the [`LICENSE`](LICENSE) file
## Star History
[](https://star-history.com/#linyqh/NarratoAI&Date)
================================================
FILE: app/__init__.py
================================================
================================================
FILE: app/config/__init__.py
================================================
import os
import sys
from loguru import logger
from app.config import config
from app.utils import utils
def __init_logger():
# _log_file = utils.storage_dir("logs/server.log")
_lvl = config.log_level
root_dir = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
)
def format_record(record):
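        # Get the full file path from the log record
        file_path = record["file"].path
        # Convert the absolute path to a path relative to the project root
        relative_path = os.path.relpath(file_path, root_dir)
        # Update the file path stored in the record
        record["file"].path = f"./{relative_path}"
        # Return the format string for this record
        # Adjust the format here as needed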
_format = (
"<green>{time:%Y-%m-%d %H:%M:%S}</> | "
+ "<level>{level}</> | "
+ '"{file.path}:{line}":<blue> {function}</> '
+ "- <level>{message}</>"
+ "\n"
)
return _format
    def log_filter(record):
        """Filter out unnecessary log messages."""
        # Drop noisy DEBUG-level logs such as template registration
ignore_patterns = [
"已注册模板过滤器",
"已注册提示词",
"注册视觉模型提供商",
"注册文本模型提供商",
"LLM服务提供商注册",
"FFmpeg支持的硬件加速器",
"硬件加速测试优先级",
"硬件加速方法",
]
        # Hide DEBUG-level records that contain any of the ignore patterns
if record["level"].name == "DEBUG":
return not any(pattern in record["message"] for pattern in ignore_patterns)
return True
logger.remove()
logger.add(
sys.stdout,
level=_lvl,
format=format_record,
colorize=True,
filter=log_filter
)
# logger.add(
# _log_file,
# level=_lvl,
# format=format_record,
# rotation="00:00",
# retention="3 days",
# backtrace=True,
# diagnose=True,
# enqueue=True,
# )
__init_logger()
================================================
FILE: app/config/audio_config.py
================================================
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project: NarratoAI
@File : audio_config
@Author : Viccy同学
@Date : 2025/1/7
@Description: Audio configuration management
'''
from typing import Dict, Any
from loguru import logger
class AudioConfig:
    """Audio configuration manager."""
    # Default volume settings
    DEFAULT_VOLUMES = {
        'tts_volume': 0.8,  # Slightly lower the TTS volume
        'original_volume': 1.3,  # Raise the original audio volume
        'bgm_volume': 0.3,  # Keep the background music low
    }
    # Audio quality settings
    AUDIO_QUALITY = {
        'sample_rate': 44100,  # Sample rate
        'channels': 2,  # Channel count (stereo)
        'bitrate': '128k',  # Bitrate
    }
    # Audio processing settings
    PROCESSING_CONFIG = {
        'enable_smart_volume': True,  # Enable smart volume adjustment
        'enable_audio_normalization': True,  # Enable audio normalization
        'target_lufs': -20.0,  # Target loudness (LUFS)
        'max_peak': -1.0,  # Maximum peak (dBFS)
        'volume_analysis_method': 'lufs',  # Volume analysis method: 'lufs' or 'rms'
    }
    # Audio mixing settings
    MIXING_CONFIG = {
        'crossfade_duration': 0.1,  # Crossfade duration (seconds)
        'bgm_fade_out': 3.0,  # BGM fade-out duration (seconds)
        'dynamic_range_compression': False,  # Dynamic range compression
    }
@classmethod
def get_optimized_volumes(cls, video_type: str = 'default') -> Dict[str, float]:
"""
        Get the optimized volume configuration for a video type.
        Args:
            video_type: Video type ('default', 'educational', 'entertainment', 'news')
        Returns:
            Dict[str, float]: Volume configuration dictionary
"""
base_volumes = cls.DEFAULT_VOLUMES.copy()
        # Adjust volumes according to the video type
if video_type == 'educational':
            # Educational videos: emphasize narration, lower the original audio
base_volumes.update({
'tts_volume': 0.9,
'original_volume': 0.8,
'bgm_volume': 0.2,
})
elif video_type == 'entertainment':
            # Entertainment videos: balance narration and original audio
base_volumes.update({
'tts_volume': 0.8,
'original_volume': 1.2,
'bgm_volume': 0.4,
})
elif video_type == 'news':
            # News videos: emphasize narration, minimize background audio
base_volumes.update({
'tts_volume': 1.0,
'original_volume': 0.6,
'bgm_volume': 0.1,
})
logger.info(f"使用 {video_type} 类型的音量配置: {base_volumes}")
return base_volumes
@classmethod
    def get_audio_processing_config(cls) -> Dict[str, Any]:
        """Return a copy of the audio processing configuration."""
return cls.PROCESSING_CONFIG.copy()
@classmethod
    def get_mixing_config(cls) -> Dict[str, Any]:
        """Return a copy of the audio mixing configuration."""
return cls.MIXING_CONFIG.copy()
@classmethod
def validate_volume(cls, volume: float, name: str) -> float:
"""
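        Validate a volume value and clamp it to the allowed range.
        Args:
            volume: Volume value
            name: Volume name (used in log messages)
        Returns:
            float: The validated volume value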
"""
min_volume = 0.0
        max_volume = 2.0  # Allow the original audio to exceed 1.0
if volume < min_volume:
logger.warning(f"{name}音量 {volume} 低于最小值 {min_volume},已调整")
return min_volume
elif volume > max_volume:
logger.warning(f"{name}音量 {volume} 超过最大值 {max_volume},已调整")
return max_volume
return volume
@classmethod
def apply_volume_profile(cls, profile_name: str) -> Dict[str, float]:
"""
        Apply a preset volume profile.
        Args:
            profile_name: Profile name
        Returns:
            Dict[str, float]: Volume configuration
"""
profiles = {
'balanced': {
'tts_volume': 0.8,
'original_volume': 1.2,
'bgm_volume': 0.3,
},
'voice_focused': {
'tts_volume': 1.0,
'original_volume': 0.7,
'bgm_volume': 0.2,
},
'original_focused': {
'tts_volume': 0.7,
'original_volume': 1.5,
'bgm_volume': 0.2,
},
'quiet_background': {
'tts_volume': 0.8,
'original_volume': 1.3,
'bgm_volume': 0.1,
}
}
if profile_name in profiles:
logger.info(f"应用音量配置文件: {profile_name}")
return profiles[profile_name]
else:
logger.warning(f"未找到配置文件 {profile_name},使用默认配置")
return cls.DEFAULT_VOLUMES.copy()
# Global audio configuration instance
audio_config = AudioConfig()
def get_recommended_volumes_for_content(content_type: str = 'mixed') -> Dict[str, float]:
"""
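    Recommend volume settings for a content type.
    Args:
        content_type: Content type
            - 'mixed': Mixed content (default)
            - 'voice_only': Narration only
            - 'original_heavy': Mostly original audio
            - 'music_video': Music video
    Returns:
        Dict[str, float]: Recommended volume configuration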
"""
recommendations = {
'mixed': {
'tts_volume': 0.8,
'original_volume': 1.3,
'bgm_volume': 0.3,
},
'voice_only': {
'tts_volume': 1.0,
'original_volume': 0.5,
'bgm_volume': 0.2,
},
'original_heavy': {
'tts_volume': 0.6,
'original_volume': 1.6,
'bgm_volume': 0.1,
},
'music_video': {
'tts_volume': 0.7,
'original_volume': 1.8,
            'bgm_volume': 0.0,  # Do not add extra BGM
}
}
return recommendations.get(content_type, recommendations['mixed'])
if __name__ == "__main__":
    # Test the configuration
    config = AudioConfig()
    # Test the volume settings for each video type
for video_type in ['default', 'educational', 'entertainment', 'news']:
volumes = config.get_optimized_volumes(video_type)
print(f"{video_type}: {volumes}")
    # Test the volume profiles
for profile in ['balanced', 'voice_focused', 'original_focused']:
volumes = config.apply_volume_profile(profile)
print(f"{profile}: {volumes}")
================================================
FILE: app/config/config.py
================================================
import os
import socket
import toml
import shutil
from loguru import logger
root_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.realpath(__file__))))
config_file = f"{root_dir}/config.toml"
version_file = f"{root_dir}/project_version"
def get_version_from_file():
    """Read the version number from the project_version file."""
    try:
        if os.path.isfile(version_file):
            with open(version_file, "r", encoding="utf-8") as f:
                return f.read().strip()
        return "0.1.0"  # Default version
    except Exception as e:
        logger.error(f"读取版本号文件失败: {str(e)}")
        return "0.1.0"  # Default version
def load_config():
# fix: IsADirectoryError: [Errno 21] Is a directory: '/NarratoAI/config.toml'
if os.path.isdir(config_file):
shutil.rmtree(config_file)
if not os.path.isfile(config_file):
example_file = f"{root_dir}/config.example.toml"
if os.path.isfile(example_file):
shutil.copyfile(example_file, config_file)
logger.info(f"copy config.example.toml to config.toml")
logger.info(f"load config from file: {config_file}")
try:
_config_ = toml.load(config_file)
except Exception as e:
logger.warning(f"load config failed: {str(e)}, try to load as utf-8-sig")
with open(config_file, mode="r", encoding="utf-8-sig") as fp:
_cfg_content = fp.read()
_config_ = toml.loads(_cfg_content)
return _config_
def save_config():
with open(config_file, "w", encoding="utf-8") as f:
_cfg["app"] = app
_cfg["proxy"] = proxy
_cfg["azure"] = azure
_cfg["tencent"] = tencent
_cfg["soulvoice"] = soulvoice
_cfg["ui"] = ui
_cfg["tts_qwen"] = tts_qwen
_cfg["indextts2"] = indextts2
f.write(toml.dumps(_cfg))
_cfg = load_config()
app = _cfg.get("app", {})
whisper = _cfg.get("whisper", {})
proxy = _cfg.get("proxy", {})
azure = _cfg.get("azure", {})
tencent = _cfg.get("tencent", {})
soulvoice = _cfg.get("soulvoice", {})
ui = _cfg.get("ui", {})
frames = _cfg.get("frames", {})
tts_qwen = _cfg.get("tts_qwen", {})
indextts2 = _cfg.get("indextts2", {})
hostname = socket.gethostname()
log_level = _cfg.get("log_level", "DEBUG")
listen_host = _cfg.get("listen_host", "0.0.0.0")
listen_port = _cfg.get("listen_port", 8080)
project_name = _cfg.get("project_name", "NarratoAI")
project_description = _cfg.get(
"project_description",
"<a href='https://github.com/linyqh/NarratoAI'>https://github.com/linyqh/NarratoAI</a>",
)
# Read the version from the project_version file rather than from the config file
project_version = get_version_from_file()
reload_debug = False
imagemagick_path = app.get("imagemagick_path", "")
if imagemagick_path and os.path.isfile(imagemagick_path):
os.environ["IMAGEMAGICK_BINARY"] = imagemagick_path
ffmpeg_path = app.get("ffmpeg_path", "")
if ffmpeg_path and os.path.isfile(ffmpeg_path):
os.environ["IMAGEIO_FFMPEG_EXE"] = ffmpeg_path
logger.info(f"{project_name} v{project_version}")
================================================
FILE: app/config/ffmpeg_config.py
================================================
"""
FFmpeg configuration management module.
Manages FFmpeg compatibility settings and tuning parameters.
"""
import os
import platform
from typing import Dict, List, Optional
from dataclasses import dataclass
from loguru import logger
@dataclass
class FFmpegProfile:
    """FFmpeg profile."""
name: str
description: str
hwaccel_enabled: bool
hwaccel_type: Optional[str]
encoder: str
quality_preset: str
pixel_format: str
additional_args: List[str]
    compatibility_level: int  # 1-5, where 5 is the most compatible
class FFmpegConfigManager:
    """FFmpeg configuration manager."""
    # Predefined profiles
    PROFILES = {
        # High-performance profile (for modern hardware)
"high_performance": FFmpegProfile(
name="high_performance",
description="高性能配置(NVIDIA/AMD 独立显卡)",
hwaccel_enabled=True,
hwaccel_type="auto",
encoder="auto",
quality_preset="fast",
pixel_format="yuv420p",
additional_args=["-preset", "fast"],
compatibility_level=2
),
        # Compatibility profile (for problematic hardware)
"compatibility": FFmpegProfile(
name="compatibility",
description="兼容性配置(解决滤镜链问题)",
hwaccel_enabled=False,
hwaccel_type=None,
encoder="libx264",
quality_preset="medium",
pixel_format="yuv420p",
additional_args=["-preset", "medium", "-crf", "23"],
compatibility_level=5
),
        # Profile tuned for NVIDIA GPUs on Windows
"windows_nvidia": FFmpegProfile(
name="windows_nvidia",
description="Windows NVIDIA 显卡优化配置",
hwaccel_enabled=True,
            hwaccel_type="nvenc_pure",  # Encoder only, to avoid decoding problems
encoder="h264_nvenc",
quality_preset="medium",
pixel_format="yuv420p",
additional_args=["-preset", "medium", "-cq", "23"],
compatibility_level=3
),
        # Profile tuned for macOS
"macos_videotoolbox": FFmpegProfile(
name="macos_videotoolbox",
description="macOS VideoToolbox 优化配置",
hwaccel_enabled=True,
hwaccel_type="videotoolbox",
encoder="h264_videotoolbox",
quality_preset="medium",
pixel_format="yuv420p",
additional_args=["-q:v", "65"],
compatibility_level=3
),
        # Generic software-encoding profile
"universal_software": FFmpegProfile(
name="universal_software",
description="通用软件编码配置(最高兼容性)",
hwaccel_enabled=False,
hwaccel_type=None,
encoder="libx264",
quality_preset="medium",
pixel_format="yuv420p",
additional_args=["-preset", "medium", "-crf", "23"],
compatibility_level=5
)
}
@classmethod
def get_recommended_profile(cls) -> str:
"""
        Recommend the best profile for the current system environment.
        Returns:
            str: Name of the recommended profile
"""
system = platform.system().lower()
        # Detect whether hardware acceleration is available
try:
from app.utils import ffmpeg_utils
hwaccel_info = ffmpeg_utils.get_ffmpeg_hwaccel_info()
hwaccel_available = hwaccel_info.get("available", False)
hwaccel_type = hwaccel_info.get("type", "software")
gpu_vendor = hwaccel_info.get("gpu_vendor", "unknown")
except Exception as e:
logger.warning(f"无法检测硬件加速信息: {e}")
hwaccel_available = False
hwaccel_type = "software"
gpu_vendor = "unknown"
        # Recommend a profile based on platform and hardware
if system == "windows":
if hwaccel_available and gpu_vendor == "nvidia":
return "windows_nvidia"
elif hwaccel_available:
return "high_performance"
else:
return "compatibility"
elif system == "darwin":
if hwaccel_available and hwaccel_type == "videotoolbox":
return "macos_videotoolbox"
else:
return "universal_software"
elif system == "linux":
if hwaccel_available:
return "high_performance"
else:
return "universal_software"
else:
return "universal_software"
@classmethod
def get_profile(cls, profile_name: str) -> FFmpegProfile:
"""
        Get the specified profile.
        Args:
            profile_name: Profile name
        Returns:
            FFmpegProfile: Profile object
"""
if profile_name not in cls.PROFILES:
logger.warning(f"未知的配置文件: {profile_name},使用默认配置")
profile_name = "universal_software"
return cls.PROFILES[profile_name]
@classmethod
def get_extraction_command(cls,
input_path: str,
output_path: str,
timestamp: float,
profile_name: Optional[str] = None) -> List[str]:
"""
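        Build a keyframe extraction command from a profile.
        Args:
            input_path: Input video path
            output_path: Output image path
            timestamp: Timestamp
            profile_name: Profile name; None selects one automatically
        Returns:
            List[str]: FFmpeg command as an argument list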
"""
if profile_name is None:
profile_name = cls.get_recommended_profile()
profile = cls.get_profile(profile_name)
        # Build the base command
        cmd = [
            "ffmpeg",
            "-hide_banner",
            "-loglevel", "error",
        ]
        # Add hardware acceleration arguments
        if profile.hwaccel_enabled and profile.hwaccel_type:
            if profile.hwaccel_type == "auto":
                # Detect hardware acceleration automatically
                try:
                    from app.utils import ffmpeg_utils
                    hw_args = ffmpeg_utils.get_ffmpeg_hwaccel_args()
                    cmd.extend(hw_args)
                except Exception:
                    pass
            elif profile.hwaccel_type == "nvenc_pure":
                # NVENC encoder only; hardware decoding is not used
                pass
            else:
                # Explicitly specified hardware acceleration type
                cmd.extend(["-hwaccel", profile.hwaccel_type])
        # Add input arguments
        cmd.extend([
            "-ss", str(timestamp),
            "-i", input_path,
            "-vframes", "1",
        ])
        # Add quality and format arguments
        if profile.encoder == "libx264":
            cmd.extend(["-q:v", "2"])
        elif profile.encoder == "h264_nvenc":
            cmd.extend(["-cq", "23"])
        elif profile.encoder == "h264_videotoolbox":
            cmd.extend(["-q:v", "65"])
        else:
            cmd.extend(["-q:v", "2"])
        # Add the pixel format
        cmd.extend(["-pix_fmt", profile.pixel_format])
        # Add extra arguments
        cmd.extend(profile.additional_args)
        # Add output arguments
        cmd.extend(["-y", output_path])
        return cmd
@classmethod
def list_profiles(cls) -> Dict[str, str]:
"""
        List all available profiles.
        Returns:
            Dict[str, str]: Mapping from profile name to description
"""
return {name: profile.description for name, profile in cls.PROFILES.items()}
@classmethod
    def get_compatibility_report(cls) -> Dict[str, object]:
        """
        Generate a compatibility report.
        Returns:
            Dict: Compatibility report
"""
recommended_profile = cls.get_recommended_profile()
profile = cls.get_profile(recommended_profile)
try:
from app.utils import ffmpeg_utils
hwaccel_info = ffmpeg_utils.get_ffmpeg_hwaccel_info()
except Exception:
hwaccel_info = {"available": False, "message": "检测失败"}
return {
"system": platform.system(),
"recommended_profile": recommended_profile,
"profile_description": profile.description,
"compatibility_level": profile.compatibility_level,
"hardware_acceleration": hwaccel_info,
"suggestions": cls._get_suggestions(profile, hwaccel_info)
}
@classmethod
    def _get_suggestions(cls, profile: FFmpegProfile, hwaccel_info: Dict) -> List[str]:
        """Generate optimization suggestions."""
suggestions = []
if not hwaccel_info.get("available", False):
suggestions.append("建议更新显卡驱动以启用硬件加速")
if profile.compatibility_level >= 4:
suggestions.append("当前使用高兼容性配置,性能可能较低")
if platform.system().lower() == "windows" and "nvidia" in hwaccel_info.get("gpu_vendor", "").lower():
suggestions.append("Windows NVIDIA 用户建议使用纯编码器模式避免滤镜链问题")
return suggestions
================================================
FILE: app/models/__init__.py
================================================
================================================
FILE: app/models/const.py
================================================
PUNCTUATIONS = [
"?",
",",
".",
"、",
";",
":",
"!",
"…",
"?",
",",
"。",
"、",
";",
":",
"!",
"...",
]
TASK_STATE_FAILED = -1
TASK_STATE_COMPLETE = 1
TASK_STATE_PROCESSING = 4
FILE_TYPE_VIDEOS = ["mp4", "mov", "mkv", "webm"]
FILE_TYPE_IMAGES = ["jpg", "jpeg", "png", "bmp"]
================================================
FILE: app/models/exception.py
================================================
import traceback
from typing import Any
from loguru import logger
class HttpException(Exception):
def __init__(
self, task_id: str, status_code: int, message: str = "", data: Any = None
):
self.message = message
self.status_code = status_code
self.data = data
        # Capture the exception traceback
tb_str = traceback.format_exc().strip()
if not tb_str or tb_str == "NoneType: None":
msg = f"HttpException: {status_code}, {task_id}, {message}"
else:
msg = f"HttpException: {status_code}, {task_id}, {message}\n{tb_str}"
if status_code == 400:
logger.warning(msg)
else:
logger.error(msg)
class FileNotFoundException(Exception):
pass
================================================
FILE: app/models/schema.py
================================================
import warnings
from enum import Enum
from typing import Any, List, Optional, Union
import pydantic
from pydantic import BaseModel, Field
# Ignore a specific Pydantic warning
warnings.filterwarnings(
"ignore",
category=UserWarning,
message="Field name.*shadows an attribute in parent.*",
)
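class AudioVolumeDefaults:
    """Default volume constants, kept in one place for global consistency."""
    # Default voice volumes
    VOICE_VOLUME = 1.0
    TTS_VOLUME = 1.0
    # Default original-audio volume, raised to balance against the TTS
    ORIGINAL_VOLUME = 1.2
    # Default background music volume
    BGM_VOLUME = 0.3
    # Volume range
    MIN_VOLUME = 0.0
    MAX_VOLUME = 2.0  # Allow the original audio volume to exceed 1.0 to balance the TTS
    # Smart volume adjustment
    ENABLE_SMART_VOLUME = True  # Whether to enable smart volume analysis and adjustment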
class VideoConcatMode(str, Enum):
random = "random"
sequential = "sequential"
class VideoAspect(str, Enum):
landscape = "16:9"
landscape_2 = "4:3"
portrait = "9:16"
portrait_2 = "3:4"
square = "1:1"
def to_resolution(self):
if self == VideoAspect.landscape.value:
return 1920, 1080
elif self == VideoAspect.portrait.value:
return 1080, 1920
elif self == VideoAspect.square.value:
return 1080, 1080
return 1080, 1920
class _Config:
arbitrary_types_allowed = True
@pydantic.dataclasses.dataclass(config=_Config)
class MaterialInfo:
provider: str = "pexels"
url: str = ""
duration: int = 0
# VoiceNames = [
# # zh-CN
# "female-zh-CN-XiaoxiaoNeural",
# "female-zh-CN-XiaoyiNeural",
# "female-zh-CN-liaoning-XiaobeiNeural",
# "female-zh-CN-shaanxi-XiaoniNeural",
#
# "male-zh-CN-YunjianNeural",
# "male-zh-CN-YunxiNeural",
# "male-zh-CN-YunxiaNeural",
# "male-zh-CN-YunyangNeural",
#
# # "female-zh-HK-HiuGaaiNeural",
# # "female-zh-HK-HiuMaanNeural",
# # "male-zh-HK-WanLungNeural",
# #
# # "female-zh-TW-HsiaoChenNeural",
# # "female-zh-TW-HsiaoYuNeural",
# # "male-zh-TW-YunJheNeural",
#
# # en-US
# "female-en-US-AnaNeural",
# "female-en-US-AriaNeural",
# "female-en-US-AvaNeural",
# "female-en-US-EmmaNeural",
# "female-en-US-JennyNeural",
# "female-en-US-MichelleNeural",
#
# "male-en-US-AndrewNeural",
# "male-en-US-BrianNeural",
# "male-en-US-ChristopherNeural",
# "male-en-US-EricNeural",
# "male-en-US-GuyNeural",
# "male-en-US-RogerNeural",
# "male-en-US-SteffanNeural",
# ]
class VideoParams(BaseModel):
"""
{
"video_subject": "",
"video_aspect": "横屏 16:9(西瓜视频)",
"voice_name": "女生-晓晓",
"bgm_name": "random",
"font_name": "STHeitiMedium 黑体-中",
"text_color": "#FFFFFF",
"font_size": 60,
"stroke_color": "#000000",
"stroke_width": 1.5
}
"""
video_subject: str
    video_script: str = ""  # Script used to generate the video
    video_terms: Optional[Union[str, list]] = None  # Keywords used to generate the video
video_aspect: Optional[VideoAspect] = VideoAspect.portrait.value
video_concat_mode: Optional[VideoConcatMode] = VideoConcatMode.random.value
video_clip_duration: Optional[int] = 5
video_count: Optional[int] = 1
video_source: Optional[str] = "pexels"
    video_materials: Optional[List[MaterialInfo]] = None  # Materials used to generate the video
video_language: Optional[str] = "" # auto detect
voice_name: Optional[str] = ""
voice_volume: Optional[float] = AudioVolumeDefaults.VOICE_VOLUME
voice_rate: Optional[float] = 1.0
bgm_type: Optional[str] = "random"
bgm_file: Optional[str] = ""
bgm_volume: Optional[float] = AudioVolumeDefaults.BGM_VOLUME
subtitle_enabled: Optional[bool] = True
subtitle_position: Optional[str] = "bottom" # top, bottom, center
custom_position: float = 70.0
font_name: Optional[str] = "STHeitiMedium.ttc"
text_fore_color: Optional[str] = "#FFFFFF"
text_background_color: Optional[str] = "transparent"
font_size: int = 60
stroke_color: Optional[str] = "#000000"
stroke_width: float = 1.5
n_threads: Optional[int] = 2
paragraph_number: Optional[int] = 1
class VideoClipParams(BaseModel):
"""
    NarratoAI data model
"""
video_clip_json: Optional[list] = Field(default=[], description="LLM 生成的视频剪辑脚本内容")
video_clip_json_path: Optional[str] = Field(default="", description="LLM 生成的视频剪辑脚本路径")
video_origin_path: Optional[str] = Field(default="", description="原视频路径")
video_aspect: Optional[VideoAspect] = Field(default=VideoAspect.portrait.value, description="视频比例")
video_language: Optional[str] = Field(default="zh-CN", description="视频语言")
    # video_clip_duration: Optional[int] = 5  # Video clip duration
    # video_count: Optional[int] = 1  # Number of video clips
# video_source: Optional[str] = "local"
# video_concat_mode: Optional[VideoConcatMode] = VideoConcatMode.random.value
voice_name: Optional[str] = Field(default="zh-CN-YunjianNeural", description="语音名称")
voice_volume: Optional[float] = Field(default=AudioVolumeDefaults.VOICE_VOLUME, description="解说语音音量")
voice_rate: Optional[float] = Field(default=1.0, description="语速")
voice_pitch: Optional[float] = Field(default=1.0, description="语调")
tts_engine: Optional[str] = Field(default="", description="TTS 引擎")
bgm_name: Optional[str] = Field(default="random", description="背景音乐名称")
bgm_type: Optional[str] = Field(default="random", description="背景音乐类型")
bgm_file: Optional[str] = Field(default="", description="背景音乐文件")
subtitle_enabled: bool = True
    font_name: str = "SimHei"  # Defaults to SimHei (黑体)
    font_size: int = 36
    text_fore_color: str = "white"  # Text foreground color
    text_back_color: Optional[str] = None  # Text background color
    stroke_color: str = "black"  # Stroke color
    stroke_width: float = 1.5  # Stroke width
    subtitle_position: str = "bottom"  # top, bottom, center, custom
    custom_position: float = 70.0  # Custom position
    n_threads: Optional[int] = Field(default=16, description="线程数")  # Thread count; more threads can speed up video processing
tts_volume: Optional[float] = Field(default=AudioVolumeDefaults.TTS_VOLUME, description="解说语音音量(后处理)")
original_volume: Optional[float] = Field(default=AudioVolumeDefaults.ORIGINAL_VOLUME, description="视频原声音量")
bgm_volume: Optional[float] = Field(default=AudioVolumeDefaults.BGM_VOLUME, description="背景音乐音量")
class SubtitlePosition(str, Enum):
TOP = "top"
CENTER = "center"
BOTTOM = "bottom"
================================================
FILE: app/services/SDE/short_drama_explanation.py
================================================
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project: NarratoAI
@File : Short-drama narration
@Author : 小林同学
@Date : 2025/5/9 上午12:36
'''
import os
import json
import requests
from typing import Dict, Any, Optional
from loguru import logger
from app.config import config
from app.utils.utils import get_uuid, storage_dir
from app.services.subtitle_text import read_subtitle_text
# Import the new prompt management system
from app.services.prompts import PromptManager
class SubtitleAnalyzer:
    """Subtitle plot analyzer: analyzes subtitle content and extracts key plot segments."""
def __init__(
self,
api_key: Optional[str] = None,
model: Optional[str] = None,
base_url: Optional[str] = None,
custom_prompt: Optional[str] = None,
temperature: Optional[float] = 1.0,
provider: Optional[str] = None,
):
"""
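        Initialize the subtitle analyzer.
        Args:
            api_key: API key; stored as passed (this class applies no config fallback)
            model: Model name; stored as passed (this class applies no config fallback)
            base_url: API base URL; stored as passed (this class applies no config fallback)
            custom_prompt: Custom prompt; the default prompt is used when omitted
            temperature: Model temperature
            provider: Provider type that selects the API call format; read from the config when omitted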
        # Store the passed-in arguments; only the provider falls back to the config
self.api_key = api_key
self.model = model
self.base_url = base_url
self.temperature = temperature
self.provider = provider or self._detect_provider()
        # Set the custom prompt (if provided)
self.custom_prompt = custom_prompt
        # Decide from the provider type whether to use the native Gemini API
self.is_native_gemini = self.provider.lower() == 'gemini'
        # Initialize the headers needed for HTTP requests
self._init_headers()
    def _detect_provider(self):
        """Detect the provider type from the configuration."""
return config.app.get('text_llm_provider', 'gemini').lower()
    def _init_headers(self):
        """Initialize the HTTP request headers."""
        try:
            # Base headers carrying the API key and the content type
self.headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.api_key}"
}
# logger.debug(f"初始化成功 - API Key: {self.api_key[:8]}... - Base URL: {self.base_url}")
except Exception as e:
logger.error(f"初始化请求头失败: {str(e)}")
raise
def analyze_subtitle(self, subtitle_content: str) -> Dict[str, Any]:
"""
        Analyze subtitle content.
        Args:
            subtitle_content: Subtitle text
        Returns:
            Dict[str, Any]: Dictionary containing the analysis result
"""
try:
            # Build the full prompt
            if self.custom_prompt:
                # Use the custom prompt
                prompt = f"{self.custom_prompt}\n\n{subtitle_content}"
            else:
                # Use the new prompt management system with the proper parameters
prompt = PromptManager.get_prompt(
category="short_drama_narration",
name="plot_analysis",
parameters={"subtitle_content": subtitle_content}
)
if self.is_native_gemini:
                # Use the native Gemini API format
return self._call_native_gemini_api(prompt)
else:
                # Use the OpenAI-compatible format
return self._call_openai_compatible_api(prompt)
except Exception as e:
logger.error(f"字幕分析过程中发生错误: {str(e)}")
return {
"status": "error",
"message": str(e),
"temperature": self.temperature
}
    def _call_native_gemini_api(self, prompt: str) -> Dict[str, Any]:
        """Call the native Gemini API."""
        try:
            # Build the request payload for the native Gemini API
payload = {
"systemInstruction": {
"parts": [{"text": "你是一位专业的剧本分析师和剧情概括助手。请严格按照要求的格式输出分析结果。"}]
},
"contents": [{
"parts": [{"text": prompt}]
}],
"generationConfig": {
"temperature": self.temperature,
"topK": 40,
"topP": 0.95,
"maxOutputTokens": 64000,
"candidateCount": 1
},
"safetySettings": [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE"
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE"
}
]
}
            # Build the request URL
            url = f"{self.base_url}/models/{self.model}:generateContent"
            # Send the request
response = requests.post(
url,
json=payload,
headers={"Content-Type": "application/json", "x-goog-api-key": self.api_key},
timeout=120
)
if response.status_code == 200:
response_data = response.json()
                # Check the response format
if "candidates" not in response_data or not response_data["candidates"]:
return {
"status": "error",
"message": "原生Gemini API返回无效响应,可能触发了安全过滤",
"temperature": self.temperature
}
candidate = response_data["candidates"][0]
                # Check whether the safety filter blocked the content
if "finishReason" in candidate and candidate["finishReason"] == "SAFETY":
return {
"status": "error",
"message": "内容被Gemini安全过滤器阻止",
"temperature": self.temperature
}
if "content" not in candidate or "parts" not in candidate["content"]:
return {
"status": "error",
"message": "原生Gemini API返回内容格式错误",
"temperature": self.temperature
}
                # Extract the text content
analysis_result = ""
for part in candidate["content"]["parts"]:
if "text" in part:
analysis_result += part["text"]
if not analysis_result.strip():
return {
"status": "error",
"message": "原生Gemini API返回空内容",
"temperature": self.temperature
}
logger.debug(f"原生Gemini字幕分析完成")
                return {
                    "status": "success",
                    "analysis": analysis_result,
                    "tokens_used": response_data.get("usageMetadata", {}).get("totalTokenCount", 0),
"model": self.model,
"temperature": self.temperature
}
else:
error_msg = f"原生Gemini API请求失败,状态码: {response.status_code}, 响应: {response.text}"
logger.error(error_msg)
return {
"status": "error",
"message": error_msg,
"temperature": self.temperature
}
except Exception as e:
logger.error(f"原生Gemini API调用失败: {str(e)}")
return {
"status": "error",
"message": f"原生Gemini API调用失败: {str(e)}",
"temperature": self.temperature
}
    def _call_openai_compatible_api(self, prompt: str) -> Dict[str, Any]:
        """Call an OpenAI-compatible API."""
        try:
            # Build the request payload in OpenAI format
payload = {
"model": self.model,
"messages": [
{"role": "system", "content": "你是一位专业的剧本分析师和剧情概括助手。"},
{"role": "user", "content": prompt}
],
"temperature": self.temperature
}
            # Build the request URL
            url = f"{self.base_url}/chat/completions"
            # Send the HTTP request
            response = requests.post(url, headers=self.headers, json=payload, timeout=120)
            # Parse the response
if response.status_code == 200:
response_data = response.json()
                # Extract the response content
if "choices" in response_data and len(response_data["choices"]) > 0:
analysis_result = response_data["choices"][0]["message"]["content"]
logger.debug(f"OpenAI兼容API字幕分析完成,消耗的tokens: {response_data.get('usage', {}).get('total_tokens', 0)}")
                    # Return the result
return {
"status": "success",
"analysis": analysis_result,
"tokens_used": response_data.get("usage", {}).get("total_tokens", 0),
"model": self.model,
"temperature": self.temperature
}
else:
logger.error("OpenAI兼容API字幕分析失败: 未获取到有效响应")
return {
"status": "error",
"message": "未获取到有效响应",
"temperature": self.temperature
}
else:
error_msg = f"OpenAI兼容API请求失败,状态码: {response.status_code}, 响应: {response.text}"
logger.error(error_msg)
return {
"status": "error",
"message": error_msg,
"temperature": self.temperature
}
except Exception as e:
logger.error(f"OpenAI兼容API调用失败: {str(e)}")
return {
"status": "error",
"message": f"OpenAI兼容API调用失败: {str(e)}",
"temperature": self.temperature
}
def analyze_subtitle_from_file(self, subtitle_file_path: str) -> Dict[str, Any]:
"""
        Read subtitles from a file and analyze them.
        Args:
            subtitle_file_path: Path to the subtitle file
        Returns:
            Dict[str, Any]: Dictionary containing the analysis result
"""
try:
            # Check that the file exists
if not os.path.exists(subtitle_file_path):
return {
"status": "error",
"message": f"字幕文件不存在: {subtitle_file_path}",
"temperature": self.temperature
}
            # Read the file content
subtitle_content = read_subtitle_text(subtitle_file_path).text
if not subtitle_content:
return {
"status": "error",
"message": f"字幕文件内容为空或无法读取: {subtitle_file_path}",
"temperature": self.temperature
}
            # Analyze the subtitles
return self.analyze_subtitle(subtitle_content)
except Exception as e:
logger.error(f"从文件读取字幕并分析过程中发生错误: {str(e)}")
return {
"status": "error",
"message": str(e),
"temperature": self.temperature
}
def save_analysis_result(self, analysis_result: Dict[str, Any], output_path: Optional[str] = None) -> str:
"""
        Save the analysis result to a file.
        Args:
            analysis_result: Analysis result
            output_path: Output file path; generated automatically when omitted
        Returns:
            str: Path of the output file
"""
try:
            # Generate an output path when none is provided
if not output_path:
output_dir = storage_dir("drama_analysis", create=True)
output_path = os.path.join(output_dir, f"analysis_{get_uuid(True)}.txt")
            # Make sure the directory exists
os.makedirs(os.path.dirname(output_path), exist_ok=True)
            # Save the result
with open(output_path, 'w', encoding='utf-8') as f:
if analysis_result["status"] == "success":
f.write(analysis_result["analysis"])
else:
f.write(f"分析失败: {analysis_result['message']}")
logger.info(f"分析结果已保存到: {output_path}")
return output_path
except Exception as e:
logger.error(f"保存分析结果时发生错误: {str(e)}")
return ""
def generate_narration_script(self, short_name: str, plot_analysis: str, subtitle_content: str = "", temperature: float = 0.7) -> Dict[str, Any]:
"""
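        Generate a narration script from the plot analysis.
        Args:
            short_name: Name of the short drama
            plot_analysis: Plot analysis content
            subtitle_content: Original subtitle content, used to provide accurate timestamps
            temperature: Generation temperature that controls creativity; defaults to 0.7
        Returns:
            Dict[str, Any]: Dictionary containing the generation result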
"""
try:
            # Build the prompt with the new prompt management system
prompt = PromptManager.get_prompt(
category="short_drama_narration",
name="script_generation",
parameters={
"drama_name": short_name,
"plot_analysis": plot_analysis,
"subtitle_content": subtitle_content
}
)
if self.is_native_gemini:
                # Use the native Gemini API format
return self._generate_narration_with_native_gemini(prompt, temperature)
else:
                # Use the OpenAI-compatible format
return self._generate_narration_with_openai_compatible(prompt, temperature)
except Exception as e:
logger.error(f"解说文案生成过程中发生错误: {str(e)}")
return {
"status": "error",
"message": str(e),
"temperature": self.temperature
}
def _generate_narration_with_native_gemini(self, prompt: str, temperature: float) -> Dict[str, Any]:
"""使用原生Gemini API生成解说文案"""
try:
# 构建原生Gemini API请求数据
# 为了确保JSON输出,在提示词中添加更强的约束
enhanced_prompt = f"{prompt}\n\n请确保输出严格的JSON格式,不要包含任何其他文字或标记。"
payload = {
"systemInstruction": {
"parts": [{"text": "你是一位专业的短视频解说脚本撰写专家。你必须严格按照JSON格式输出,不能包含任何其他文字、说明或代码块标记。"}]
},
"contents": [{
"parts": [{"text": enhanced_prompt}]
}],
"generationConfig": {
"temperature": temperature,
"topK": 40,
"topP": 0.95,
"maxOutputTokens": 64000,
"candidateCount": 1,
"stopSequences": ["```", "注意", "说明"]
},
"safetySettings": [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_NONE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_NONE"
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_NONE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_NONE"
}
]
}
# 构建请求URL
url = f"{self.base_url}/models/{self.model}:generateContent"
# 发送请求
response = requests.post(
url,
json=payload,
headers={"Content-Type": "application/json", "x-goog-api-key": self.api_key},
timeout=120
)
if response.status_code == 200:
response_data = response.json()
# 检查响应格式
if "candidates" not in response_data or not response_data["candidates"]:
return {
"status": "error",
"message": "原生Gemini API返回无效响应,可能触发了安全过滤",
"temperature": temperature
}
candidate = response_data["candidates"][0]
# 检查是否被安全过滤阻止
if "finishReason" in candidate and candidate["finishReason"] == "SAFETY":
return {
"status": "error",
"message": "内容被Gemini安全过滤器阻止",
"temperature": temperature
}
if "content" not in candidate or "parts" not in candidate["content"]:
return {
"status": "error",
"message": "原生Gemini API返回内容格式错误",
"temperature": temperature
}
# 提取文本内容
narration_script = ""
for part in candidate["content"]["parts"]:
if "text" in part:
narration_script += part["text"]
if not narration_script.strip():
return {
"status": "error",
"message": "原生Gemini API返回空内容",
"temperature": temperature
}
logger.debug(f"原生Gemini解说文案生成完成")
return {
"status": "success",
"narration_script": narration_script,
"tokens_used": response_data.get("usage", {}).get("total_tokens", 0),
"model": self.model,
"temperature": temperature
}
else:
error_msg = f"原生Gemini API请求失败,状态码: {response.status_code}, 响应: {response.text}"
logger.error(error_msg)
return {
"status": "error",
"message": error_msg,
"temperature": temperature
}
except Exception as e:
logger.error(f"原生Gemini API解说文案生成失败: {str(e)}")
return {
"status": "error",
"message": f"原生Gemini API解说文案生成失败: {str(e)}",
"temperature": temperature
}
def _generate_narration_with_openai_compatible(self, prompt: str, temperature: float) -> Dict[str, Any]:
"""使用OpenAI兼容API生成解说文案"""
try:
# 构建OpenAI格式的请求数据
payload = {
"model": self.model,
"messages": [
{"role": "system", "content": "你是一位专业的短视频解说脚本撰写专家。"},
{"role": "user", "content": prompt}
],
"temperature": temperature
}
# 对特定模型添加响应格式设置
if self.model not in ["deepseek-reasoner"]:
payload["response_format"] = {"type": "json_object"}
# 构建请求地址
url = f"{self.base_url}/chat/completions"
# 发送HTTP请求
response = requests.post(url, headers=self.headers, json=payload, timeout=120)
# 解析响应
if response.status_code == 200:
response_data = response.json()
# 提取响应内容
if "choices" in response_data and len(response_data["choices"]) > 0:
narration_script = response_data["choices"][0]["message"]["content"]
logger.debug(f"OpenAI兼容API解说文案生成完成,消耗的tokens: {response_data.get('usage', {}).get('total_tokens', 0)}")
# 返回结果
return {
"status": "success",
"narration_script": narration_script,
"tokens_used": response_data.get("usage", {}).get("total_tokens", 0),
"model": self.model,
"temperature": temperature
}
else:
logger.error("OpenAI兼容API解说文案生成失败: 未获取到有效响应")
return {
"status": "error",
"message": "未获取到有效响应",
"temperature": temperature
}
else:
error_msg = f"OpenAI兼容API请求失败,状态码: {response.status_code}, 响应: {response.text}"
logger.error(error_msg)
return {
"status": "error",
"message": error_msg,
"temperature": temperature
}
except Exception as e:
logger.error(f"OpenAI兼容API解说文案生成失败: {str(e)}")
return {
"status": "error",
"message": f"OpenAI兼容API解说文案生成失败: {str(e)}",
"temperature": temperature
}
def save_narration_script(self, narration_result: Dict[str, Any], output_path: Optional[str] = None) -> str:
"""
保存解说文案到文件
Args:
narration_result: 解说文案生成结果
output_path: 输出文件路径,如果不提供则自动生成
Returns:
str: 输出文件的路径
"""
try:
# 如果未提供输出路径,则自动生成
if not output_path:
output_dir = storage_dir("narration_scripts", create=True)
output_path = os.path.join(output_dir, f"narration_{get_uuid(True)}.json")
# 确保目录存在
os.makedirs(os.path.dirname(output_path), exist_ok=True)
# 保存结果
with open(output_path, 'w', encoding='utf-8') as f:
if narration_result["status"] == "success":
f.write(narration_result["narration_script"])
else:
f.write(f"生成失败: {narration_result['message']}")
logger.info(f"解说文案已保存到: {output_path}")
return output_path
except Exception as e:
logger.error(f"保存解说文案时发生错误: {str(e)}")
return ""
def analyze_subtitle(
subtitle_content: str = None,
subtitle_file_path: str = None,
api_key: Optional[str] = None,
model: Optional[str] = None,
base_url: Optional[str] = None,
custom_prompt: Optional[str] = None,
temperature: float = 1.0,
save_result: bool = False,
output_path: Optional[str] = None,
provider: Optional[str] = None
) -> Dict[str, Any]:
"""
分析字幕内容的便捷函数
Args:
subtitle_content: 字幕内容文本
subtitle_file_path: 字幕文件路径
custom_prompt: 自定义提示词
api_key: API密钥
model: 模型名称
base_url: API基础URL
temperature: 模型温度
save_result: 是否保存结果到文件
output_path: 输出文件路径
provider: 提供商类型
Returns:
Dict[str, Any]: 包含分析结果的字典
"""
# 初始化分析器
analyzer = SubtitleAnalyzer(
temperature=temperature,
api_key=api_key,
model=model,
base_url=base_url,
custom_prompt=custom_prompt,
provider=provider
)
logger.debug(f"使用模型: {analyzer.model} 开始分析, 温度: {analyzer.temperature}")
# 分析字幕
if subtitle_content:
result = analyzer.analyze_subtitle(subtitle_content)
elif subtitle_file_path:
result = analyzer.analyze_subtitle_from_file(subtitle_file_path)
else:
return {
"status": "error",
"message": "必须提供字幕内容或字幕文件路径",
"temperature": temperature
}
# 保存结果
if save_result and result["status"] == "success":
result["output_path"] = analyzer.save_analysis_result(result, output_path)
return result
def generate_narration_script(
short_name: str = None,
plot_analysis: str = None,
subtitle_content: str = None,
api_key: Optional[str] = None,
model: Optional[str] = None,
base_url: Optional[str] = None,
temperature: float = 1.0,
save_result: bool = False,
output_path: Optional[str] = None,
provider: Optional[str] = None
) -> Dict[str, Any]:
"""
根据剧情分析生成解说文案的便捷函数
Args:
short_name: 短剧名称
plot_analysis: 剧情分析内容,直接提供
subtitle_content: 原始字幕内容,用于提供准确的时间戳信息
api_key: API密钥
model: 模型名称
base_url: API基础URL
temperature: 生成温度,控制创造性
save_result: 是否保存结果到文件
output_path: 输出文件路径
provider: 提供商类型
Returns:
Dict[str, Any]: 包含生成结果的字典
"""
# 初始化分析器
analyzer = SubtitleAnalyzer(
temperature=temperature,
api_key=api_key,
model=model,
base_url=base_url,
provider=provider
)
# 生成解说文案
result = analyzer.generate_narration_script(short_name, plot_analysis, subtitle_content or "", temperature)
# 保存结果
if save_result and result["status"] == "success":
result["output_path"] = analyzer.save_narration_script(result, output_path)
return result
if __name__ == '__main__':
text_api_key = "skxxxx"
text_model = "gemini-2.0-flash"
text_base_url = "https://api.narratoai.cn/v1/chat/completions" # 确保URL不以斜杠结尾,便于后续拼接
subtitle_path = "/Users/apple/Desktop/home/NarratoAI/resource/srt/家里家外1-5.srt"
# 示例用法
if subtitle_path:
# 分析字幕总结剧情
analysis_result = analyze_subtitle(
subtitle_file_path=subtitle_path,
api_key=text_api_key,
model=text_model,
base_url=text_base_url,
save_result=True
)
if analysis_result["status"] == "success":
print("字幕分析成功!")
print("分析结果:")
print(analysis_result["analysis"])
# 读取原始字幕内容用于解说脚本生成
with open(subtitle_path, 'r', encoding='utf-8') as f:
subtitle_content = f.read()
# 根据剧情生成解说文案
narration_result = generate_narration_script(
short_name="家里家外",
plot_analysis=analysis_result["analysis"],
subtitle_content=subtitle_content,
api_key=text_api_key,
model=text_model,
base_url=text_base_url,
save_result=True
)
if narration_result["status"] == "success":
print("\n解说文案生成成功!")
print("解说文案:")
print(narration_result["narration_script"])
else:
print(f"\n解说文案生成失败: {narration_result['message']}")
else:
print(f"分析失败: {analysis_result['message']}")
================================================
FILE: app/services/SDP/generate_script_short.py
================================================
"""
视频脚本生成pipeline,串联各个处理步骤
"""
from typing import Any, Dict, Optional
from loguru import logger
from .utils.step1_subtitle_analyzer_openai import analyze_subtitle
from .utils.step5_merge_script import merge_script
from app.services.upload_validation import InputValidationError, resolve_subtitle_input
def generate_script_result(
api_key: str,
model_name: str,
output_path: str,
base_url: str = None,
custom_clips: int = 5,
provider: str = None,
*,
srt_path: Optional[str] = None,
subtitle_content: Optional[str] = None,
subtitle_file_path: Optional[str] = None,
) -> Dict[str, Any]:
"""生成视频混剪脚本(安全版本,返回结果字典)
Args:
api_key: API密钥
model_name: 模型名称
output_path: 输出文件路径
base_url: API基础URL,可选
custom_clips: 自定义片段数量,默认5
provider: LLM服务提供商,可选
srt_path: 字幕文件路径(向后兼容)
subtitle_content: 字幕文本内容
subtitle_file_path: 字幕文件路径(推荐)
Returns:
Dict[str, Any]:
成功: {"status": "success", "script": [...]}
失败: {"status": "error", "message": "错误信息"}
"""
try:
# 解析字幕输入源(支持内容或文件路径)
resolved_content, resolved_path = resolve_subtitle_input(
subtitle_content=subtitle_content,
subtitle_file_path=subtitle_file_path,
srt_path=srt_path,
)
logger.info("开始分析字幕内容...")
openai_analysis = analyze_subtitle(
model_name=model_name,
api_key=api_key,
base_url=base_url,
custom_clips=custom_clips,
provider=provider,
srt_path=resolved_path,
subtitle_content=resolved_content,
)
adjusted_results = openai_analysis['plot_points']
final_script = merge_script(adjusted_results, output_path)
return {"status": "success", "script": final_script}
except InputValidationError as e:
logger.error(f"输入验证失败: {e}")
return {"status": "error", "message": str(e)}
except Exception as e:
logger.exception(f"SDP 脚本生成失败: {e}")
return {"status": "error", "message": f"生成脚本失败: {str(e)}"}
def generate_script(
srt_path: Optional[str] = None,
api_key: str = None,
model_name: str = None,
output_path: str = None,
base_url: str = None,
custom_clips: int = 5,
provider: str = None,
*,
subtitle_content: Optional[str] = None,
subtitle_file_path: Optional[str] = None,
):
"""生成视频混剪脚本(向后兼容版本)
Args:
srt_path: 字幕文件路径(向后兼容参数,可选)
api_key: API密钥
model_name: 模型名称
output_path: 输出文件路径
base_url: API基础URL,可选
custom_clips: 自定义片段数量,默认5
provider: LLM服务提供商,可选
subtitle_content: 字幕文本内容(可选)
subtitle_file_path: 字幕文件路径(推荐使用,可选)
Returns:
str: 生成的脚本内容
Raises:
FileNotFoundError: 字幕文件不存在(向后兼容)
ValueError: 输入验证失败或脚本生成失败
"""
result = generate_script_result(
api_key=api_key,
model_name=model_name,
output_path=output_path,
base_url=base_url,
custom_clips=custom_clips,
provider=provider,
srt_path=srt_path,
subtitle_content=subtitle_content,
subtitle_file_path=subtitle_file_path,
)
if result.get("status") != "success":
error_message = result.get("message", "生成脚本失败")
# 保持向后兼容:如果是文件不存在错误,抛出 FileNotFoundError
if "不存在" in error_message and (srt_path or subtitle_file_path):
raise FileNotFoundError(error_message)
raise ValueError(error_message)
return result["script"]
================================================
FILE: app/services/SDP/utils/short_schema.py
================================================
"""
定义项目中使用的数据类型
"""
from typing import List, Dict, Optional
from dataclasses import dataclass
@dataclass
class PlotPoint:
timestamp: str
title: str
picture: str
@dataclass
class Commentary:
timestamp: str
title: str
copywriter: str
@dataclass
class SubtitleSegment:
start_time: float
end_time: float
text: str
@dataclass
class ScriptItem:
timestamp: str
title: str
picture: str
copywriter: str
@dataclass
class PipelineResult:
output_video_path: str
plot_points: List[PlotPoint]
subtitle_segments: List[SubtitleSegment]
commentaries: List[Commentary]
final_script: List[ScriptItem]
error: Optional[str] = None
class VideoProcessingError(Exception):
pass
class SubtitleProcessingError(Exception):
pass
class PlotAnalysisError(Exception):
pass
class CopywritingError(Exception):
pass
================================================
FILE: app/services/SDP/utils/step1_subtitle_analyzer_openai.py
================================================
"""
使用统一LLM服务,分析字幕文件,返回剧情梗概和爆点
"""
import traceback
import json
from loguru import logger
from app.services.subtitle_text import has_timecodes, normalize_subtitle_text, read_subtitle_text
# 导入新的提示词管理系统
from app.services.prompts import PromptManager
# 导入统一LLM服务
from app.services.llm.unified_service import UnifiedLLMService
# 导入安全的异步执行函数
from app.services.llm.migration_adapter import _run_async_safely
def analyze_subtitle(
model_name: str,
api_key: str = None,
base_url: str = None,
custom_clips: int = 5,
provider: str = None,
srt_path: str = None,
subtitle_content: str = None
) -> dict:
"""分析字幕内容,返回完整的分析结果
Args:
model_name (str): 大模型名称
api_key (str, optional): 大模型API密钥. Defaults to None.
base_url (str, optional): 大模型API基础URL. Defaults to None.
custom_clips (int): 需要提取的片段数量. Defaults to 5.
provider (str, optional): LLM服务提供商. Defaults to None.
srt_path (str, optional): SRT字幕文件路径(与subtitle_content二选一)
subtitle_content (str, optional): SRT字幕文本内容(与srt_path二选一)
Returns:
dict: 包含剧情梗概和结构化的时间段分析的字典
"""
try:
# 读取并规范化字幕文本(不依赖结构化 SRT 解析,提升兼容性)
if subtitle_content and str(subtitle_content).strip():
normalized_subtitle_text = normalize_subtitle_text(subtitle_content)
source_label = "字幕内容(直接传入)"
elif srt_path:
decoded = read_subtitle_text(srt_path)
normalized_subtitle_text = decoded.text
source_label = f"字幕文件: {srt_path} (encoding: {decoded.encoding})"
else:
raise ValueError("必须提供 srt_path 或 subtitle_content 参数")
# 基础校验:必须有内容且包含可用于定位的时间码
if not normalized_subtitle_text or len(normalized_subtitle_text.strip()) < 10:
error_msg = (
f"字幕来源 [{source_label}] 内容为空或过短。\n"
f"请检查:\n"
f"1. 文件格式是否为标准 SRT\n"
f"2. 文件编码是否为 UTF-8、UTF-16、GBK 或 GB2312\n"
f"3. 文件内容是否为空"
)
logger.error(error_msg)
raise ValueError(error_msg)
if not has_timecodes(normalized_subtitle_text):
error_msg = (
f"字幕来源 [{source_label}] 未检测到有效时间码,无法进行时间段定位。\n"
f"请确保字幕包含类似以下格式的时间轴:\n"
f"00:00:01,000 --> 00:00:02,000\n"
f"(若毫秒分隔符为'.',系统会自动规范化为',')"
)
logger.error(error_msg)
raise ValueError(error_msg)
logger.info(f"成功加载字幕来源 [{source_label}],字符数: {len(normalized_subtitle_text)}")
subtitle_content = normalized_subtitle_text
# 如果没有指定provider,根据model_name推断
if not provider:
if "deepseek" in model_name.lower():
provider = "deepseek"
elif "gpt" in model_name.lower():
provider = "openai"
elif "gemini" in model_name.lower():
provider = "gemini"
else:
provider = "openai" # 默认使用openai
logger.info(f"使用LLM服务分析字幕,提供商: {provider}, 模型: {model_name}")
# 使用新的提示词管理系统
subtitle_analysis_prompt = PromptManager.get_prompt(
category="short_drama_editing",
name="subtitle_analysis",
parameters={
"subtitle_content": subtitle_content,
"custom_clips": custom_clips
}
)
# 使用统一LLM服务生成文本
logger.info("开始分析字幕内容...")
response = _run_async_safely(
UnifiedLLMService.generate_text,
prompt=subtitle_analysis_prompt,
provider=provider,
model=model_name,
api_key=api_key,
base_url=base_url,
temperature=0.1, # 使用较低的温度以获得更稳定的结果
max_tokens=4000
)
# 解析JSON响应
from webui.tools.generate_short_summary import parse_and_fix_json
summary_data = parse_and_fix_json(response)
if not summary_data:
raise Exception("无法解析LLM返回的JSON数据")
logger.info(f"字幕分析完成,找到 {len(summary_data.get('plot_titles', []))} 个关键情节")
logger.debug(json.dumps(summary_data, indent=4, ensure_ascii=False))
# Build the numbered list of highlight titles (a missing "plot_titles" key yields an empty list instead of a KeyError)
plot_titles_text = ""
for i, point in enumerate(summary_data.get('plot_titles', []), 1):
plot_titles_text += f"{i}. {point}\n"
# 使用新的提示词管理系统
plot_extraction_prompt = PromptManager.get_prompt(
category="short_drama_editing",
name="plot_extraction",
parameters={
"subtitle_content": subtitle_content,
"plot_summary": summary_data['summary'],
"plot_titles": plot_titles_text
}
)
# 使用统一LLM服务进行爆点时间段分析
logger.info("开始分析爆点时间段...")
response = _run_async_safely(
UnifiedLLMService.generate_text,
prompt=plot_extraction_prompt,
provider=provider,
model=model_name,
api_key=api_key,
base_url=base_url,
temperature=0.1,
max_tokens=4000
)
# 解析JSON响应
plot_data = parse_and_fix_json(response)
if not plot_data:
raise Exception("无法解析爆点分析的JSON数据")
logger.info(f"爆点分析完成,找到 {len(plot_data.get('plot_points', []))} 个时间段")
# 合并结果
result = {
"summary": summary_data.get("summary", ""),
"plot_titles": summary_data.get("plot_titles", []),
"plot_points": plot_data.get("plot_points", [])
}
return result
except Exception as e:
logger.error(f"分析字幕时发生错误: {str(e)}")
raise Exception(f"分析字幕时发生错误:{str(e)}\n{traceback.format_exc()}")
================================================
FILE: app/services/SDP/utils/step5_merge_script.py
================================================
"""
合并生成最终脚本
"""
import os
import json
from typing import Dict, List
def merge_script(
plot_points: List[Dict],
output_path: str
):
"""合并生成最终脚本
Args:
plot_points: 校对后的剧情点
output_path: 输出文件路径,如果提供则保存到文件
Returns:
str: 最终合并的脚本
"""
# 创建包含所有信息的临时列表
final_script = []
# 处理原生画面条目
number = 1
for plot_point in plot_points:
script_item = {
"_id": number,
"timestamp": plot_point["timestamp"],
"picture": plot_point["picture"],
"narration": f"播放原生_{os.urandom(4).hex()}",
"OST": 1, # OST=0 仅保留解说 OST=2 保留解说和原声
}
final_script.append(script_item)
number += 1
# 保存结果
if not output_path or not str(output_path).strip():
raise ValueError("output_path不能为空")
output_path = str(output_path)
os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(final_script, f, ensure_ascii=False, indent=4)
print(f"脚本生成完成:{output_path}")
return final_script
================================================
FILE: app/services/SDP/utils/utils.py
================================================
# 公共方法
import json
import requests # 新增
import pysrt
from loguru import logger
from typing import List, Dict
def load_srt(file_path: str) -> List[Dict]:
"""加载并解析SRT文件(使用 pysrt 库,支持多种编码和格式)
Args:
file_path: SRT文件路径
Returns:
字幕内容列表,格式:
[
{
'number': int, # 字幕序号
'timestamp': str, # "00:00:01,000 --> 00:00:03,000"
'text': str, # 字幕文本
'start_time': str, # "00:00:01,000"
'end_time': str # "00:00:03,000"
},
...
]
Raises:
FileNotFoundError: 文件不存在
ValueError: 文件编码不支持或格式错误
"""
# 编码自动检测:依次尝试常见编码
encodings = ['utf-8', 'utf-8-sig', 'gbk', 'gb2312']
subs = None
detected_encoding = None
for encoding in encodings:
try:
subs = pysrt.open(file_path, encoding=encoding)
detected_encoding = encoding
logger.info(f"成功加载字幕文件 {file_path},编码:{encoding},共 {len(subs)} 条")
break
except UnicodeDecodeError:
continue
except Exception as e:
logger.warning(f"使用编码 {encoding} 加载失败: {e}")
continue
if subs is None:
# 所有编码都失败
raise ValueError(
f"无法读取字幕文件 {file_path},"
f"请检查文件编码(支持 UTF-8、GBK、GB2312)"
)
# 检查是否为空
if not subs:
logger.warning(f"字幕文件 {file_path} 解析后无有效内容")
return []
# 转换为原格式(向后兼容)
subtitles = []
for sub in subs:
# 合并多行文本为单行(某些 SRT 文件会有换行)
text = sub.text.replace('\n', ' ').strip()
# 跳过空字幕
if not text:
continue
subtitles.append({
'number': sub.index,
'timestamp': f"{sub.start} --> {sub.end}",
'text': text,
'start_time': str(sub.start),
'end_time': str(sub.end)
})
logger.info(f"成功解析 {len(subtitles)} 条有效字幕")
return subtitles
def load_srt_from_content(srt_content: str) -> List[Dict]:
"""从字符串内容解析SRT(用于直接传入字幕内容,无需依赖文件路径)
Args:
srt_content: SRT格式的字幕文本内容
Returns:
字幕内容列表,格式同 load_srt 函数
Raises:
ValueError: 字幕内容为空或格式错误
"""
if srt_content is None or not str(srt_content).strip():
raise ValueError("字幕内容为空")
try:
subs = pysrt.from_string(str(srt_content))
except Exception as e:
logger.error(f"无法解析字幕内容: {e}")
raise ValueError("无法解析字幕内容,请确保为标准 SRT 格式") from e
if not subs:
logger.warning("字幕内容解析后无有效内容")
return []
subtitles = []
for sub in subs:
text = sub.text.replace('\n', ' ').strip()
if not text:
continue
subtitles.append({
'number': sub.index,
'timestamp': f"{sub.start} --> {sub.end}",
'text': text,
'start_time': str(sub.start),
'end_time': str(sub.end)
})
logger.info(f"成功从内容解析 {len(subtitles)} 条有效字幕")
return subtitles
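# Illustrative sketch (not part of the original module): the encoding fallback used by
# load_srt, shown on raw bytes without pysrt. The helper name decode_with_fallback and
# its default encoding order are assumptions made for this example; "utf-8-sig" is
# tried first here so that a leading BOM is stripped from the decoded text.

```python
from typing import Sequence, Tuple


def decode_with_fallback(
    raw: bytes,
    encodings: Sequence[str] = ("utf-8-sig", "utf-8", "gbk"),
) -> Tuple[str, str]:
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes."""
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            # Wrong guess: move on to the next candidate encoding
            continue
    raise ValueError(f"could not decode subtitle bytes with any of: {', '.join(encodings)}")


if __name__ == "__main__":
    text, used = decode_with_fallback("字幕".encode("gbk"))
    print(text, used)
```

# Strict UTF-8 decoding is tried before GBK because GBK accepts many byte sequences
# that are not valid UTF-8, so the stricter codecs act as the better first filter.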
================================================
FILE: app/services/__init__.py
================================================
================================================
FILE: app/services/audio_merger.py
================================================
import os
import json
import subprocess
import edge_tts
from edge_tts import submaker
from pydub import AudioSegment
from typing import List, Dict
from loguru import logger
from app.utils import utils
def check_ffmpeg():
"""检查FFmpeg是否已安装"""
try:
subprocess.run(['ffmpeg', '-version'], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
return True
except FileNotFoundError:
return False
def merge_audio_files(task_id: str, total_duration: float, list_script: list):
"""
合并音频文件
Args:
task_id: 任务ID
total_duration: 总时长
list_script: 完整脚本信息,包含duration时长和audio路径
Returns:
str: 合并后的音频文件路径
"""
# 检查FFmpeg是否安装
if not check_ffmpeg():
logger.error("FFmpeg未安装,无法合并音频文件")
return None
# 创建一个空的音频片段
final_audio = AudioSegment.silent(duration=total_duration * 1000) # 总时长以毫秒为单位
# 计算每个片段的开始位置(基于duration字段)
current_position = 0 # 初始位置(秒)
# 遍历脚本中的每个片段
for segment in list_script:
try:
# 获取片段时长(秒)
duration = segment['duration']
# 检查audio字段是否为空
if segment['audio'] and os.path.exists(segment['audio']):
# 加载TTS音频文件
tts_audio = AudioSegment.from_file(segment['audio'])
# 将TTS音频添加到最终音频
final_audio = final_audio.overlay(tts_audio, position=current_position * 1000)
else:
# audio为空,不添加音频,仅保留间隔
logger.info(f"片段 {segment.get('timestamp', '')} 没有音频文件,保留 {duration} 秒的间隔")
# 更新下一个片段的开始位置
current_position += duration
except Exception as e:
logger.error(f"处理音频片段时出错: {str(e)}")
# 即使处理失败,也要更新位置,确保后续片段位置正确
if 'duration' in segment:
current_position += segment['duration']
continue
# 保存合并后的音频文件
output_audio_path = os.path.join(utils.task_dir(task_id), "merger_audio.mp3")
final_audio.export(output_audio_path, format="mp3")
logger.info(f"合并后的音频文件已保存: {output_audio_path}")
return output_audio_path
def time_to_seconds(time_str):
"""
将时间字符串转换为秒数,支持多种格式:
1. 'HH:MM:SS,mmm' (时:分:秒,毫秒)
2. 'MM:SS,mmm' (分:秒,毫秒)
3. 'SS,mmm' (秒,毫秒)
"""
try:
# 处理毫秒部分
if ',' in time_str:
time_part, ms_part = time_str.split(',')
ms = float(ms_part) / 1000
else:
time_part = time_str
ms = 0
# 分割时间部分
parts = time_part.split(':')
if len(parts) == 3: # HH:MM:SS
h, m, s = map(int, parts)
seconds = h * 3600 + m * 60 + s
elif len(parts) == 2: # MM:SS
m, s = map(int, parts)
seconds = m * 60 + s
else: # SS
seconds = int(parts[0])
return seconds + ms
except (ValueError, IndexError) as e:
logger.error(f"Error parsing time {time_str}: {str(e)}")
return 0.0
def extract_timestamp(filename):
"""
从文件名中提取开始和结束时间戳
例如: "audio_00_06,500-00_24,800.mp3" -> (6.5, 24.8)
"""
try:
# 从文件名中提取时间部分
time_part = filename.split('_', 1)[1].split('.')[0] # 获取 "00_06,500-00_24,800" 部分
start_time, end_time = time_part.split('-') # 分割成开始和结束时间
# 将下划线格式转换回冒号格式
start_time = start_time.replace('_', ':')
end_time = end_time.replace('_', ':')
# 将时间戳转换为秒
start_seconds = time_to_seconds(start_time)
end_seconds = time_to_seconds(end_time)
return start_seconds, end_seconds
except Exception as e:
logger.error(f"Error extracting timestamp from {filename}: {str(e)}")
return 0.0, 0.0
if __name__ == "__main__":
# 示例用法
total_duration = 90
video_script = [
{'picture': '【解说】好的,各位,欢迎回到我的频道!《庆余年 2》刚开播就给了我们一个王炸!范闲在北齐"死"了?这怎么可能!',
'timestamp': '00:00:00-00:00:26',
'narration': '好的各位,欢迎回到我的频道!《庆余年 2》刚开播就给了我们一个王炸!范闲在北齐"死"了?这怎么可能!上集片尾那个巨大的悬念,这一集就立刻揭晓了!范闲假死归来,他面临的第一个,也是最大的难关,就是如何面对他最敬爱的,同时也是最可怕的那个人——庆帝!',
'OST': 0, 'duration': 26,
'audio': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_00_00-00_01_15.mp3'},
{'picture': '【解说】上一集我们看到,范闲在北齐遭遇了惊天变故,生死不明!', 'timestamp': '00:01:15-00:01:29',
'narration': '但我们都知道,他绝不可能就这么轻易退场!第二集一开场,范闲就已经秘密回到了京都。他的生死传闻,可不像我们想象中那样只是小范围流传,而是…',
'OST': 0, 'duration': 14,
'audio': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_01_15-00_04_40.mp3'},
{'picture': '画面切到王启年小心翼翼地向范闲汇报。', 'timestamp': '00:04:41-00:04:58',
'narration': '我发现大人的死讯不光是在民间,在官场上也它传开了,所以呢,所以啊,可不是什么好事,将来您跟陛下怎么交代,这可是欺君之罪',
'OST': 1, 'duration': 17,
'audio': ''},
{'picture': '【解说】"欺君之罪"!在封建王朝,这可是抄家灭族的大罪!搁一般人,肯定脚底抹油溜之大吉了。',
'timestamp': '00:04:58-00:05:20',
'narration': '"欺君之罪"!在封建王朝,这可是抄家灭族的大罪!搁一般人,肯定脚底抹油溜之大吉了。但范闲是谁啊?他偏要反其道而行之!他竟然决定,直接去见庆帝!冒着天大的风险,用"假死"这个事实去赌庆帝的态度!',
'OST': 0, 'duration': 22,
'audio': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_04_58-00_05_45.mp3'},
{'picture': '【解说】但想见庆帝,哪有那么容易?范闲艺高人胆大,竟然选择了最激进的方式——闯宫!',
'timestamp': '00:05:45-00:05:53',
'narration': '但想见庆帝,哪有那么容易?范闲艺高人胆大,竟然选择了最激进的方式——闯宫!',
'OST': 0, 'duration': 8,
'audio': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_05_45-00_06_00.mp3'},
{'picture': '画面切换到范闲蒙面闯入皇宫,被侍卫包围的场景。', 'timestamp': '00:06:00-00:06:03',
'narration': '抓刺客',
'OST': 1, 'duration': 3,
'audio': ''}]
output_file = merge_audio_files("test456", total_duration, video_script)
print(output_file)
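# Illustrative sketch (not part of the original module): merge_audio_files places each
# segment at the running sum of the durations before it. The helper name
# segment_offsets_ms is an assumption made for this example; it only computes the
# overlay positions in milliseconds and does not touch any audio.

```python
from itertools import accumulate
from typing import List, Sequence


def segment_offsets_ms(durations_s: Sequence[float]) -> List[int]:
    """Hypothetical helper: start position (ms) of each segment on the timeline."""
    # accumulate(..., initial=0) yields 0, d0, d0+d1, ...; the final total is not a start
    starts = list(accumulate(durations_s, initial=0))[:-1]
    return [int(start * 1000) for start in starts]


if __name__ == "__main__":
    # Durations of the first three demo segments above: 26 s, 14 s, 17 s
    print(segment_offsets_ms([26, 14, 17]))
```

# Because the offsets depend only on the durations, a segment with an empty audio path
# still advances the timeline, which is how the gaps are preserved in the merged track.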
================================================
FILE: app/services/audio_normalizer.py
================================================
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project: NarratoAI
@File : audio_normalizer
@Author : Viccy同学
@Date : 2025/1/7
@Description: Audio loudness analysis and normalization tool
'''
import os
import subprocess
import tempfile
from typing import Optional, Tuple, Dict, Any
from loguru import logger
from moviepy import AudioFileClip
from pydub import AudioSegment
import numpy as np
class AudioNormalizer:
"""音频响度分析和标准化工具"""
def __init__(self):
self.target_lufs = -23.0 # 目标响度 (LUFS),符合广播标准
self.max_peak = -1.0 # 最大峰值 (dBFS)
def analyze_audio_lufs(self, audio_path: str) -> Optional[float]:
"""
使用FFmpeg分析音频的LUFS响度
Args:
audio_path: 音频文件路径
Returns:
float: LUFS值,如果分析失败返回None
"""
if not os.path.exists(audio_path):
logger.error(f"音频文件不存在: {audio_path}")
return None
try:
# 使用FFmpeg的loudnorm滤镜分析音频响度
cmd = [
'ffmpeg', '-hide_banner', '-nostats',
'-i', audio_path,
'-af', 'loudnorm=I=-23:TP=-1:LRA=7:print_format=json',
'-f', 'null', '-'
]
result = subprocess.run(
cmd,
capture_output=True,
text=True,
check=False
)
# 从stderr中提取JSON信息
stderr_lines = result.stderr.split('\n')
json_start = False
json_lines = []
for line in stderr_lines:
if line.strip() == '{':
json_start = True
if json_start:
json_lines.append(line)
if line.strip() == '}':
break
if json_lines:
import json
try:
loudness_data = json.loads('\n'.join(json_lines))
input_i = float(loudness_data.get('input_i', 0))
logger.info(f"音频 {os.path.basename(audio_path)} 的LUFS: {input_i}")
return input_i
except (json.JSONDecodeError, ValueError) as e:
logger.warning(f"解析LUFS数据失败: {e}")
except Exception as e:
logger.error(f"分析音频LUFS失败: {e}")
return None
def get_audio_rms(self, audio_path: str) -> Optional[float]:
"""
计算音频的RMS值作为响度的简单估计
Args:
audio_path: 音频文件路径
Returns:
float: RMS值 (dB),如果计算失败返回None
"""
try:
audio = AudioSegment.from_file(audio_path)
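# Convert to a float64 numpy array so that squaring the samples below cannot overflow the integer sample type
samples = np.array(audio.get_array_of_samples(), dtype=np.float64)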
# 如果是立体声,取平均值
if audio.channels == 2:
samples = samples.reshape((-1, 2))
samples = samples.mean(axis=1)
# 计算RMS
rms = np.sqrt(np.mean(samples**2))
# 转换为dB
if rms > 0:
rms_db = 20 * np.log10(rms / (2 ** (8 * audio.sample_width - 1))) # dB relative to full scale for this file's sample width
logger.info(f"音频 {os.path.basename(audio_path)} 的RMS: {rms_db:.2f} dB")
return rms_db
else:
return -60.0 # 静音
except Exception as e:
logger.error(f"计算音频RMS失败: {e}")
return None
def normalize_audio_lufs(self, input_path: str, output_path: str,
target_lufs: Optional[float] = None) -> bool:
"""
使用FFmpeg的loudnorm滤镜标准化音频响度
Args:
input_path: 输入音频文件路径
output_path: 输出音频文件路径
target_lufs: 目标LUFS值,默认使用-23.0
Returns:
bool: 是否成功
"""
if target_lufs is None:
target_lufs = self.target_lufs
try:
# 第一遍:分析音频
analyze_cmd = [
'ffmpeg', '-hide_banner', '-nostats',
'-i', input_path,
'-af', f'loudnorm=I={target_lufs}:TP={self.max_peak}:LRA=7:print_format=json',
'-f', 'null', '-'
]
analyze_result = subprocess.run(
analyze_cmd,
capture_output=True,
text=True,
check=False
)
# 解析分析结果
stderr_lines = analyze_result.stderr.split('\n')
json_start = False
json_lines = []
for line in stderr_lines:
if line.strip() == '{':
json_start = True
if json_start:
json_lines.append(line)
if line.strip() == '}':
break
if not json_lines:
logger.warning("无法获取音频分析数据,使用简单标准化")
return self._simple_normalize(input_path, output_path)
import json
loudness_data = json.loads('\n'.join(json_lines))
# 第二遍:应用标准化
normalize_cmd = [
'ffmpeg', '-y', '-hide_banner',
'-i', input_path,
'-af', (
f'loudnorm=I={target_lufs}:TP={self.max_peak}:LRA=7:'
f'measured_I={loudness_data["input_i"]}:'
f'measured_LRA={loudness_data["input_lra"]}:'
f'measured_TP={loudness_data["input_tp"]}:'
f'measured_thresh={loudness_data["input_thresh"]}'
),
'-ar', '44100', # 统一采样率
'-ac', '2', # 统一为立体声
output_path
]
result = subprocess.run(
normalize_cmd,
capture_output=True,
text=True,
check=True
)
logger.info(f"音频标准化完成: {output_path}")
return True
except subprocess.CalledProcessError as e:
logger.error(f"FFmpeg标准化失败: {e}")
return self._simple_normalize(input_path, output_path)
except Exception as e:
logger.error(f"音频标准化失败: {e}")
return False
def _simple_normalize(self, input_path: str, output_path: str) -> bool:
"""
简单的音频标准化(备用方案)
Args:
input_path: 输入音频文件路径
output_path: 输出音频文件路径
Returns:
bool: 是否成功
"""
try:
# 使用pydub进行简单的音量标准化
audio = AudioSegment.from_file(input_path)
# 标准化到-20dB
target_dBFS = -20.0
change_in_dBFS = target_dBFS - audio.dBFS
normalized_audio = audio.apply_gain(change_in_dBFS)
# 导出
normalized_audio.export(output_path, format="mp3", bitrate="128k")
logger.info(f"简单音频标准化完成: {output_path}")
return True
except Exception as e:
logger.error(f"简单音频标准化失败: {e}")
return False
def calculate_volume_adjustment(self, tts_path: str, original_path: str) -> Tuple[float, float]:
"""
计算TTS和原声的音量调整系数,使它们达到相似的响度
Args:
tts_path: TTS音频文件路径
original_path: 原声音频文件路径
Returns:
Tuple[float, float]: (TTS音量系数, 原声音量系数)
"""
# 分析两个音频的响度
tts_lufs = self.analyze_audio_lufs(tts_path)
original_lufs = self.analyze_audio_lufs(original_path)
# 如果LUFS分析失败,使用RMS作为备用
if tts_lufs is None:
tts_lufs = self.get_audio_rms(tts_path)
if original_lufs is None:
original_lufs = self.get_audio_rms(original_path)
if tts_lufs is None or original_lufs is None:
logger.warning("无法分析音频响度,使用默认音量设置")
return 0.7, 1.0 # 默认设置
# 计算调整系数
# 目标:让两个音频达到相似的响度
target_lufs = -20.0 # 目标响度
tts_adjustment = 10 ** ((target_lufs - tts_lufs) / 20)
original_adjustment = 10 ** ((target_lufs - original_lufs) / 20)
# 限制调整范围,避免过度放大
tts_adjustment = max(0.1, min(2.0, tts_adjustment))
original_adjustment = max(0.1, min(3.0, original_adjustment)) # 原声可以放大更多
logger.info(f"音量调整建议 - TTS: {tts_adjustment:.2f}, 原声: {original_adjustment:.2f}")
return tts_adjustment, original_adjustment
def normalize_audio_for_mixing(audio_path: str, output_dir: str,
target_lufs: float = -20.0) -> Optional[str]:
"""
为音频混合准备标准化的音频文件
Args:
audio_path: 输入音频文件路径
output_dir: 输出目录
target_lufs: 目标LUFS值
Returns:
str: 标准化后的音频文件路径,失败返回None
"""
if not os.path.exists(audio_path):
return None
normalizer = AudioNormalizer()
# 生成输出文件名
base_name = os.path.splitext(os.path.basename(audio_path))[0]
output_path = os.path.join(output_dir, f"{base_name}_normalized.mp3")
# 执行标准化
if normalizer.normalize_audio_lufs(audio_path, output_path, target_lufs):
return output_path
else:
return None
if __name__ == "__main__":
# 测试代码
normalizer = AudioNormalizer()
# 测试音频分析
test_audio = "/path/to/test/audio.mp3"
if os.path.exists(test_audio):
lufs = normalizer.analyze_audio_lufs(test_audio)
rms = normalizer.get_audio_rms(test_audio)
print(f"LUFS: {lufs}, RMS: {rms}")
================================================
FILE: app/services/clip_video.py
================================================
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project: NarratoAI
@File : clip_video
@Author : Viccy同学
@Date : 2025/5/6 6:14 PM
'''
import os
import subprocess
import json
import hashlib
from loguru import logger
from typing import Dict, List, Optional
from pathlib import Path
from app.utils import ffmpeg_utils
def parse_timestamp(timestamp: str) -> tuple:
    """
    Split a timestamp range string into its start and end times.

    Args:
        timestamp: A range formatted as 'HH:MM:SS-HH:MM:SS' or 'HH:MM:SS,sss-HH:MM:SS,sss'

    Returns:
        tuple: (start time, end time), each formatted as 'HH:MM:SS' or 'HH:MM:SS,sss'
    """
    start_time, end_time = timestamp.split('-')
    return start_time, end_time
def calculate_end_time(start_time: str, duration: float, extra_seconds: float = 1.0) -> str:
    """
    Compute the end time from a start time and a duration.

    Args:
        start_time: Start time, formatted as 'HH:MM:SS' or 'HH:MM:SS,sss' (with milliseconds)
        duration: Duration in seconds
        extra_seconds: Extra seconds appended as a margin; defaults to 1 second

    Returns:
        str: The end time, in the same format as the input
    """
    # Check whether the input carries milliseconds
    has_milliseconds = ',' in start_time
    milliseconds = 0

    if has_milliseconds:
        time_part, ms_part = start_time.split(',')
        h, m, s = map(int, time_part.split(':'))
        milliseconds = int(ms_part)
    else:
        h, m, s = map(int, start_time.split(':'))

    # Convert everything to total milliseconds
    total_milliseconds = ((h * 3600 + m * 60 + s) * 1000 + milliseconds +
                          int((duration + extra_seconds) * 1000))

    # Split back into hours, minutes, seconds, and milliseconds
    ms_new = total_milliseconds % 1000
    total_seconds = total_milliseconds // 1000
    h_new = int(total_seconds // 3600)
    m_new = int((total_seconds % 3600) // 60)
    s_new = int(total_seconds % 60)

    # Return a string in the same format as the input
    # (the fractional second is dropped when the input has no milliseconds)
    if has_milliseconds:
        return f"{h_new:02d}:{m_new:02d}:{s_new:02d},{ms_new:03d}"
    else:
        return f"{h_new:02d}:{m_new:02d}:{s_new:02d}"
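# Illustrative sketch (not part of the original module): a compact version of
# the same millisecond arithmetic, with worked inputs. The helper name
# `add_seconds` is hypothetical, and it takes the total offset directly rather
# than a separate duration and margin.

```python
def add_seconds(start: str, seconds: float) -> str:
    """Add `seconds` to an 'HH:MM:SS' or 'HH:MM:SS,sss' time, keeping the input format."""
    has_ms = "," in start
    clock, _, ms = start.partition(",")
    h, m, s = map(int, clock.split(":"))
    # Work in integer milliseconds to avoid floating-point drift
    total_ms = (h * 3600 + m * 60 + s) * 1000 + int(ms or 0) + int(seconds * 1000)
    total_s, ms_new = divmod(total_ms, 1000)
    h_new, rem = divmod(total_s, 3600)
    m_new, s_new = divmod(rem, 60)
    out = f"{h_new:02d}:{m_new:02d}:{s_new:02d}"
    return f"{out},{ms_new:03d}" if has_ms else out


if __name__ == "__main__":
    print(add_seconds("00:00:58,500", 2.25))  # carries into the next minute
    print(add_seconds("00:59:59", 1.0))       # carries into the next hour
    print(add_seconds("00:00:10", 1.5))       # no-millisecond input drops the fraction
```

# Note the last case: when the start time has no millisecond part, the
# fractional half second is truncated, so the clip ends slightly early.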
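def check_hardware_acceleration() -> Optional[str]:
    """
    Detect which hardware acceleration option the system supports.

    Returns:
        Optional[str]: The hardware acceleration type, or None if unsupported
    """
    # Delegate to the centralized hardware acceleration detection
    return ffmpeg_utils.get_ffmpeg_hwaccel_type()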
def get_safe_encoder_config(hwaccel_type: Optional[str] = None) -> Dict[str, str]:
    """
    Return a conservative encoder configuration, tuned from the approach that worked in ffmpeg_demo.py.

    Args:
        hwaccel_type: Hardware acceleration type

    Returns:
        Dict[str, str]: Encoder configuration
    """
    # Base configuration, following the working setup in ffmpeg_demo.py
    config = {
        "video_codec": "libx264",
        "audio_codec": "aac",
        "pixel_format": "yuv420p",
        "preset": "medium",
        "quality_param": "crf",  # Name of the quality parameter
        "quality_value": "23"    # Quality value
    }

    # Adjust the configuration for the hardware acceleration type (simplified)
    if hwaccel_type in ["nvenc_pure", "nvenc_software", "cuda_careful", "nvenc", "cuda", "cuda_decode"]:
        # NVIDIA hardware acceleration, using the parameters validated in ffmpeg_demo.py
        config["video_codec"] = "h264_nvenc"
        config["preset"] = "medium"
        config["quality_param"] = "cq"  # NVENC uses CQ quality control rather than CRF
        config["quality_value"] = "23"
        config["pixel_format"] = "yuv420p"
    elif hwaccel_type == "amf":
        # AMD AMF encoder
        config["video_codec"] = "h264_amf"
        config["preset"] = "balanced"
        config["quality_param"] = "qp_i"
        config["quality_value"] = "23"
    elif hwaccel_type == "qsv":
        # Intel QSV encoder
        config["video_codec"] = "h264_qsv"
        config["preset"] = "medium"
        config["quality_param"] = "global_quality"
        config["quality_value"] = "23"
    elif hwaccel_type == "videotoolbox":
        # macOS VideoToolbox encoder
        config["video_codec"] = "h264_videotoolbox"
        config["preset"] = "medium"
        config["quality_param"] = "b:v"
        config["quality_value"] = "5M"
    else:
        # Software encoding (default)
        config["video_codec"] = "libx264"
        config["preset"] = "medium"
        config["quality_param"] = "crf"
        config["quality_value"] = "23"

    return config
def build_ffmpeg_command(
        input_path: str,
        output_path: str,
        start_time: str,
        end_time: str,
        encoder_config: Dict[str, str],
        hwaccel_args: Optional[List[str]] = None
) -> List[str]:
    """
    Build the ffmpeg clipping command, choosing the hardware acceleration setup that testing showed to work.

    Important finding: when clipping video, CUDA hardware decoding causes filter-chain errors,
    so the NVENC encoder should be used on its own (no hardware decoding) for the best compatibility.

    Args:
        input_path: Input video path
        output_path: Output video path
        start_time: Start time
        end_time: End time
        encoder_config: Encoder configuration
        hwaccel_args: Hardware acceleration arguments

    Returns:
        List[str]: The ffmpeg command as an argument list
    """
    cmd = ["ffmpeg", "-y"]

    # Key fix: for clipping, skip CUDA hardware decoding and use only the NVENC encoder.
    # This avoids filter-chain format conversion errors while keeping the encoding speedup.
    if encoder_config["video_codec"] == "h264_nvenc":
        # Add no hardware decoding arguments and let FFmpeg handle decoding itself.
        # This avoids the "Impossible to convert between the formats" error.
        pass
    elif hwaccel_args:
        # Other encoders can use the hardware decoding arguments
        cmd.extend(hwaccel_args)

    # Input file
    cmd.extend(["-i", input_path])

    # Time range
    cmd.extend(["-ss", start_time, "-to", end_time])

    # Encoders
    cmd.extend(["-c:v", encoder_config["video_codec"]])
    cmd.extend(["-c:a", encoder_config["audio_codec"]])

    # Pixel format
    cmd.extend(["-pix_fmt", encoder_config["pixel_format"]])

    # Quality and preset parameters, per encoder
    if encoder_config["video_codec"] == "h264_nvenc":
        # NVENC encoder only (no hardware decoding, best compatibility)
        cmd.extend(["-preset", encoder_config["preset"]])
        cmd.extend(["-cq", encoder_config["quality_value"]])
        cmd.extend(["-profile:v", "main"])  # Improves compatibility
        logger.debug("Using the NVENC encoder only (no hardware decoding, avoids filter-chain issues)")
    elif encoder_config["video_codec"] == "h264_amf":
        # AMD AMF encoder
        cmd.extend(["-quality", encoder_config["preset"]])
        cmd.extend(["-qp_i", encoder_config["quality_value"]])
    elif encoder_config["video_codec"] == "h264_qsv":
        # Intel QSV encoder
        cmd.extend(["-preset", encoder_config["preset"]])
        cmd.extend(["-global_quality", encoder_config["quality_value"]])
    elif encoder_config["video_codec"] == "h264_videotoolbox":
        # macOS VideoToolbox encoder
        cmd.extend(["-profile:v", "high"])
        cmd.extend(["-b:v", encoder_config["quality_value"]])
    else:
        # Software encoder (libx264)
        cmd.extend(["-preset", encoder_config["preset"]])
        cmd.extend(["-crf", encoder_config["quality_value"]])

    # Audio settings
    cmd.extend(["-ar", "44100", "-ac", "2"])

    # Container and timestamp options
    cmd.extend(["-avoid_negative_ts", "make_zero"])
    cmd.extend(["-movflags", "+faststart"])

    # Output file
    cmd.append(output_path)

    return cmd
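# Illustrative sketch (not part of the original module): how a software-encoder
# configuration turns into an argument list, including the comma-to-dot
# conversion that callers apply to SRT-style timestamps before passing them to
# ffmpeg. The function name `sketch_clip_command` and the file names are
# hypothetical; only the libx264 branch is covered here.

```python
import shlex
from typing import Dict, List


def sketch_clip_command(src: str, dst: str, start: str, end: str,
                        cfg: Dict[str, str]) -> List[str]:
    """Build a minimal libx264 clipping command from a config dictionary."""
    # ffmpeg expects 'HH:MM:SS.sss', while SRT-style timestamps use a comma
    start, end = start.replace(",", "."), end.replace(",", ".")
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", start, "-to", end,
        "-c:v", cfg["video_codec"], "-c:a", cfg["audio_codec"],
        "-pix_fmt", cfg["pixel_format"],
        "-preset", cfg["preset"],
        f"-{cfg['quality_param']}", cfg["quality_value"],
        dst,
    ]


if __name__ == "__main__":
    cfg = {"video_codec": "libx264", "audio_codec": "aac", "pixel_format": "yuv420p",
           "preset": "medium", "quality_param": "crf", "quality_value": "23"}
    cmd = sketch_clip_command("in.mp4", "out.mp4", "00:00:05,250", "00:00:12,000", cfg)
    print(shlex.join(cmd))  # the list is passed to subprocess as-is; joining is only for display
```

# Passing the command as a list (rather than a shell string) means paths with
# spaces or non-ASCII characters need no quoting.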
def execute_ffmpeg_with_fallback(
        cmd: List[str],
        timestamp: str,
        input_path: str,
        output_path: str,
        start_time: str,
        end_time: str
) -> bool:
    """
    Run an ffmpeg command and, if it fails, retry with a fallback chosen from the error type.

    Args:
        cmd: The primary ffmpeg command
        timestamp: Timestamp (used for logging)
        input_path: Input path
        output_path: Output path
        start_time: Start time
        end_time: End time

    Returns:
        bool: Whether the clip was produced successfully
    """
    try:
        # logger.debug(f"Running ffmpeg command: {' '.join(cmd)}")

        # Decode the process output as UTF-8 on Windows
        is_windows = os.name == 'nt'
        process_kwargs = {
            "stdout": subprocess.PIPE,
            "stderr": subprocess.PIPE,
            "text": True,
            "check": True
        }
        if is_windows:
            process_kwargs["encoding"] = 'utf-8'

        subprocess.run(cmd, **process_kwargs)

        # Validate the output file
        if os.path.exists(output_path) and os.path.getsize(output_path) > 0:
            # logger.info(f"✓ Clip succeeded: {timestamp}")
            return True
        else:
            logger.warning(f"Invalid output file: {output_path}")
            return False

    except subprocess.CalledProcessError as e:
        error_msg = e.stderr if e.stderr else str(e)
        logger.warning(f"Primary command failed: {error_msg}")

        # Classify the error
        error_type = analyze_ffmpeg_error(error_msg)
        logger.debug(f"Error type: {error_type}")

        # Choose the fallback strategy from the error type
        if error_type == "filter_chain_error":
            logger.info(f"Filter-chain error detected, trying compatibility mode: {timestamp}")
            return try_compatibility_fallback(input_path, output_path, start_time, end_time, timestamp)
        elif error_type == "hardware_error":
            logger.info(f"Hardware acceleration error detected, trying software encoding: {timestamp}")
            return try_software_fallback(input_path, output_path, start_time, end_time, timestamp)
        elif error_type == "encoder_error":
            logger.info(f"Encoder error detected, trying basic encoding: {timestamp}")
            return try_basic_fallback(input_path, output_path, start_time, end_time, timestamp)
        else:
            logger.info(f"Trying the generic fallback: {timestamp}")
            return try_fallback_encoding(input_path, output_path, start_time, end_time, timestamp)

    except Exception as e:
        logger.error(f"Exception while running the ffmpeg command: {str(e)}")
        return False
def analyze_ffmpeg_error(error_msg: str) -> str:
    """
    Classify an ffmpeg error message by keyword.

    The checks run in order and the first match wins, so a message that
    contains keywords from several groups is reported as the earliest group.

    Args:
        error_msg: The error message

    Returns:
        str: The error type
    """
    error_msg_lower = error_msg.lower()

    # Filter-chain errors
    if any(keyword in error_msg_lower for keyword in [
        "impossible to convert", "filter", "format", "scale", "auto_scale",
        "null", "parsed_null", "reinitializing filters"
    ]):
        return "filter_chain_error"

    # Hardware acceleration errors
    if any(keyword in error_msg_lower for keyword in [
        "cuda", "nvenc", "amf", "qsv", "d3d11va", "dxva2", "videotoolbox",
        "hardware", "hwaccel", "gpu", "device"
    ]):
        return "hardware_error"

    # Encoder errors
    if any(keyword in error_msg_lower for keyword in [
        "encoder", "codec", "h264", "libx264", "bitrate", "preset"
    ]):
        return "encoder_error"

    # File access errors
    if any(keyword in error_msg_lower for keyword in [
        "no such file", "permission denied", "access denied", "file not found"
    ]):
        return "file_error"

    return "unknown_error"
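# Illustrative sketch (not part of the original module): the same first-match
# keyword classification written as an ordered rule table. The names `RULES`
# and `classify` are hypothetical, and the keyword lists are a deliberately
# small subset of the ones above, chosen only to show how ordering decides the
# result.

```python
from typing import List, Tuple

# Each rule is (label, keywords); rules are tried top to bottom
RULES: List[Tuple[str, Tuple[str, ...]]] = [
    ("filter_chain_error", ("impossible to convert", "reinitializing filters")),
    ("hardware_error", ("cuda", "nvenc", "hwaccel")),
    ("encoder_error", ("encoder", "codec")),
    ("file_error", ("no such file", "permission denied")),
]


def classify(stderr: str) -> str:
    """Return the label of the first rule whose keyword occurs in stderr."""
    text = stderr.lower()
    for label, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return label
    return "unknown_error"


if __name__ == "__main__":
    print(classify("Impossible to convert between the formats supported by the filter"))
    print(classify("Cannot load nvcuda.dll"))
    print(classify("input.mp4: No such file or directory"))
```

# Because matching is by substring, broad keywords match many unrelated
# messages; keeping the table narrow and ordered from most to least specific
# makes the outcome easier to predict.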
def try_compatibility_fallback(
        input_path: str,
        output_path: str,
        start_time: str,
        end_time: str,
        timestamp: str
) -> bool:
    """
    Try the compatibility fallback (for filter-chain problems).

    Args:
        input_path: Input path
        output_path: Output path
        start_time: Start time
        end_time: End time
        timestamp: Timestamp

    Returns:
        bool: Whether the clip was produced successfully
    """
    # Compatibility mode: avoid every likely source of filter-chain problems
    fallback_cmd = [
        "ffmpeg", "-y", "-hide_banner", "-loglevel", "error",
        "-i", input_path,
        "-ss", start_time,
        "-to", end_time,
        "-c:v", "libx264",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",  # Set the pixel format explicitly
        "-preset", "fast",
        "-crf", "23",
        "-ar", "44100", "-ac", "2",  # Normalize the audio format
        "-avoid_negative_ts", "make_zero",
        "-movflags", "+faststart",
        "-max_muxing_queue_size", "1024",  # Enlarge the muxing queue
        output_path
    ]

    return execute_simple_command(fallback_cmd, timestamp, "compatibility mode")
def try_software_fallback(
        input_path: str,
        output_path: str,
        start_time: str,
        end_time: str,
        timestamp: str
) -> bool:
    """
    Try the software-encoding fallback.

    Args:
        input_path: Input path
        output_path: Output path
        start_time: Start time
        end_time: End time
        timestamp: Timestamp

    Returns:
        bool: Whether the clip was produced successfully
    """
    # Software encoding only
    fallback_cmd = [
        "ffmpeg", "-y", "-hide_banner", "-loglevel", "error",
        "-i", input_path,
        "-ss", start_time,
        "-to", end_time,
        "-c:v", "libx264",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",
        "-preset", "fast",
        "-crf", "23",
        "-ar", "44100", "-ac", "2",
        "-avoid_negative_ts", "make_zero",
        "-movflags", "+faststart",
        output_path
    ]

    return execute_simple_command(fallback_cmd, timestamp, "software encoding")
def try_basic_fallback(
        input_path: str,
        output_path: str,
        start_time: str,
        end_time: str,
        timestamp: str
) -> bool:
    """
    Try the basic-encoding fallback.

    Args:
        input_path: Input path
        output_path: Output path
        start_time: Start time
        end_time: End time
        timestamp: Timestamp

    Returns:
        bool: Whether the clip was produced successfully
    """
    # The most basic encoding parameters
    fallback_cmd = [
        "ffmpeg", "-y", "-hide_banner", "-loglevel", "error",
        "-i", input_path,
        "-ss", start_time,
        "-to", end_time,
        "-c:v", "libx264",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",
        "-preset", "ultrafast",  # Fastest preset
        "-crf", "28",  # Slightly lower quality
        "-avoid_negative_ts", "make_zero",
        output_path
    ]

    return execute_simple_command(fallback_cmd, timestamp, "basic encoding")
def execute_simple_command(cmd: List[str], timestamp: str, method_name: str) -> bool:
    """
    Run a simple ffmpeg command without further fallbacks.

    Args:
        cmd: Command argument list
        timestamp: Timestamp
        method_name: Name of the fallback method (used for logging)

    Returns:
        bool: Whether the command produced a valid output file
    """
    try:
        logger.debug(f"Running {method_name} command: {' '.join(cmd)}")

        is_windows = os.name == 'nt'
        process_kwargs = {
            "stdout": subprocess.PIPE,
            "stderr": subprocess.PIPE,
            "text": True,
            "check": True
        }
        if is_windows:
            process_kwargs["encoding"] = 'utf-8'

        subprocess.run(cmd, **process_kwargs)

        output_path = cmd[-1]  # The output path is always the last argument
        if os.path.exists(output_path) and os.path.getsize(output_path) > 0:
            logger.info(f"✓ {method_name} succeeded: {timestamp}")
            return True
        else:
            logger.error(f"{method_name} failed, invalid output file: {output_path}")
            return False

    except subprocess.CalledProcessError as e:
        error_msg = e.stderr if e.stderr else str(e)
        logger.error(f"{method_name} failed: {error_msg}")
        return False
    except Exception as e:
        logger.error(f"{method_name} raised an exception: {str(e)}")
        return False
def try_fallback_encoding(
        input_path: str,
        output_path: str,
        start_time: str,
        end_time: str,
        timestamp: str
) -> bool:
    """
    Try the generic fallback encoding.

    Args:
        input_path: Input path
        output_path: Output path
        start_time: Start time
        end_time: End time
        timestamp: Timestamp

    Returns:
        bool: Whether the clip was produced successfully
    """
    # The simplest software encoding command
    fallback_cmd = [
        "ffmpeg", "-y",
        "-i", input_path,
        "-ss", start_time,
        "-to", end_time,
        "-c:v", "libx264",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",
        "-preset", "ultrafast",  # Fastest preset
        "-crf", "28",  # Slightly lower quality for better compatibility
        "-avoid_negative_ts", "make_zero",
        "-movflags", "+faststart",
        output_path
    ]

    return execute_simple_command(fallback_cmd, timestamp, "generic fallback")
def _process_narration_only_segment(
        video_origin_path: str,
        script_item: Dict,
        tts_map: Dict,
        output_dir: str,
        encoder_config: Dict,
        hwaccel_args: List[str]
) -> Optional[str]:
    """
    Process a narration-only segment (OST=0).

    - The clip length follows the TTS audio duration
    - The original audio is removed, producing a silent video
    """
    _id = script_item["_id"]
    timestamp = script_item["timestamp"]

    # Look up the TTS result for this segment
    tts_item = tts_map.get(_id)
    if not tts_item:
        logger.error(f"No TTS result found for segment {_id}")
        return None

    # Parse the start time and derive the end time from the TTS duration
    start_time, _ = parse_timestamp(timestamp)
    duration = tts_item["duration"]
    calculated_end_time = calculate_end_time(start_time, duration, extra_seconds=0)

    # Convert to the time format FFmpeg expects
    ffmpeg_start_time = start_time.replace(',', '.')
    ffmpeg_end_time = calculated_end_time.replace(',', '.')

    # Build the output file name
    safe_start_time = start_time.replace(':', '-').replace(',', '-')
    safe_end_time = calculated_end_time.replace(':', '-').replace(',', '-')
    output_filename = f"ost0_vid_{safe_start_time}@{safe_end_time}.mp4"
    output_path = os.path.join(output_dir, output_filename)

    # Build the FFmpeg command, dropping the audio
    cmd = _build_ffmpeg_command_with_audio_control(
        video_origin_path, output_path, ffmpeg_start_time, ffmpeg_end_time,
        encoder_config, hwaccel_args, remove_audio=True
    )

    # Run the command
    success = execute_ffmpeg_with_fallback(
        cmd, timestamp, video_origin_path, output_path,
        ffmpeg_start_time, ffmpeg_end_time
    )

    return output_path if success else None
def _process_original_audio_segment(
        video_origin_path: str,
        script_item: Dict,
        output_dir: str,
        encoder_config: Dict,
        hwaccel_args: List[str]
) -> Optional[str]:
    """
    Process an original-audio-only segment (OST=1).

    - The clip follows the script timestamp exactly
    - The original audio is kept unchanged
    """
    _id = script_item["_id"]
    timestamp = script_item["timestamp"]

    # Clip strictly by the timestamp
    start_time, end_time = parse_timestamp(timestamp)

    # Convert to the time format FFmpeg expects
    ffmpeg_start_time = start_time.replace(',', '.')
    ffmpeg_end_time = end_time.replace(',', '.')

    # Build the output file name
    safe_start_time = start_time.replace(':', '-').replace(',', '-')
    safe_end_time = end_time.replace(':', '-').replace(',', '-')
    output_filename = f"ost1_vid_{safe_start_time}@{safe_end_time}.mp4"
    output_path = os.path.join(output_dir, output_filename)

    # Build the FFmpeg command, keeping the original audio
    cmd = _build_ffmpeg_command_with_audio_control(
        video_origin_path, output_path, ffmpeg_start_time, ffmpeg_end_time,
        encoder_config, hwaccel_args, remove_audio=False
    )

    # Run the command
    success = execute_ffmpeg_with_fallback(
        cmd, timestamp, video_origin_path, output_path,
        ffmpeg_start_time, ffmpeg_end_time
    )

    return output_path if success else None
def _process_mixed_segment(
        video_origin_path: str,
        script_item: Dict,
        tts_map: Dict,
        output_dir: str,
        encoder_config: Dict,
        hwaccel_args: List[str]
) -> Optional[str]:
    """
    Process a mixed narration-plus-original-audio segment (OST=2).

    - The clip length follows the TTS audio duration
    - The original audio is kept, and the video duration equals the TTS audio duration
    """
    _id = script_item["_id"]
    timestamp = script_item["timestamp"]

    # Look up the TTS result for this segment
    tts_item = tts_map.get(_id)
    if not tts_item:
        logger.error(f"No TTS result found for segment {_id}")
        return None

    # Parse the start time and derive the end time from the TTS duration
    start_time, _ = parse_timestamp(timestamp)
    duration = tts_item["duration"]
    calculated_end_time = calculate_end_time(start_time, duration, extra_seconds=0)

    # Convert to the time format FFmpeg expects
    ffmpeg_start_time = start_time.replace(',', '.')
    ffmpeg_end_time = calculated_end_time.replace(',', '.')

    # Build the output file name
    safe_start_time = start_time.replace(':', '-').replace(',', '-')
    safe_end_time = calculated_end_time.replace(':', '-').replace(',', '-')
    output_filename = f"ost2_vid_{safe_start_time}@{safe_end_time}.mp4"
    output_path = os.path.join(output_dir, output_filename)

    # Build the FFmpeg command, keeping the original audio
    cmd = _build_ffmpeg_command_with_audio_control(
        video_origin_path, output_path, ffmpeg_start_time, ffmpeg_end_time,
        encoder_config, hwaccel_args, remove_audio=False
    )

    # Run the command
    success = execute_ffmpeg_with_fallback(
        cmd, timestamp, video_origin_path, output_path,
        ffmpeg_start_time, ffmpeg_end_time
    )

    return output_path if success else None
def _build_ffmpeg_command_with_audio_control(
        input_path: str,
        output_path: str,
        start_time: str,
        end_time: str,
        encoder_config: Dict[str, str],
        hwaccel_args: Optional[List[str]] = None,
        remove_audio: bool = False
) -> List[str]:
    """
    Build an FFmpeg clipping command with control over the audio stream.

    Args:
        input_path: Input video path
        output_path: Output video path
        start_time: Start time
        end_time: End time
        encoder_config: Encoder configuration
        hwaccel_args: Hardware acceleration arguments
        remove_audio: Whether to drop the audio (True for OST=0)

    Returns:
        List[str]: The ffmpeg command as an argument list
    """
    cmd = ["ffmpeg", "-y"]

    # Hardware acceleration (same logic as build_ffmpeg_command)
    if encoder_config["video_codec"] == "h264_nvenc":
        # With NVENC, skip hardware decoding to avoid filter-chain problems
        pass
    elif hwaccel_args:
        cmd.extend(hwaccel_args)

    # Input file
    cmd.extend(["-i", input_path])

    # Time range
    cmd.extend(["-ss", start_time, "-to", end_time])

    # Video encoder
    cmd.extend(["-c:v", encoder_config["video_codec"]])

    # Audio handling
    if remove_audio:
        # OST=0: drop the audio
        cmd.extend(["-an"])  # -an excludes all audio streams
        logger.debug("OST=0: removing the audio stream")
    else:
        # OST=1,2: keep the original audio
        cmd.extend(["-c:a", encoder_config["audio_codec"]])
        cmd.extend(["-ar", "44100", "-ac", "2"])
        logger.debug("OST=1/2: keeping the original audio")

    # Pixel format
    cmd.extend(["-pix_fmt", encoder_config["pixel_format"]])

    # Quality and preset parameters (same logic as build_ffmpeg_command)
    if encoder_config["video_codec"] == "h264_nvenc":
        cmd.extend(["-preset", encoder_config["preset"]])
        cmd.extend(["-cq", encoder_config["quality_value"]])
        cmd.extend(["-profile:v", "main"])
    elif encoder_config["video_codec"] == "h264_amf":
        cmd.extend(["-quality", encoder_config["preset"]])
        cmd.extend(["-qp_i", encoder_config["quality_value"]])
    elif encoder_config["video_codec"] == "h264_qsv":
        cmd.extend(["-preset", encoder_config["preset"]])
        cmd.extend(["-global_quality", encoder_config["quality_value"]])
    elif encoder_config["video_codec"] == "h264_videotoolbox":
        cmd.extend(["-profile:v", "high"])
        cmd.extend(["-b:v", encoder_config["quality_value"]])
    else:
        # Software encoder (libx264)
        cmd.extend(["-preset", encoder_config["preset"]])
        cmd.extend(["-crf", encoder_config["quality_value"]])

    # Container and timestamp options
    cmd.extend(["-avoid_negative_ts", "make_zero"])
    cmd.extend(["-movflags", "+faststart"])

    # Output file
    cmd.append(output_path)

    return cmd
def clip_video_unified(
        video_origin_path: str,
        script_list: List[Dict],
        tts_results: List[Dict],
        output_dir: Optional[str] = None,
        task_id: Optional[str] = None
) -> Dict[str, str]:
    """
    Clip every segment in a single pass, choosing the strategy from its OST type (avoids clipping twice).

    Args:
        video_origin_path: Path to the source video
        script_list: The full script list, with information for every segment
        tts_results: TTS results; only OST=0 and OST=2 segments are included
        output_dir: Output directory; generated automatically when None
        task_id: Task ID used to build a unique output directory; generated automatically when None

    Returns:
        Dict[str, str]: Mapping from segment ID to the clipped video path
    """
    # Check that the video file exists
    if not os.path.exists(video_origin_path):
        raise FileNotFoundError(f"Video file does not exist: {video_origin_path}")

    # Derive a unique ID from the inputs when no task_id is given
    if task_id is None:
        content_for_hash = f"{video_origin_path}_{json.dumps(script_list)}"
        task_id = hashlib.md5(content_for_hash.encode()).hexdigest()

    # Set the output directory
    if output_dir is None:
        output_dir = os.path.join(
            os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
            "storage", "temp", "clip_video_unified", task_id
        )

    # Make sure the output directory exists
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Build a lookup map for the TTS results
    tts_map = {item['_id']: item for item in tts_results}

    # Detect hardware acceleration support
    hwaccel_type = check_hardware_acceleration()
    hwaccel_args = []

    if hwaccel_type:
        hwaccel_args = ffmpeg_utils.get_ffmpeg_hwaccel_args()
        hwaccel_info = ffmpeg_utils.get_ffmpeg_hwaccel_info()
        logger.info(f"🚀 Using hardware acceleration: {hwaccel_type} ({hwaccel_info.get('message', '')})")
    else:
        logger.info("🔧 Using software encoding")

    # Get the encoder configuration
    encoder_config = get_safe_encoder_config(hwaccel_type)
    logger.debug(f"Encoder configuration: {encoder_config}")

    # Statistics
    total_clips = len(script_list)
    result = {}
    failed_clips = []
    success_count = 0

    logger.info(f"📹 Starting unified video clipping, {total_clips} segments in total")

    for i, script_item in enumerate(script_list, 1):
        _id = script_item.get("_id")
        ost = script_item.get("OST", 0)
        timestamp = script_item["timestamp"]

        logger.info(f"📹 [{i}/{total_clips}] Processing segment ID:{_id}, OST:{ost}, timestamp:{timestamp}")

        try:
            if ost == 0:  # Narration only
                output_path = _process_narration_only_segment(
                    video_origin_path, script_item, tts_map, output_dir,
                    encoder_config, hwaccel_args
                )
            elif ost == 1:  # Original audio only
                output_path = _process_original_audio_segment(
                    video_origin_path, script_item, output_dir,
                    encoder_config, hwaccel_args
                )
            elif ost == 2:  # Narration mixed with original audio
                output_path = _process_mixed_segment(
                    video_origin_path, script_item, tts_map, output_dir,
                    encoder_config, hwaccel_args
                )
            else:
                logger.warning(f"Unknown OST type: {ost}, skipping segment {_id}")
                continue

            if output_path and os.path.exists(output_path) and os.path.getsize(output_path) > 0:
                result[_id] = output_path
                success_count += 1
                logger.info(f"✅ [{i}/{total_clips}] Segment processed: OST={ost}, ID={_id}")
            else:
                failed_clips.append(f"ID:{_id}, OST:{ost}")
                logger.error(f"❌ [{i}/{total_clips}] Segment failed: OST={ost}, ID={_id}")

        except Exception as e:
            failed_clips.append(f"ID:{_id}, OST:{ost}")
            logger.error(f"❌ [{i}/{total_clips}] Segment raised an exception: OST={ost}, ID={_id}, error: {str(e)}")

    # Final statistics
    logger.info(f"📊 Unified video clipping finished: {success_count}/{total_clips} succeeded, {len(failed_clips)} failed")

    # Report failed segments
    if failed_clips:
        logger.warning(f"⚠️ The following segments failed: {failed_clips}")
        if len(failed_clips) == total_clips:
            raise RuntimeError("All video segments failed; check the video file and the ffmpeg configuration")
        elif len(failed_clips) > total_clips / 2:
            logger.warning(f"⚠️ More than half of the segments failed ({len(failed_clips)}/{total_clips}); check the hardware acceleration configuration")

    if success_count > 0:
        logger.info(f"🎉 Unified video clipping task complete! Output directory: {output_dir}")

    return result
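# Illustrative sketch (not part of the original module): the OST dispatch above
# reduced to a pure function that decides, per segment, where the clip length
# comes from and whether the original audio is kept. The name `plan_segment`,
# its return shape, and the `tts_durations` mapping are hypothetical.

```python
from typing import Dict, Optional, Tuple


def plan_segment(item: Dict, tts_durations: Dict[int, float]) -> Optional[Tuple[str, bool]]:
    """Return (duration_source, keep_original_audio), or None if the segment cannot be clipped."""
    ost = item.get("OST", 0)
    if ost == 1:
        # Original audio only: clip exactly by the script timestamp
        return ("timestamp", True)
    if ost in (0, 2):
        # Narration segments need a TTS duration to size the clip
        if item["_id"] not in tts_durations:
            return None
        # OST=0 drops the original audio, OST=2 keeps it under the narration
        return ("tts", ost == 2)
    return None  # Unknown OST type


if __name__ == "__main__":
    durations = {1: 4.2, 3: 7.0}
    script = [
        {"_id": 1, "OST": 0, "timestamp": "00:00:00-00:00:05"},
        {"_id": 2, "OST": 1, "timestamp": "00:00:05-00:00:09"},
        {"_id": 3, "OST": 2, "timestamp": "00:00:09-00:00:20"},
    ]
    for item in script:
        print(item["_id"], plan_segment(item, durations))
```

# Separating the decision from the ffmpeg call like this makes the three cases
# easy to unit-test without touching any video file.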
def clip_video(
        video_origin_path: str,
        tts_result: List[Dict],
        output_dir: Optional[str] = None,
        task_id: Optional[str] = None
) -> Dict[str, str]:
    """
    Clip the video by timestamp, with improved Windows compatibility and error handling.

    Args:
        video_origin_path: Path to the source video
        tts_result: List of items carrying timestamp and duration information
        output_dir: Output directory; generated automatically when None
        task_id: Task ID used to build a unique output directory; generated automatically when None

    Returns:
        Dict[str, str]: Mapping from segment ID (or the timestamp, when an item has no `_id`)
        to the clipped video path
    """
    # Check that the video file exists
    if not os.path.exists(video_origin_path):
        raise FileNotFoundError(f"Video file does not exist: {video_origin_path}")

    # Derive a unique ID from the inputs when no task_id is given
    if task_id is None:
        content_for_hash = f"{video_origin_path}_{json.dumps(tts_result)}"
        task_id = hashlib.md5(content_for_hash.encode()).hexdigest()

    # Set the output directory
    if output_dir is None:
        output_dir = os.path.join(
            os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
            "storage", "temp", "clip_video", task_id
        )

    # Make sure the output directory exists
    Path(output_dir).mkdir(parents=True, exist_ok=True)

    # Detect hardware acceleration support
    hwaccel_type = check_hardware_acceleration()
    hwaccel_args = []

    if hwaccel_type:
        hwaccel_args = ffmpeg_utils.get_ffmpeg_hwaccel_args()
        hwaccel_info = ffmpeg_utils.get_ffmpeg_hwaccel_info()
        logger.info(f"🚀 Using hardware acceleration: {hwaccel_type} ({hwaccel_info.get('message', '')})")
    else:
        logger.info("🔧 Using software encoding")

    # Get the encoder configuration
    encoder_config = get_safe_encoder_config(hwaccel_type)
    logger.debug(f"Encoder configuration: {encoder_config}")

    # Statistics
    total_clips = len(tts_result)
    result = {}
    failed_clips = []
    success_count = 0

    logger.info(f"📹 Starting video clipping, {total_clips} segments in total")

    for i, item in enumerate(tts_result, 1):
        _id = item.get("_id", item.get("timestamp", "unknown"))
        timestamp = item["timestamp"]
        start_time, _ = parse_timestamp(timestamp)

        # Compute the actual end time from the duration (plus a 1-second margin)
        duration = item["duration"]

        # Sanity-check the duration and correct it if needed
        if duration <= 0 or duration > 300:  # Anything over 5 minutes is treated as invalid
            logger.warning(f"Abnormal duration {duration}s detected, segment: {timestamp}")
            # Try to derive the actual duration from the timestamp
            try:
                start_time_str, end_time_str = timestamp.split('-')

                # Parse the start time
                if ',' in start_time_str:
                    time_part, ms_part = start_time_str.split(',')
                    h1, m1, s1 = map(int, time_part.split(':'))
                    ms1 = int(ms_part)
                else:
                    h1, m1, s1 = map(int, start_time_str.split(':'))
                    ms1 = 0

                # Parse the end time
                if ',' in end_time_str:
                    time_part, ms_part = end_time_str.split(',')
                    h2, m2, s2 = map(int, time_part.split(':'))
                    ms2 = int(ms_part)
                else:
                    h2, m2, s2 = map(int, end_time_str.split(':'))
                    ms2 = 0

                # Compute the actual duration
                start_total_ms = (h1 * 3600 + m1 * 60 + s1) * 1000 + ms1
                end_total_ms = (h2 * 3600 + m2 * 60 + s2) * 1000 + ms2
                actual_duration = (end_total_ms - start_total_ms) / 1000.0

                if actual_duration > 0 and actual_duration <= 300:
                    duration = actual_duration
                    logger.info(f"Using the duration computed from the timestamp: {duration:.3f}s")
                else:
                    duration = 5.0  # Default to 5 seconds
                    logger.warning(f"The timestamp duration is also abnormal, using the default: {duration}s")
            except Exception as e:
                duration = 5.0  # Default to 5 seconds
                logger.warning(f"Duration correction failed, using the default: {duration}s, error: {str(e)}")

        calculated_end_time = calculate_end_time(start_time, duration)

        # Convert to the time format FFmpeg expects (comma replaced by a dot)
        ffmpeg_start_time = start_time.replace(',', '.')
        ffmpeg_end_time = calculated_end_time.replace(',', '.')

        # Build the output file name (hyphens replace colons and commas)
        safe_start_time = start_time.replace(':', '-').replace(',', '-')
        safe_end_time = calculated_end_time.replace(':', '-').replace(',', '-')

        output_filename = f"vid_{safe_start_time}@{safe_end_time}.mp4"
        output_path = os.path.join(output_dir, output_filename)

        # Build the FFmpeg command
        ffmpeg_cmd = build_ffmpeg_command(
            video_origin_path,
            output_path,
            ffmpeg_start_time,
            ffmpeg_end_time,
            encoder_config,
            hwaccel_args
        )

        # Run the FFmpeg command
        logger.info(f"📹 [{i}/{total_clips}] Clipping segment: {timestamp} -> {ffmpeg_start_time} to {ffmpeg_end_time}")

        success = execute_ffmpeg_with_fallback(
            ffmpeg_cmd,
            timestamp,
            video_origin_path,
            output_path,
            ffmpeg_start_time,
            ffmpeg_end_time
        )

        if success:
            result[_id] = output_path
            success_count += 1
            logger.info(f"✅ [{i}/{total_clips}] Segment clipped: {timestamp}")
        else:
            failed_clips.append(timestamp)
            logger.error(f"❌ [{i}/{total_clips}] Segment failed: {timestamp}")

    # Final statistics
    logger.info(f"📊 Video clipping finished: {success_count}/{total_clips} succeeded, {len(failed_clips)} failed")

    # Report failed segments
    if failed_clips:
        logger.warning(f"⚠️ The following segments failed: {failed_clips}")
        if len(failed_clips) == total_clips:
            raise RuntimeError("All video segments failed; check the video file and the ffmpeg configuration")
        elif len(failed_clips) > total_clips / 2:
            logger.warning(f"⚠️ More than half of the segments failed ({len(failed_clips)}/{total_clips}); check the hardware acceleration configuration")

    if success_count > 0:
        logger.info(f"🎉 Video clipping task complete! Output directory: {output_dir}")

    return result
if __name__ == "__main__":
video_origin_path = "/Users/apple/Desktop/home/NarratoAI/resource/videos/qyn2-2无片头片尾.mp4"
tts_result = [{'timestamp': '00:00:00-00:01:15',
'audio_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_00_00-00_01_15.mp3',
'subtitle_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/subtitle_00_00_00-00_01_15.srt',
'duration': 25.55,
'text': '好的各位,欢迎回到我的频道!《庆余年 2》刚开播就给了我们一个王炸!范闲在北齐"死"了?这怎么可能!上集片尾那个巨大的悬念,这一集就立刻揭晓了!范闲假死归来,他面临的第一个,也是最大的难关,就是如何面对他最敬爱的,同时也是最可怕的那个人——庆帝!'},
{'timestamp': '00:01:15-00:04:40',
'audio_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_01_15-00_04_40.mp3',
'subtitle_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/subtitle_00_01_15-00_04_40.srt',
'duration': 13.488,
'text': '但我们都知道,他绝不可能就这么轻易退场!第二集一开场,范闲就已经秘密回到了京都。他的生死传闻,可不像我们想象中那样只是小范围流传,而是…'},
{'timestamp': '00:04:58-00:05:45',
'audio_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_04_58-00_05_45.mp3',
'subtitle_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/subtitle_00_04_58-00_05_45.srt',
'duration': 21.363,
'text': '"欺君之罪"!在封建王朝,这可是抄家灭族的大罪!搁一般人,肯定脚底抹油溜之大吉了。但范闲是谁啊?他偏要反其道而行之!他竟然决定,直接去见庆帝!冒着天大的风险,用"假死"这个事实去赌庆帝的态度!'},
{'timestamp': '00:05:45-00:06:00',
'audio_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/audio_00_05_45-00_06_00.mp3',
'subtitle_file': '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/subtitle_00_05_45-00_06_00.srt',
'duration': 7.675, 'text': '但想见庆帝,哪有那么容易?范闲艺高人胆大,竟然选择了最激进的方式——闯宫!'}]
subclip_path_videos = {
'00:00:00-00:01:15': '/Users/apple/Desktop/home/NarratoAI/storage/temp/clip_video/6e7e343c7592c7d6f9a9636b55000f23/vid-00-00-00-00-01-15.mp4',
'00:01:15-00:04:40': '/Users/apple/Desktop/home/NarratoAI/storage/temp/clip_video/6e7e343c7592c7d6f9a9636b55000f23/vid-00-01-15-00-04-40.mp4',
'00:04:41-00:04:58': '/Users/apple/Desktop/home/NarratoAI/storage/temp/clip_video/6e7e343c7592c7d6f9a9636b55000f23/vid-00-04-41-00-04-58.mp4',
'00:04:58-00:05:45': '/Users/apple/Desktop/home/NarratoAI/storage/temp/clip_video/6e7e343c7592c7d6f9a9636b55000f23/vid-00-04-58-00-05-45.mp4',
'00:05:45-00:06:00': '/Users/apple/Desktop/home/NarratoAI/storage/temp/clip_video/6e7e343c7592c7d6f9a9636b55000f23/vid-00-05-45-00-06-00.mp4',
'00:06:00-00:06:03': '/Users/apple/Desktop/home/NarratoAI/storage/temp/clip_video/6e7e343c7592c7d6f9a9636b55000f23/vid-00-06-00-00-06-03.mp4',
}
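# Usage example
# (output_dir is left unset so that it is generated automatically; the
# subclip_path_videos dict above is sample output and is not a valid output_dir)
try:
    result = clip_video(video_origin_path, tts_result)
    print("Clipping result:")
    print(json.dumps(result, indent=4, ensure_ascii=False))
except Exception as e:
    print(f"An error occurred: {e}")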
================================================
FILE: app/services/generate_narration_script.py
================================================
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project: NarratoAI
@File : Generate narration copy
@Author : Viccy同学
@Date : 2025/5/8 11:33 AM
'''
import json
import os
import traceback
import asyncio
from openai import OpenAI
from loguru import logger
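# Import the new LLM service module so that the providers get registered
import app.services.llm # Importing this triggers provider registration
from app.services.llm.migration_adapter import generate_narration as generate_narration_new
# Import the new prompt management system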
from app.services.prompts import PromptManager
def parse_frame_analysis_to_markdown(json_file_path):
    """
    Parse a video frame analysis JSON file and convert it to Markdown.

    :param json_file_path: Path to the JSON file
    :return: A Markdown string (or an error message string on failure)
    """
    # Check that the file exists
    if not os.path.exists(json_file_path):
        return f"错误: 文件 {json_file_path} 不存在"

    try:
        # Read the JSON file
        with open(json_file_path, 'r', encoding='utf-8') as file:
            data = json.load(file)

        # Initialize the Markdown string
        markdown = ""

        # Get the summaries and the per-frame observations
        summaries = data.get('overall_activity_summaries', [])
        frame_observations = data.get('frame_observations', [])

        # Group the frames by batch
        batch_frames = {}
        for frame in frame_observations:
            batch_index = frame.get('batch_index')
            if batch_index not in batch_frames:
                batch_frames[batch_index] = []
            batch_frames[batch_index].append(frame)

        # Generate the Markdown content
        for i, summary in enumerate(summaries, 1):
            batch_index = summary.get('batch_index')
            time_range = summary.get('time_range', '')
            batch_summary = summary.get('summary', '')

            markdown += f"## 片段 {i}\n"
            markdown += f"- 时间范围:{time_range}\n"

            # Add the segment summary
            markdown += f"- 片段描述:{batch_summary}\n" if batch_summary else f"- 片段描述:\n"

            markdown += "- 详细描述:\n"

            # Add the frame observations for this batch
            frames = batch_frames.get(batch_index, [])
            for frame in frames:
                timestamp = frame.get('timestamp', '')
                observation = frame.get('observation', '')

                # Use the original text as-is, without splitting it
                markdown += f" - {timestamp}: {observation}\n" if observation else f" - {timestamp}: \n"

            markdown += "\n"

        return markdown

    except Exception as e:
        return f"处理JSON文件时出错: {traceback.format_exc()}"
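# Illustrative sketch (not part of the original module): the group-by-batch
# step and the rendering step on an in-memory dictionary, so the expected JSON
# shape is visible. The function name `render_markdown`, the English labels,
# and the sample data are hypothetical; the field names follow the ones read
# above.

```python
from collections import defaultdict


def render_markdown(data: dict) -> str:
    """Render frame-analysis data as Markdown, one section per batch summary."""
    # Group frame observations by the batch they belong to
    frames_by_batch = defaultdict(list)
    for frame in data.get("frame_observations", []):
        frames_by_batch[frame.get("batch_index")].append(frame)

    lines = []
    for i, summary in enumerate(data.get("overall_activity_summaries", []), 1):
        lines.append(f"## Segment {i}")
        lines.append(f"- Time range: {summary.get('time_range', '')}")
        lines.append(f"- Summary: {summary.get('summary', '')}")
        lines.append("- Details:")
        # .get() avoids creating empty entries for batches without frames
        for frame in frames_by_batch.get(summary.get("batch_index"), []):
            lines.append(f"  - {frame.get('timestamp', '')}: {frame.get('observation', '')}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = {
        "overall_activity_summaries": [
            {"batch_index": 0, "time_range": "00:00:00-00:00:05", "summary": "A man enters a room"},
        ],
        "frame_observations": [
            {"batch_index": 0, "timestamp": "00:00:01", "observation": "a door opens"},
            {"batch_index": 0, "timestamp": "00:00:03", "observation": "a man walks in"},
        ],
    }
    print(render_markdown(sample))
```

# Collecting lines in a list and joining once avoids repeated string
# concatenation, which is a small readability gain for longer analyses.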
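def generate_narration(markdown_content, api_key, base_url, model):
    """
    Call the LLM API to generate narration copy from the Markdown frame analysis.
    Refactored to use the new LLM service architecture.

    :param markdown_content: Video frame analysis in Markdown format
    :param api_key: API key
    :param base_url: API base URL
    :param model: Name of the model to use
    :return: The generated narration copy
    """
    try:
        # Prefer the new LLM service architecture
        logger.info("Generating narration copy with the new LLM service architecture")
        result = generate_narration_new(markdown_content, api_key, base_url, model)
        return result

    except Exception as e:
        logger.warning(f"The new LLM service failed, falling back to the legacy implementation: {str(e)}")

        # Fall back to the legacy implementation for compatibility
        return _generate_narration_legacy(markdown_content, api_key, base_url, model)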
def _generate_narration_legacy(markdown_content, api_key, base_url, model):
    """
    Legacy narration generation, kept as a fallback.

    :param markdown_content: Video frame analysis in Markdown format
    :param api_key: API key
    :param base_url: API base URL
    :param model: Name of the model to use
    :return: The generated narration copy (or an error message string on failure)
    """
    try:
        # Build the prompt with the new prompt management system
        prompt = PromptManager.get_prompt(
            category="documentary",
            name="narration_generation",
            parameters={
                "video_frame_description": markdown_content
            }
        )

        # Initialize the client with the OpenAI SDK
        client = OpenAI(
            api_key=api_key,
            base_url=base_url
        )

        # Send the request through the SDK
        if model not in ["deepseek-reasoner"]:
            # deepseek-reasoner does not support JSON output
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "你是一名专业的短视频解说文案撰写专家。"},
                    {"role": "user", "content": prompt}
                ],
                temperature=1.5,
                response_format={"type": "json_object"},
            )
            # Extract the generated copy
            if response.choices and len(response.choices) > 0:
                narration_script = response.choices[0].message.content
                # Log the token usage
                logger.debug(f"Tokens used: {response.usage.total_tokens}")
                return narration_script
            else:
                return "生成解说文案失败: 未获取到有效响应"
        else:
            # JSON output is unsupported, so the Markdown code-fence markers must be stripped afterwards
            response = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "你是一名专业的短视频解说文案撰写专家。"},
                    {"role": "user", "content": prompt}
                ],
                temperature=1.5,
            )
            # Extract the generated copy
            if response.choices and len(response.choices) > 0:
                narration_script = response.choices[0].message.content
                # Log the token usage
                logger.debug(f"Tokens used for the copy: {response.usage.total_tokens}")
                # Strip the Markdown JSON code-fence markers from narration_script
                narration_script = narration_script.replace("```json", "").replace("```", "")
                return narration_script
            else:
                return "生成解说文案失败: 未获取到有效响应"

    except Exception as e:
        return f"调用API生成解说文案时出错: {traceback.format_exc()}"
if __name__ == '__main__':
text_provider = 'openai'
text_api_key = "sk-xxx"
text_model = "deepseek-reasoner"
text_base_url = "https://api.deepseek.com"
video_frame_description_path = "/Users/apple/Desktop/home/NarratoAI/storage/temp/analysis/frame_analysis_20250508_1139.json"
# 测试新的JSON文件
test_file_path = "/Users/apple/Desktop/home/NarratoAI/storage/temp/analysis/frame_analysis_20250508_2258.json"
markdown_output = parse_frame_analysis_to_markdown(test_file_path)
# print(markdown_output)
# 输出到文件以便检查格式
output_file = "/Users/apple/Desktop/home/NarratoAI/storage/temp/家里家外1-5.md"
with open(output_file, 'w', encoding='utf-8') as f:
f.write(markdown_output)
# print(f"\n已将Markdown输出保存到: {output_file}")
# # 生成解说文案
# narration = generate_narration(
# markdown_output,
# text_api_key,
# base_url=text_base_url,
# model=text_model
# )
#
# # 保存解说文案
# print(narration)
# print(type(narration))
# narration_file = "/Users/apple/Desktop/home/NarratoAI/storage/temp/final_narration_script.json"
# with open(narration_file, 'w', encoding='utf-8') as f:
# f.write(narration)
# print(f"\n已将解说文案保存到: {narration_file}")
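# Illustrative sketch, not part of the original module: generate_narration() removes
# every triple backtick with str.replace, which would also alter a reply whose content
# itself contains a fence. The helpers below show one possible alternative that removes
# only a single fence wrapped around the whole reply and then checks that the result
# parses as JSON. The names strip_json_fence and parse_narration are hypothetical.

```python
import json
import re

# Build the fence marker programmatically to keep this example easy to embed.
FENCE = "`" * 3

# One optional Markdown fence (with or without a "json" tag) around the whole reply.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_json_fence(raw: str) -> str:
    """Return the reply with one surrounding fence removed, if present."""
    match = _FENCE_RE.match(raw)
    return match.group(1) if match else raw.strip()


def parse_narration(raw: str):
    """Parse the cleaned reply as JSON; return None when it is not valid JSON."""
    try:
        return json.loads(strip_json_fence(raw))
    except json.JSONDecodeError:
        return None
```

# Returning None for unparsable replies is an assumption of this sketch; the original
# function returns the raw string and leaves validation to the caller.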
================================================
FILE: app/services/generate_video.py
================================================
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
@Project: NarratoAI
@File : generate_video
@Author : Viccy同学
@Date : 2025/5/7 上午11:55
'''
import os
import traceback
import tempfile
from typing import Optional, Dict, Any
from loguru import logger
from moviepy import (
VideoFileClip,
AudioFileClip,
CompositeAudioClip,
CompositeVideoClip,
TextClip,
afx
)
from moviepy.video.tools.subtitles import SubtitlesClip
from PIL import ImageFont
from app.utils import utils
from app.models.schema import AudioVolumeDefaults
from app.services.audio_normalizer import AudioNormalizer, normalize_audio_for_mixing
def is_valid_subtitle_file(subtitle_path: str) -> bool:
"""
Check whether a subtitle file is valid.
Args:
subtitle_path: Path to the subtitle file
Returns:
bool: True if the subtitle file exists and contains valid content, otherwise False
"""
if not subtitle_path or not os.path.exists(subtitle_path):
return False
try:
with open(subtitle_path, 'r', encoding='utf-8') as f:
content = f.read().strip()
# Reject empty files
if not content:
return False
# Check for a timestamp line, the basic marker of the SRT format
# An SRT file should contain timestamps such as "00:00:00,000 --> 00:00:00,000"
import re
time_pattern = r'\d{2}:\d{2}:\d{2},\d{3}\s*-->\s*\d{2}:\d{2}:\d{2},\d{3}'
if not re.search(time_pattern, content):
return False
return True
except Exception as e:
logger.warning(f"检查字幕文件时出错: {str(e)}")
return False
def merge_materials(
video_path: str,
audio_path: str,
output_path: str,
subtitle_path: Optional[str] = None,
bgm_path: Optional[str] = None,
options: Optional[Dict[str, Any]] = None
) -> str:
"""
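Merge the video, narration audio, BGM, and subtitles into the final video.
Args:
video_path: Path to the video file
audio_path: Path to the narration audio file
output_path: Path to the output file
subtitle_path: Path to the subtitle file, optional
bgm_path: Path to the background music file, optional
options: Additional options; may contain the following fields:
- voice_volume: Narration volume, defaults to AudioVolumeDefaults.VOICE_VOLUME
- bgm_volume: Background music volume, defaults to AudioVolumeDefaults.BGM_VOLUME
- original_audio_volume: Original audio volume, defaults to AudioVolumeDefaults.ORIGINAL_VOLUME
- keep_original_audio: Whether to keep the original audio, defaults to True
- subtitle_font: Subtitle font file name, resolved under utils.font_dir(); defaults to '' (no custom font)
- subtitle_font_size: Subtitle font size, defaults to 40
- subtitle_color: Subtitle color, defaults to '#FFFFFF' (white)
- subtitle_bg_color: Subtitle background color, defaults to 'transparent'
- subtitle_position: Subtitle position, one of 'bottom', 'top', 'center', 'custom'; defaults to 'bottom'
- custom_position: Vertical position as a percentage (0-100), used when subtitle_position is 'custom'; defaults to 70
- stroke_color: Stroke color, defaults to '#000000' (black)
- stroke_width: Stroke width, defaults to 1
- threads: Number of processing threads, defaults to 2
- fps: Output frame rate, defaults to 30
- subtitle_enabled: Whether subtitles are enabled, defaults to True
Returns:
Path to the output video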
"""
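# Fall back to an empty options dict
if options is None:
options = {}
# Default parameter values, using the shared volume configuration
voice_volume = options.get('voice_volume', AudioVolumeDefaults.VOICE_VOLUME)
bgm_volume = options.get('bgm_volume', AudioVolumeDefaults.BGM_VOLUME)
# Bug fix: the default original-audio volume was changed from 0.0 to 0.7 (now read from
# AudioVolumeDefaults.ORIGINAL_VOLUME) so the source audio stays audible in short-drama narration mode
original_audio_volume = options.get('original_audio_volume', AudioVolumeDefaults.ORIGINAL_VOLUME)
keep_original_audio = options.get('keep_original_audio', True) # Keep the original audio by default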
subtitle_font = options.get('subtitle_font', '')
subtitle_font_size = options.get('subtitle_font_size', 40)
subtitle_color = options.get('subtitle_color', '#FFFFFF')
subtitle_bg_color = options.get('subtitle_bg_color', 'transparent')
subtitle_position = options.get('subtitle_position', 'bottom')
custom_position = options.get('custom_position', 70)
stroke_color = options.get('stroke_color', '#000000')
stroke_width = options.get('stroke_width', 1)
threads = options.get('threads', 2)
fps = options.get('fps', 30)
subtitle_enabled = options.get('subtitle_enabled', True)
# 配置日志 - 便于调试问题
logger.info(f"音量配置详情:")
logger.info(f" - 配音音量: {voice_volume}")
logger.info(f" - 背景音乐音量: {bgm_volume}")
logger.info(f" - 原声音量: {original_audio_volume}")
logger.info(f" - 是否保留原声: {keep_original_audio}")
logger.info(f"字幕配置详情:")
logger.info(f" - 是否启用字幕: {subtitle_enabled}")
logger.info(f" - 字幕文件路径: {subtitle_path}")
# 音量参数验证
def validate_volume(volume, name):
if not (AudioVolumeDefaults.MIN_VOLUME <= volume <= AudioVolumeDefaults.MAX_VOLUME):
logger.warning(f"{name}音量 {volume} 超出有效范围 [{AudioVolumeDefaults.MIN_VOLUME}, {AudioVolumeDefaults.MAX_VOLUME}],将被限制")
return max(AudioVolumeDefaults.MIN_VOLUME, min(volume, AudioVolumeDefaults.MAX_VOLUME))
return volume
voice_volume = validate_volume(voice_volume, "配音")
bgm_volume = validate_volume(bgm_volume, "背景音乐")
original_audio_volume = validate_volume(original_audio_volume, "原声")
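# Handle the transparent background: MoviePy 2.1.1 does not accept the value 'transparent'
if subtitle_bg_color == 'transparent':
subtitle_bg_color = None # In the new MoviePy, None means a transparent background
# Create the output directory if it does not exist; fall back to the current directory
# when output_path has no directory component, because os.makedirs('') raises an error
output_dir = os.path.dirname(output_path) or "."
os.makedirs(output_dir, exist_ok=True)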
logger.info(f"开始合并素材...")
logger.info(f" ① 视频: {video_path}")
logger.info(f" ② 音频: {audio_path}")
if subtitle_path:
logger.info(f" ③ 字幕: {subtitle_path}")
if bgm_path:
logger.info(f" ④ 背景音乐: {bgm_path}")
logger.info(f" ⑤ 输出: {output_path}")
# 加载视频
try:
video_clip = VideoFileClip(video_path)
logger.info(f"视频尺寸: {video_clip.size[0]}x{video_clip.size[1]}, 时长: {video_clip.duration}秒")
# 提取视频原声(如果需要)
original_audio = None
if keep_original_audio and original_audio_volume > 0:
try:
original_audio = video_clip.audio
if original_audio:
# Do not scale the volume here. original_audio_volume may still change in the smart
# volume adjustment below and is applied once when the track is added to the mix;
# scaling here as well would apply the volume twice.
logger.info("Extracted the original audio; its volume is applied at mixing time")
else:
logger.warning("视频没有音轨,无法提取原声")
except Exception as e:
logger.error(f"提取视频原声失败: {str(e)}")
original_audio = None
# 移除原始音轨,稍后会合并新的音频
video_clip = video_clip.without_audio()
except Exception as e:
logger.error(f"加载视频失败: {str(e)}")
raise
# 处理背景音乐和所有音频轨道合成
audio_tracks = []
# 智能音量调整(可选功能)
if AudioVolumeDefaults.ENABLE_SMART_VOLUME and audio_path and os.path.exists(audio_path) and original_audio is not None:
try:
normalizer = AudioNormalizer()
temp_dir = tempfile.mkdtemp()
temp_original_path = os.path.join(temp_dir, "temp_original.wav")
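# Save the original audio to a temporary file for analysis
# (MoviePy 2.x no longer accepts the `verbose` argument, so only `logger` is passed)
original_audio.write_audiofile(temp_original_path, logger=None)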
# 计算智能音量调整
tts_adjustment, original_adjustment = normalizer.calculate_volume_adjustment(
audio_path, temp_original_path
)
# 应用智能调整,但保留用户设置的相对比例
smart_voice_volume = voice_volume * tts_adjustment
smart_original_volume = original_audio_volume * original_adjustment
# 限制音量范围,避免过度调整
smart_voice_volume = max(0.1, min(1.5, smart_voice_volume))
smart_original_volume = max(0.1, min(2.0, smart_original_volume))
voice_volume = smart_voice_volume
original_audio_volume = smart_original_volume
logger.info(f"智能音量调整 - TTS: {voice_volume:.2f}, 原声: {original_audio_volume:.2f}")
# 清理临时文件
import shutil
shutil.rmtree(temp_dir)
except Exception as e:
logger.warning(f"智能音量分析失败,使用原始设置: {e}")
# 先添加主音频(配音)
if audio_path and os.path.exists(audio_path):
try:
voice_audio = AudioFileClip(audio_path).with_effects([afx.MultiplyVolume(voice_volume)])
audio_tracks.append(voice_audio)
logger.info(f"已添加配音音频,音量: {voice_volume}")
except Exception as e:
logger.error(f"加载配音音频失败: {str(e)}")
# 添加原声(如果需要)
if original_audio is not None:
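# Apply original_audio_volume (possibly updated by the smart volume adjustment above)
# relative to the baseline volume of the extracted track
current_volume_in_original = 1.0 # baseline volume of original_audio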
additional_adjustment = original_audio_volume / current_volume_in_original
adjusted_original_audio = original_audio.with_effects([afx.MultiplyVolume(additional_adjustment)])
audio_tracks.append(adjusted_original_audio)
logger.info(f"已添加视频原声,最终音量: {original_audio_volume}")
# 添加背景音乐(如果有)
if bgm_path and os.path.exists(bgm_path):
try:
bgm_clip = AudioFileClip(bgm_path).with_effects([
afx.MultiplyVolume(bgm_volume),
# Loop to the video length first, then fade out, so the fade lands at the end of the video
afx.AudioLoop(duration=video_clip.duration),
afx.AudioFadeOut(3),
])
audio_tracks.append(bgm_clip)
logger.info(f"已添加背景音乐,音量: {bgm_volume}")
except Exception as e:
logger.error(f"添加背景音乐失败: \n{traceback.format_exc()}")
# 合成最终的音频轨道
if audio_tracks:
final_audio = CompositeAudioClip(audio_tracks)
video_clip = video_clip.with_audio(final_audio)
logger.info(f"已合成所有音频轨道,共{len(audio_tracks)}个")
else:
logger.warning("没有可用的音频轨道,输出视频将没有声音")
# 处理字体路径
font_path = None
if subtitle_path and subtitle_font:
font_path = os.path.join(utils.font_dir(), subtitle_font)
if os.name == "nt":
font_path = font_path.replace("\\", "/")
logger.info(f"使用字体: {font_path}")
# 处理视频尺寸
video_width, video_height = video_clip.size
# 字幕处理函数
def create_text_clip(subtitle_item):
"""创建单个字幕片段"""
phrase = subtitle_item[1]
max_width = video_width * 0.9
# 如果有字体路径,进行文本换行处理
wrapped_txt = phrase
txt_height = 0
if font_path:
wrapped_txt, txt_height = wrap_text(
phrase,
max_width=max_width,
font=font_path,
fontsize=subtitle_font_size
)
# 创建文本片段
try:
_clip = TextClip(
text=wrapped_txt,
font=font_path,
font_size=subtitle_font_size,
color=subtitle_color,
bg_color=subtitle_bg_color, # 这里已经在前面处理过,None表示透明
stroke_color=stroke_color,
stroke_width=stroke_width,
)
except Exception as e:
logger.error(f"创建字幕片段失败: {str(e)}, 使用简化参数重试")
# 如果上面的方法失败,尝试使用更简单的参数
_clip = TextClip(
text=wrapped_txt,
font=font_path,
font_size=subtitle_font_size,
color=subtitle_color,
)
# 设置字幕时间
duration = subtitle_item[0][1] - subtitle_item[0][0]
_clip = _clip.with_start(subtitle_item[0][0])
_clip = _clip.with_end(subtitle_item[0][1])
_clip = _clip.with_duration(duration)
# 设置字幕位置
if subtitle_position == "bottom":
_clip = _clip.with_position(("center", video_height * 0.95 - _clip.h))
elif subtitle_position == "top":
_clip = _clip.with_position(("center", video_height * 0.05))
elif subtitle_position == "custom":
margin = 10
max_y = video_height - _clip.h - margin
min_y = margin
custom_y = (video_height - _clip.h) * (custom_position / 100)
custom_y = max(
min_y, min(custom_y, max_y)
)
_clip = _clip.with_position(("center", custom_y))
else: # center
_clip = _clip.with_position(("center", "center"))
return _clip
# 创建TextClip工厂函数
def make_textclip(text):
return TextClip(
text=text,
font=font_path,
font_size=subtitle_font_size,
color=subtitle_color,
)
# 处理字幕 - 修复字幕开关bug和空字幕文件问题
if subtitle_enabled and subtitle_path:
if is_valid_subtitle_file(subtitle_path):
logger.info("字幕已启用,开始处理字幕文件")
try:
# 加载字幕文件
sub = SubtitlesClip(
subtitles=subtitle_path,
encoding="utf-8",
make_textclip=make_textclip
)
# 创建每个字幕片段
text_clips = []
for item in sub.subtitles:
clip = create_text_clip(subtitle_item=item)
text_clips.append(clip)
# 合成视频和字幕
video_clip = CompositeVideoClip([video_clip, *text_clips])
logger.info(f"已添加{len(text_clips)}个字幕片段")
except Exception as e:
logger.error(f"处理字幕失败: \n{traceback.format_exc()}")
logger.warning("字幕处理失败,继续生成无字幕视频")
else:
logger.warning(f"字幕文件无效或为空: {subtitle_path},跳过字幕处理")
elif not subtitle_enabled:
logger.info("字幕已禁用,跳过字幕处理")
elif not subtitle_path:
logger.info("未提供字幕文件路径,跳过字幕处理")
# 导出最终视频
try:
video_clip.write_videofile(
output_path,
audio_codec="aac",
temp_audiofile_path=output_dir,
threads=threads,
fps=fps,
)
logger.success(f"素材合并完成: {output_path}")
except Exception as e:
logger.error(f"导出视频失败: {str(e)}")
raise
finally:
# 释放资源
video_clip.close()
del video_clip
return output_path
def wrap_text(text, max_width, font="Arial", fontsize=60):
"""
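Wrap text so that long lines fit within the given width.
Args:
text: Text to wrap
max_width: Maximum width in pixels
font: Path to the font file
fontsize: Font size
Returns:
The wrapped text and its height
"""
# Create the ImageFont object
try:
font_obj = ImageFont.truetype(font, fontsize)
except Exception:
# Fall back to the default font if the requested one cannot be loaded
font_obj = ImageFont.load_default()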
def get_text_size(inner_text):
inner_text = inner_text.strip()
left, top, right, bottom = font_obj.getbbox(inner_text)
return right - left, bottom - top
width, height = get_text_size(text)
if width <= max_width:
return text, height
processed = True
_wrapped_lines_ = []
words = text.split(" ")
_txt_ = ""
for word in words:
_before = _txt_
_txt_ += f"{word} "
_width, _height = get_text_size(_txt_)
if _width <= max_width:
continue
else:
if _txt_.strip() == word.strip():
processed = False
break
_wrapped_lines_.append(_before)
_txt_ = f"{word} "
_wrapped_lines_.append(_txt_)
if processed:
_wrapped_lines_ = [line.strip() for line in _wrapped_lines_]
result = "\n".join(_wrapped_lines_).strip()
height = len(_wrapped_lines_) * height
return result, height
_wrapped_lines_ = []
chars = list(text)
_txt_ = ""
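for char in chars:
# Measure the line with the next character before committing to it,
# so that a finished line never exceeds max_width
_candidate = _txt_ + char
_width, _height = get_text_size(_candidate)
if _width <= max_width or not _txt_:
_txt_ = _candidate
else:
_wrapped_lines_.append(_txt_)
_txt_ = char
_wrapped_lines_.append(_txt_)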
result = "\n".join(_wrapped_lines_).strip()
height = len(_wrapped_lines_) * height
return result, height
if __name__ == '__main__':
merger_mp4 = '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/merger.mp4'
merger_sub = '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/merged_subtitle_00_00_00-00_01_30.srt'
merger_audio = '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/merger_audio.mp3'
bgm_path = '/Users/apple/Desktop/home/NarratoAI/resource/songs/bgm.mp3'
output_video = '/Users/apple/Desktop/home/NarratoAI/storage/tasks/qyn2-2-demo/combined_test.mp4'
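# Usage example
options = {
'voice_volume': 1.0, # Narration volume
'bgm_volume': 0.1, # Background music volume
'original_audio_volume': 1.0, # Original video audio volume; 0 drops the original audio
'keep_original_audio': True, # Whether to keep the original audio
'subtitle_enabled': True, # Whether subtitles are enabled (fix for the subtitle toggle bug)
'subtitle_font': 'MicrosoftYaHeiNormal.ttc', # Font file name; it is looked up under the font_dir() directory
'subtitle_font_size': 40,
'subtitle_color': '#FFFFFF',
'subtitle_bg_color': None, # None means a transparent background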
'subtitle_position': 'bottom',
'threads': 2
}
try:
merge_materials(
video_path=merger_mp4,
audio_path=merger_audio,
subtitle_path=merger_sub,
bgm_path=bgm_path,
output_path=output_video,
options=options
)
except Exception as e:
logger.error(f"合并素材失败: \n{traceback.format_exc()}")
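# Illustrative sketch, not part of the original module: the subtitle placement rules in
# create_text_clip() can be read as a pure function of the video height and the clip
# height. The function name subtitle_y is hypothetical; the 0.95 / 0.05 factors, the
# default custom_position of 70, and the 10-pixel margin are taken from the code above.

```python
from typing import Union


def subtitle_y(position: str, video_height: int, clip_height: int,
               custom_position: float = 70, margin: int = 10) -> Union[float, str]:
    """Return the y coordinate of a subtitle clip, or "center" for the fallback case."""
    if position == "bottom":
        # Bottom edge of the clip sits 5% above the bottom of the frame.
        return video_height * 0.95 - clip_height
    if position == "top":
        # Top edge of the clip sits 5% below the top of the frame.
        return video_height * 0.05
    if position == "custom":
        # custom_position is a percentage of the free vertical space,
        # clamped so the clip keeps a margin from both edges.
        max_y = video_height - clip_height - margin
        y = (video_height - clip_height) * (custom_position / 100)
        return max(margin, min(y, max_y))
    return "center"
```

# Keeping the arithmetic separate from TextClip makes the clamping behaviour easy to
# check without rendering any video.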
================================================
FILE: app/services/llm/__init__.py
================================================
"""
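NarratoAI LLM service module
A unified abstraction layer for LLM services that supports switching between providers and strict output format validation
Includes a unified interface for vision models and text generation models
Main components:
- BaseLLMProvider: Base class for LLM service providers
- VisionModelProvider: Base class for vision model providers
- TextModelProvider: Base class for text model providers
- LLMServiceManager: LLM service manager
- OutputValidator: Output format validator
Supported providers:
Vision models: Gemini, QwenVL, Siliconflow
Text models: OpenAI, DeepSeek, Gemini, Qwen, Moonshot, Siliconflow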
"""
from .manager import LLMServiceManager
from .base import BaseLLMProvider, VisionModelProvider, TextModelProvider
from .validators import OutputValidator, ValidationError
from .exceptions import LLMServiceError, ProviderNotFoundError, ConfigurationError
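# Provider registration is called explicitly from webui.py:main() (see the LLM provider registration refactor)
# This is more reliable and makes errors easier to debug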
__all__ = [
'LLMServiceManager',
'BaseLLMProvider',
'VisionModelProvider',
'TextModelProvider',
'OutputValidator',
'ValidationError',
'LLMServiceError',
'ProviderNotFoundError',
'ConfigurationError'
]
# 版本信息
__version__ = '1.0.0'
================================================
FILE: app/services/llm/base.py
================================================
"""
大模型服务提供商基类定义
定义了统一的大模型服务接口,包括视觉模型和文本生成模型的抽象基类
"""
from abc import ABC, abstractmethod
from typing import List, Dict, Any, Optional, Union
from pathlib import Path
import PIL.Image
from loguru import logger
from .exceptions import LLMServiceError, ConfigurationError
class BaseLLMProvider(ABC):
"""大模型服务提供商基类"""
def __init__(self,
api_key: str,
model_name: str,
base_url: Optional[str] = None,
**kwargs):
"""
初始化大模型服务提供商
Args:
api_key: API密钥
model_name: 模型名称
base_url: API基础URL
**kwargs: 其他配置参数
"""
self.api_key = api_key
self.model_name = model_name
self.base_url = base_url
self.config = kwargs
# 验证必要配置
self._validate_config()
# 初始化提供商特定设置
self._initialize()
@property
@abstractmethod
def provider_name(self) -> str:
"""供应商名称"""
pass
@property
@abstractmethod
def supported_models(self) -> List[str]:
"""支持的模型列表"""
pass
def _validate_config(self):
"""验证配置参数"""
if not self.api_key:
raise ConfigurationError("API密钥不能为空", "api_key")
if not self.model_name:
raise ConfigurationError("模型名称不能为空", "model_name")
# 检查模型支持情况
self._validate_model_support()
def _validate_model_support(self):
"""Validate model support (lenient mode: only logs a warning)."""
# LiteLLM already provides unified model validation; legacy providers use lenient validation
if self.model_name not in self.supported_models:
logger.warning(
f"模型 {self.model_name} 未在供应商 {self.provider_name} 的预定义支持列表中。"
f"支持的模型列表: {self.supported_models}"
)
def _initialize(self):
"""初始化提供商特定设置,子类可重写"""
pass
@abstractmethod
async def _make_api_call(self, payload: Dict[str, Any]) -> Dict[str, Any]:
"""执行API调用,子类必须实现"""
pass
def _handle_api_error(self, status_code: int, response_text: str) -> LLMServiceError:
"""处理API错误,返回适当的异常"""
from .exceptions import APICallError, RateLimitError, AuthenticationError
if status_code == 401:
return AuthenticationError()
elif status_code == 429:
return RateLimitError()
elif status_code in [502, 503, 504]:
return APICallError(f"服务器错误 HTTP {status_code}", status_code, response_text)
elif status_code == 524:
return APICallError(f"服务器处理超时 HTTP {status_code}", status_code, response_text)
else:
return APICallError(f"HTTP {status_code}", status_code, response_text)
class VisionModelProvider(BaseLLMProvider):
"""视觉模型提供商基类"""
@abstractmethod
async def analyze_images(self,
images: List[Union[str, Path, PIL.Image.Image]],
prompt: str,
batch_size: int = 10,
**kwargs) -> List[str]:
"""
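Analyze images and return the results.
Args:
images: List of image paths or PIL image objects
prompt: Analysis prompt
batch_size: Batch size
**kwargs: Additional parameters
Returns:
List of analysis results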
"""
pass
def _prepare_images(self, images: List[Union[str, Path, PIL.Image.Image]]) -> List[PIL.Image.Image]:
"""预处理图片,统一转换为PIL.Image对象"""
processed_images = []
for img in images:
try:
if isinstance(img, (str, Path)):
pil_img = PIL.Image.open(img)
elif isinstance(img, PIL.Image.Image):
pil_img = img
else:
logger.warning(f"不支持的图片类型: {type(img)}")
continue
# 调整图片大小以优化性能
if pil_img.size[0] > 1024 or pil_img.size[1] > 1024:
pil_img.thumbnail((1024, 1024), PIL.Image.Resampling.LANCZOS)
processed_images.append(pil_img)
except Exception as e:
logger.error(f"加载图片失败 {img}: {str(e)}")
continue
return processed_images
class TextModelProvider(BaseLLMProvider):
"""文本生成模型提供商基类"""
@abstractmethod
async def generate_text(self,
prompt: str,
system_prompt: Optional[str] = None,
temperature: float = 1.0,
max_tokens: Optional[int] = None,
response_format: Optional[str] = None,
**kwargs) -> str:
"""
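Generate text content.
Args:
prompt: User prompt
system_prompt: System prompt
temperature: Sampling temperature
max_tokens: Maximum number of tokens
response_format: Response format ('json' or None)
**kwargs: Additional parameters
Returns:
The generated text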
"""
pass
def _build_messages(self, prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
"""构建消息列表"""
messages = []
if system_prompt:
messages.append({"role": "system", "content": system_prompt})
messages.append({"role": "user", "content": prompt})
return messages
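# Illustrative sketch, not part of the original module: the two small helpers of the
# base classes can be exercised without any provider. build_messages mirrors
# TextModelProvider._build_messages, and classify_status mirrors the branch order of
# BaseLLMProvider._handle_api_error. The string labels returned by classify_status are
# hypothetical stand-ins for the exception classes used by the real method.

```python
from typing import Dict, List, Optional


def build_messages(prompt: str, system_prompt: Optional[str] = None) -> List[Dict[str, str]]:
    """Build a chat message list; the system message is added only when provided."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    return messages


def classify_status(status_code: int) -> str:
    """Map an HTTP status code to an error category, in the same order as the original."""
    if status_code == 401:
        return "authentication"
    if status_code == 429:
        return "rate_limit"
    if status_code in (502, 503, 504):
        return "server_error"
    if status_code == 524:
        return "timeout"
    return "api_call_error"
```

# A concrete provider would still need to subclass TextModelProvider and implement
# generate_text and _make_api_call; this sketch covers only the shared helpers.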
================================================
FILE: app/services/llm/config_validator.py
================================================
"""
LLM服务配置验证器
验证大模型服务的配置是否正确,并提供配置建议
"""
from typing import Dict, List, Any, Optional
from loguru import logger
from app.config import config
from .manager import LLMServiceManager
from .exceptions import ConfigurationError
class LLMConfigValidator:
"""LLM服务配置验证器"""
@staticmethod
def validate_all_configs() -> Dict[str, Any]:
"""
验证所有LLM服务配置
Returns:
验证结果字典
"""
results = {
"vision_providers": {},
"text_providers": {},
"summary": {
"total_vision_providers": 0,
"valid_vision_providers": 0,
"total_text_providers": 0,
"valid_text_providers": 0,
"errors": [],
"warnings": []
}
}
# 验证视觉模型提供商
vision_providers = LLMServiceManager.list_vision_providers()
results["summary"]["total_vision_providers"] = len(vision_providers)
for provider in vision_providers:
try:
validation_result = LLMConfigValidator.validate_vision_provider(provider)
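results["vision_providers"][provider] = validation_result
# Collect per-provider warnings so that print_validation_report can show them
results["summary"]["warnings"].extend(validation_result["warnings"])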
if validation_result["is_valid"]:
results["summary"]["valid_vision_providers"] += 1
else:
results["summary"]["errors"].extend(validation_result["errors"])
except Exception as e:
error_msg = f"验证视觉模型提供商 {provider} 时发生错误: {str(e)}"
results["vision_providers"][provider] = {
"is_valid": False,
"errors": [error_msg],
"warnings": []
}
results["summary"]["errors"].append(error_msg)
# 验证文本模型提供商
text_providers = LLMServiceManager.list_text_providers()
results["summary"]["total_text_providers"] = len(text_providers)
for provider in text_providers:
try:
validation_result = LLMConfigValidator.validate_text_provider(provider)
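results["text_providers"][provider] = validation_result
# Collect per-provider warnings so that print_validation_report can show them
results["summary"]["warnings"].extend(validation_result["warnings"])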
if validation_result["is_valid"]:
results["summary"]["valid_text_providers"] += 1
else:
results["summary"]["errors"].extend(validation_result["errors"])
except Exception as e:
error_msg = f"验证文本模型提供商 {provider} 时发生错误: {str(e)}"
results["text_providers"][provider] = {
"is_valid": False,
"errors": [error_msg],
"warnings": []
}
results["summary"]["errors"].append(error_msg)
return results
@staticmethod
def validate_vision_provider(provider_name: str) -> Dict[str, Any]:
"""
验证视觉模型提供商配置
Args:
provider_name: 提供商名称
Returns:
验证结果字典
"""
result = {
"is_valid": False,
"errors": [],
"warnings": [],
"config": {}
}
try:
# 获取配置
config_prefix = f"vision_{provider_name}"
api_key = config.app.get(f'{config_prefix}_api_key')
model_name = config.app.get(f'{config_prefix}_model_name')
base_url = config.app.get(f'{config_prefix}_base_url')
result["config"] = {
"api_key": "***" if api_key else None,
"model_name": model_name,
"base_url": base_url
}
# 验证必需配置
if not api_key:
result["errors"].append(f"缺少API密钥配置: {config_prefix}_api_key")
if not model_name:
result["errors"].append(f"缺少模型名称配置: {config_prefix}_model_name")
# 尝试创建提供商实例
if api_key and model_name:
try:
provider_instance = LLMServiceManager.get_vision_provider(provider_name)
result["is_valid"] = True
logger.debug(f"视觉模型提供商 {provider_name} 配置验证成功")
except Exception as e:
result["errors"].append(f"创建提供商实例失败: {str(e)}")
# 添加警告
if not base_url:
result["warnings"].append(f"未配置base_url,将使用默认值")
except Exception as e:
result["errors"].append(f"配置验证过程中发生错误: {str(e)}")
return result
@staticmethod
def validate_text_provider(provider_name: str) -> Dict[str, Any]:
"""
验证文本模型提供商配置
Args:
provider_name: 提供商名称
Returns:
验证结果字典
"""
result = {
"is_valid": False,
"errors": [],
"warnings": [],
"config": {}
}
try:
# 获取配置
config_prefix = f"text_{provider_name}"
api_key = config.app.get(f'{config_prefix}_api_key')
model_name = config.app.get(f'{config_prefix}_model_name')
base_url = config.app.get(f'{config_prefix}_base_url')
result["config"] = {
"api_key": "***" if api_key else None,
"model_name": model_name,
"base_url": base_url
}
# 验证必需配置
if not api_key:
result["errors"].append(f"缺少API密钥配置: {config_prefix}_api_key")
if not model_name:
result["errors"].append(f"缺少模型名称配置: {config_prefix}_model_name")
# 尝试创建提供商实例
if api_key and model_name:
try:
provider_instance = LLMServiceManager.get_text_provider(provider_name)
result["is_valid"] = True
logger.debug(f"文本模型提供商 {provider_name} 配置验证成功")
except Exception as e:
result["errors"].append(f"创建提供商实例失败: {str(e)}")
# 添加警告
if not base_url:
result["warnings"].append(f"未配置base_url,将使用默认值")
except Exception as e:
result["errors"].append(f"配置验证过程中发生错误: {str(e)}")
return result
@staticmethod
def get_config_suggestions() -> Dict[str, Any]:
"""
获取配置建议
Returns:
配置建议字典
"""
suggestions = {
"vision_providers": {},
"text_providers": {},
"general_tips": [
"确保所有API密钥都已正确配置",
"建议为每个提供商配置base_url以提高稳定性",
"定期检查模型名称是否为最新版本",
"建议配置多个提供商作为备用方案",
"推荐使用 LiteLLM 作为统一接口,支持 100+ providers"
]
}
# 为每个视觉模型提供商提供建议
vision_providers = LLMServiceManager.list_vision_providers()
for provider in vision_providers:
suggestions["vision_providers"][provider] = {
"required_configs": [
f"vision_{provider}_api_key",
f"vision_{provider}_model_name"
],
"optional_configs": [
f"vision_{provider}_base_url"
],
"example_models": LLMConfigValidator._get_example_models(provider, "vision")
}
# 为每个文本模型提供商提供建议
text_providers = LLMServiceManager.list_text_providers()
for provider in text_providers:
suggestions["text_providers"][provider] = {
"required_configs": [
f"text_{provider}_api_key",
f"text_{provider}_model_name"
],
"optional_configs": [
f"text_{provider}_base_url"
],
"example_models": LLMConfigValidator._get_example_models(provider, "text")
}
return suggestions
@staticmethod
def _get_example_models(provider_name: str, model_type: str) -> List[str]:
"""获取示例模型名称"""
examples = {
"gemini": {
"vision": ["gemini-2.5-flash", "gemini-2.0-flash-lite", "gemini-2.0-flash"],
"text": ["gemini-2.5-flash", "gemini-2.0-flash", "gemini-1.5-pro"]
},
"openai": {
"vision": [],
"text": ["gpt-4o", "gpt-4o-mini", "gpt-4-turbo"]
},
"qwen": {
"vision": ["qwen2.5-vl-32b-instruct"],
"text": ["qwen-plus-1127", "qwen-turbo"]
},
"deepseek": {
"vision": [],
"text": ["deepseek-chat", "deepseek-reasoner"]
},
"siliconflow": {
"vision": ["Qwen/Qwen2.5-VL-32B-Instruct"],
"text": ["deepseek-ai/DeepSeek-R1", "Qwen/Qwen2.5-72B-Instruct"]
}
}
return examples.get(provider_name, {}).get(model_type, [])
@staticmethod
def print_validation_report(validation_results: Dict[str, Any]):
"""
打印验证报告
Args:
validation_results: 验证结果
"""
summary = validation_results["summary"]
print("\n" + "="*60)
print("LLM服务配置验证报告")
print("="*60)
print(f"\n📊 总体统计:")
print(f" 视觉模型提供商: {summary['valid_vision_providers']}/{summary['total_vision_providers']} 有效")
print(f" 文本模型提供商: {summary['valid_text_providers']}/{summary['total_text_providers']} 有效")
if summary["errors"]:
print(f"\n❌ 错误 ({len(summary['errors'])}):")
for error in summary["errors"]:
print(f" - {error}")
if summary["warnings"]:
print(f"\n⚠️ 警告 ({len(summary['warnings'])}):")
for warning in summary["warnings"]:
print(f" - {warning}")
print(f"\n✅ 配置验证完成")
print("="*60)
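# Illustrative sketch, not part of the original module: the validator relies on a key
# naming convention of the form "<kind>_<provider>_<field>" (for example
# "text_deepseek_api_key"). The helpers below check that convention against a plain
# dict. The names required_keys and check_provider and the English messages are
# hypothetical; the real validator additionally tries to instantiate the provider
# through LLMServiceManager, which this sketch leaves out.

```python
from typing import Any, Dict, List, Mapping


def required_keys(kind: str, provider: str) -> List[str]:
    """Return the config keys that must be set for a provider of the given kind."""
    return [f"{kind}_{provider}_api_key", f"{kind}_{provider}_model_name"]


def check_provider(app_config: Mapping[str, Any], kind: str, provider: str) -> Dict[str, Any]:
    """Check required and optional keys; no provider instance is created here."""
    errors = [
        f"missing config key: {key}"
        for key in required_keys(kind, provider)
        if not app_config.get(key)
    ]
    warnings: List[str] = []
    if not app_config.get(f"{kind}_{provider}_base_url"):
        warnings.append("base_url not set; the provider default will be used")
    return {"is_valid": not errors, "errors": errors, "warnings": warnings}
```

# Example: check_provider({"text_deepseek_api_key": "sk-test",
# "text_deepseek_model_name": "deepseek-chat"}, "text", "deepseek") reports a valid
# configuration with one warning about the missing base_url.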
================================================
FILE: app/services/llm/exceptions.py
================================================
"""
大模型服务异常类定义
定义了大模型服务中可能出现的各种异常类型,
提供统一的错误处理机制
"""
from typing import Optional, Dict, Any
class LLMServiceError(Exception):
"""大模型服务基础异常类"""
def __init__(self, message: str, error_code: Optional[str] = None, details: Optional[Dict[str, Any]] = None):
super().__init__(message)
self.message = message
self.error_code = error_code
self.details = details or {}
def __str__(self):
if self.error_code:
return f"[{self.error_code}] {self.message}"
return self.message
class ProviderNotFoundError(LLMServiceError):
"""供应商未找到异常"""
def __init__(self, provider_name: str):
super().__init__(
message=f"未找到大模型供应商: {provider_name}",
error_code="PROVIDER_NOT_FOUND",
details={"provider_name": provider_name}
)
class ConfigurationError(LLMServiceError):
"""配置错误异常"""
def __init__(self, message: str, config_key: Optional[str] = None):
super().__init__(
message=f"配置错误: {message}",
error_code="CONFIGURATION_ERROR",
details={"config_key": config_key} if config_key else {}
)
class APICallError(LLMServiceError):
"""API调用错误异常"""
def __init__(self, message: str, status_code: Optional[int] = None, response_text: Optional[str] = None):
super().__init__(
message=f"API调用失败: {message}",
error_code="API_CALL_ERROR",
details={
"status_code": status_code,
"response_text": response_text
}
)
class ValidationError(LLMServiceError):
"""输出验证错误异常"""
def __init__(self, message: str, validation_type: Optional[str] = None, invalid_data: Optional[Any] = None):
super().__init__(
message=f"输出验证失败: {message}",
error_code="VALIDATION_ERROR",
details={
"validation_type": validation_type,
"invalid_data": str(invalid_data) if invalid_data is not None else None
}
)
class ModelNotSupportedError(LLMServiceError):
"""模型不支持异常"""
def __init__(self, model_name: str, provider_name: str):
super().__init__(
message=f"供应商 {provider_name} 不支持模型 {model_name}",
error_code="MODEL_NOT_SUPPORTED",
details={
"model_name": model_name,
"provider_name": provider_name
}
)
class RateLimitError(LLMServiceError):
"""API速率限制异常"""
def __init__(self, mes
SYMBOL INDEX (683 symbols across 83 files)
FILE: app/config/__init__.py
function __init_logger (line 10) | def __init_logger():
FILE: app/config/audio_config.py
class AudioConfig (line 16) | class AudioConfig:
method get_optimized_volumes (line 50) | def get_optimized_volumes(cls, video_type: str = 'default') -> Dict[st...
method get_audio_processing_config (line 89) | def get_audio_processing_config(cls) -> Dict[str, Any]:
method get_mixing_config (line 94) | def get_mixing_config(cls) -> Dict[str, Any]:
method validate_volume (line 99) | def validate_volume(cls, volume: float, name: str) -> float:
method apply_volume_profile (line 123) | def apply_volume_profile(cls, profile_name: str) -> Dict[str, float]:
function get_recommended_volumes_for_content (line 168) | def get_recommended_volumes_for_content(content_type: str = 'mixed') -> ...
FILE: app/config/config.py
function get_version_from_file (line 12) | def get_version_from_file():
function load_config (line 24) | def load_config():
function save_config (line 47) | def save_config():
FILE: app/config/ffmpeg_config.py
class FFmpegProfile (line 14) | class FFmpegProfile:
class FFmpegConfigManager (line 27) | class FFmpegConfigManager:
method get_recommended_profile (line 99) | def get_recommended_profile(cls) -> str:
method get_profile (line 143) | def get_profile(cls, profile_name: str) -> FFmpegProfile:
method get_extraction_command (line 160) | def get_extraction_command(cls,
method list_profiles (line 235) | def list_profiles(cls) -> Dict[str, str]:
method get_compatibility_report (line 245) | def get_compatibility_report(cls) -> Dict[str, any]:
method _get_suggestions (line 271) | def _get_suggestions(cls, profile: FFmpegProfile, hwaccel_info: Dict) ...
FILE: app/models/exception.py
class HttpException (line 7) | class HttpException(Exception):
method __init__ (line 8) | def __init__(
class FileNotFoundException (line 27) | class FileNotFoundException(Exception):
FILE: app/models/schema.py
class AudioVolumeDefaults (line 16) | class AudioVolumeDefaults:
class VideoConcatMode (line 37) | class VideoConcatMode(str, Enum):
class VideoAspect (line 42) | class VideoAspect(str, Enum):
method to_resolution (line 49) | def to_resolution(self):
class _Config (line 59) | class _Config:
class MaterialInfo (line 64) | class MaterialInfo:
class VideoParams (line 108) | class VideoParams(BaseModel):
class VideoClipParams (line 160) | class VideoClipParams(BaseModel):
class SubtitlePosition (line 204) | class SubtitlePosition(str, Enum):
FILE: app/services/SDE/short_drama_explanation.py
class SubtitleAnalyzer (line 23) | class SubtitleAnalyzer:
method __init__ (line 26) | def __init__(
method _detect_provider (line 62) | def _detect_provider(self):
method _init_headers (line 66) | def _init_headers(self):
method analyze_subtitle (line 79) | def analyze_subtitle(self, subtitle_content: str) -> Dict[str, Any]:
method _call_native_gemini_api (line 117) | def _call_native_gemini_api(self, prompt: str) -> Dict[str, Any]:
method _call_openai_compatible_api (line 233) | def _call_openai_compatible_api(self, prompt: str) -> Dict[str, Any]:
method analyze_subtitle_from_file (line 293) | def analyze_subtitle_from_file(self, subtitle_file_path: str) -> Dict[...
method save_analysis_result (line 332) | def save_analysis_result(self, analysis_result: Dict[str, Any], output...
method generate_narration_script (line 366) | def generate_narration_script(self, short_name: str, plot_analysis: st...
method _generate_narration_with_native_gemini (line 406) | def _generate_narration_with_native_gemini(self, prompt: str, temperat...
method _generate_narration_with_openai_compatible (line 526) | def _generate_narration_with_openai_compatible(self, prompt: str, temp...
method save_narration_script (line 590) | def save_narration_script(self, narration_result: Dict[str, Any], outp...
function analyze_subtitle (line 625) | def analyze_subtitle(
function generate_narration_script (line 684) | def generate_narration_script(
FILE: app/services/SDP/generate_script_short.py
function generate_script_result (line 12) | def generate_script_result(
function generate_script (line 74) | def generate_script(
FILE: app/services/SDP/utils/short_schema.py
class PlotPoint (line 9) | class PlotPoint:
class Commentary (line 16) | class Commentary:
class SubtitleSegment (line 23) | class SubtitleSegment:
class ScriptItem (line 30) | class ScriptItem:
class PipelineResult (line 38) | class PipelineResult:
class VideoProcessingError (line 47) | class VideoProcessingError(Exception):
class SubtitleProcessingError (line 51) | class SubtitleProcessingError(Exception):
class PlotAnalysisError (line 55) | class PlotAnalysisError(Exception):
class CopywritingError (line 59) | class CopywritingError(Exception):
FILE: app/services/SDP/utils/step1_subtitle_analyzer_openai.py
function analyze_subtitle (line 17) | def analyze_subtitle(
FILE: app/services/SDP/utils/step5_merge_script.py
function merge_script (line 9) | def merge_script(
FILE: app/services/SDP/utils/utils.py
function load_srt (line 9) | def load_srt(file_path: str) -> List[Dict]:
function load_srt_from_content (line 83) | def load_srt_from_content(srt_content: str) -> List[Dict]:
FILE: app/services/audio_merger.py
function check_ffmpeg (line 12) | def check_ffmpeg():
function merge_audio_files (line 21) | def merge_audio_files(task_id: str, total_duration: float, list_script: ...
function time_to_seconds (line 79) | def time_to_seconds(time_str):
function extract_timestamp (line 113) | def extract_timestamp(filename):
FILE: app/services/audio_normalizer.py
class AudioNormalizer (line 22) | class AudioNormalizer:
method __init__ (line 25) | def __init__(self):
method analyze_audio_lufs (line 29) | def analyze_audio_lufs(self, audio_path: str) -> Optional[float]:
method get_audio_rms (line 87) | def get_audio_rms(self, audio_path: str) -> Optional[float]:
method normalize_audio_lufs (line 122) | def normalize_audio_lufs(self, input_path: str, output_path: str,
method _simple_normalize (line 207) | def _simple_normalize(self, input_path: str, output_path: str) -> bool:
method calculate_volume_adjustment (line 236) | def calculate_volume_adjustment(self, tts_path: str, original_path: st...
function normalize_audio_for_mixing (line 276) | def normalize_audio_for_mixing(audio_path: str, output_dir: str,
FILE: app/services/clip_video.py
function parse_timestamp (line 21) | def parse_timestamp(timestamp: str) -> tuple:
function calculate_end_time (line 35) | def calculate_end_time(start_time: str, duration: float, extra_seconds: ...
function check_hardware_acceleration (line 76) | def check_hardware_acceleration() -> Optional[str]:
function get_safe_encoder_config (line 87) | def get_safe_encoder_config(hwaccel_type: Optional[str] = None) -> Dict[...
function build_ffmpeg_command (line 143) | def build_ffmpeg_command(
function execute_ffmpeg_with_fallback (line 230) | def execute_ffmpeg_with_fallback(
function analyze_ffmpeg_error (line 304) | def analyze_ffmpeg_error(error_msg: str) -> str:
function try_compatibility_fallback (line 345) | def try_compatibility_fallback(
function try_software_fallback (line 386) | def try_software_fallback(
function try_basic_fallback (line 426) | def try_basic_fallback(
function execute_simple_command (line 464) | def execute_simple_command(cmd: List[str], timestamp: str, method_name: ...
function try_fallback_encoding (line 509) | def try_fallback_encoding(
function _process_narration_only_segment (line 548) | def _process_narration_only_segment(
function _process_original_audio_segment (line 600) | def _process_original_audio_segment(
function _process_mixed_segment (line 643) | def _process_mixed_segment(
function _build_ffmpeg_command_with_audio_control (line 695) | def _build_ffmpeg_command_with_audio_control(
function clip_video_unified (line 780) | def clip_video_unified(
function clip_video (line 901) | def clip_video(
FILE: app/services/generate_narration_script.py
function parse_frame_analysis_to_markdown (line 25) | def parse_frame_analysis_to_markdown(json_file_path):
function generate_narration (line 87) | def generate_narration(markdown_content, api_key, base_url, model):
function _generate_narration_legacy (line 110) | def _generate_narration_legacy(markdown_content, api_key, base_url, model):
FILE: app/services/generate_video.py
function is_valid_subtitle_file (line 32) | def is_valid_subtitle_file(subtitle_path: str) -> bool:
function merge_materials (line 66) | def merge_materials(
function wrap_text (line 407) | def wrap_text(text, max_width, font="Arial", fontsize=60):
FILE: app/services/llm.py
function handle_exception (line 111) | def handle_exception(err):
function _generate_response (line 134) | def _generate_response(prompt: str, llm_provider: str = None) -> str:
function _generate_response_video (line 357) | def _generate_response_video(prompt: str, llm_provider_video: str, video...
function compress_video (line 394) | def compress_video(input_path: str, output_path: str):
function generate_script (line 414) | def generate_script(
function gemini_video_transcription (line 470) | def gemini_video_transcription(video_name: str, video_path: str, languag...
function generate_terms (line 524) | def generate_terms(video_subject: str, video_script: str, amount: int = ...
function gemini_video2json (line 585) | def gemini_video2json(video_origin_name: str, video_origin_path: str, vi...
function writing_movie (line 674) | def writing_movie(video_plot, video_name, llm_provider):
function writing_short_play (line 699) | def writing_short_play(video_plot: str, video_name: str, llm_provider: s...
function screen_matching (line 730) | def screen_matching(huamian: str, wenan: str, llm_provider: str):
FILE: app/services/llm/base.py
class BaseLLMProvider (line 16) | class BaseLLMProvider(ABC):
method __init__ (line 19) | def __init__(self,
method provider_name (line 46) | def provider_name(self) -> str:
method supported_models (line 52) | def supported_models(self) -> List[str]:
method _validate_config (line 56) | def _validate_config(self):
method _validate_model_support (line 67) | def _validate_model_support(self):
method _initialize (line 78) | def _initialize(self):
method _make_api_call (line 83) | async def _make_api_call(self, payload: Dict[str, Any]) -> Dict[str, A...
method _handle_api_error (line 87) | def _handle_api_error(self, status_code: int, response_text: str) -> L...
class VisionModelProvider (line 103) | class VisionModelProvider(BaseLLMProvider):
method analyze_images (line 107) | async def analyze_images(self,
method _prepare_images (line 126) | def _prepare_images(self, images: List[Union[str, Path, PIL.Image.Imag...
class TextModelProvider (line 153) | class TextModelProvider(BaseLLMProvider):
method generate_text (line 157) | async def generate_text(self,
method _build_messages (line 180) | def _build_messages(self, prompt: str, system_prompt: Optional[str] = ...
FILE: app/services/llm/config_validator.py
class LLMConfigValidator (line 15) | class LLMConfigValidator:
method validate_all_configs (line 19) | def validate_all_configs() -> Dict[str, Any]:
method validate_vision_provider (line 88) | def validate_vision_provider(provider_name: str) -> Dict[str, Any]:
method validate_text_provider (line 145) | def validate_text_provider(provider_name: str) -> Dict[str, Any]:
method get_config_suggestions (line 202) | def get_config_suggestions() -> Dict[str, Any]:
method _get_example_models (line 252) | def _get_example_models(provider_name: str, model_type: str) -> List[s...
method print_validation_report (line 280) | def print_validation_report(validation_results: Dict[str, Any]):
FILE: app/services/llm/exceptions.py
class LLMServiceError (line 11) | class LLMServiceError(Exception):
method __init__ (line 14) | def __init__(self, message: str, error_code: Optional[str] = None, det...
method __str__ (line 20) | def __str__(self):
class ProviderNotFoundError (line 26) | class ProviderNotFoundError(LLMServiceError):
method __init__ (line 29) | def __init__(self, provider_name: str):
class ConfigurationError (line 37) | class ConfigurationError(LLMServiceError):
method __init__ (line 40) | def __init__(self, message: str, config_key: Optional[str] = None):
class APICallError (line 48) | class APICallError(LLMServiceError):
method __init__ (line 51) | def __init__(self, message: str, status_code: Optional[int] = None, re...
class ValidationError (line 62) | class ValidationError(LLMServiceError):
method __init__ (line 65) | def __init__(self, message: str, validation_type: Optional[str] = None...
class ModelNotSupportedError (line 76) | class ModelNotSupportedError(LLMServiceError):
method __init__ (line 79) | def __init__(self, model_name: str, provider_name: str):
class RateLimitError (line 90) | class RateLimitError(LLMServiceError):
method __init__ (line 93) | def __init__(self, message: str = "API调用频率超限", retry_after: Optional[i...
class AuthenticationError (line 101) | class AuthenticationError(LLMServiceError):
method __init__ (line 104) | def __init__(self, message: str = "API密钥无效或权限不足"):
class ContentFilterError (line 111) | class ContentFilterError(LLMServiceError):
method __init__ (line 114) | def __init__(self, message: str = "内容被安全过滤器阻止"):
FILE: app/services/llm/litellm_provider.py
function configure_litellm (line 39) | def configure_litellm():
class LiteLLMVisionProvider (line 59) | class LiteLLMVisionProvider(VisionModelProvider):
method provider_name (line 63) | def provider_name(self) -> str:
method supported_models (line 70) | def supported_models(self) -> List[str]:
method _validate_model_support (line 75) | def _validate_model_support(self):
method _initialize (line 98) | def _initialize(self):
method analyze_images (line 130) | async def analyze_images(self,
method _analyze_batch (line 167) | async def _analyze_batch(self, batch: List[PIL.Image.Image], prompt: s...
method _image_to_base64 (line 254) | def _image_to_base64(self, img: PIL.Image.Image) -> str:
method _make_api_call (line 261) | async def _make_api_call(self, payload: Dict[str, Any]) -> Dict[str, A...
class LiteLLMTextProvider (line 266) | class LiteLLMTextProvider(TextModelProvider):
method provider_name (line 270) | def provider_name(self) -> str:
method supported_models (line 289) | def supported_models(self) -> List[str]:
method _validate_model_support (line 294) | def _validate_model_support(self):
method _initialize (line 317) | def _initialize(self):
method generate_text (line 349) | async def generate_text(self,
method _clean_json_output (line 474) | def _clean_json_output(self, output: str) -> str:
method _make_api_call (line 488) | async def _make_api_call(self, payload: Dict[str, Any]) -> Dict[str, A...
FILE: app/services/llm/manager.py
class LLMServiceManager (line 15) | class LLMServiceManager:
method register_vision_provider (line 29) | def register_vision_provider(cls, name: str, provider_class: Type[Visi...
method register_text_provider (line 35) | def register_text_provider(cls, name: str, provider_class: Type[TextMo...
method is_registered (line 46) | def is_registered(cls) -> bool:
method get_registered_providers_info (line 56) | def get_registered_providers_info(cls) -> dict:
method get_vision_provider (line 69) | def get_vision_provider(cls, provider_name: Optional[str] = None) -> V...
method get_text_provider (line 137) | def get_text_provider(cls, provider_name: Optional[str] = None) -> Tex...
method clear_cache (line 211) | def clear_cache(cls):
method list_vision_providers (line 218) | def list_vision_providers(cls) -> list[str]:
method list_text_providers (line 223) | def list_text_providers(cls) -> list[str]:
method get_provider_info (line 228) | def get_provider_info(cls) -> Dict[str, Dict[str, any]]:
FILE: app/services/llm/migration_adapter.py
function _run_async_safely (line 23) | def _run_async_safely(coro_func, *args, **kwargs):
class LegacyLLMAdapter (line 62) | class LegacyLLMAdapter:
method create_vision_analyzer (line 66) | def create_vision_analyzer(provider: str, api_key: str, model: str, ba...
method generate_narration (line 82) | def generate_narration(markdown_content: str, api_key: str, base_url: ...
class VisionAnalyzerAdapter (line 150) | class VisionAnalyzerAdapter:
method __init__ (line 153) | def __init__(self, provider: str, api_key: str, model: str, base_url: ...
method analyze_images (line 159) | async def analyze_images(self,
class SubtitleAnalyzerAdapter (line 207) | class SubtitleAnalyzerAdapter:
method __init__ (line 210) | def __init__(self, api_key: str, model: str, base_url: str, provider: ...
method _run_async_safely (line 216) | def _run_async_safely(self, coro_func, *args, **kwargs):
method _clean_json_output (line 220) | def _clean_json_output(self, output: str) -> str:
method analyze_subtitle (line 238) | def analyze_subtitle(self, subtitle_content: str) -> Dict[str, Any]:
method generate_narration_script (line 274) | def generate_narration_script(self, short_name: str, plot_analysis: st...
function create_vision_analyzer (line 334) | def create_vision_analyzer(provider: str, api_key: str, model: str, base...
function generate_narration (line 339) | def generate_narration(markdown_content: str, api_key: str, base_url: st...
FILE: app/services/llm/providers/__init__.py
function register_all_providers (line 12) | def register_all_providers():
FILE: app/services/llm/test_litellm_integration.py
function test_provider_registration (line 20) | def test_provider_registration():
function test_litellm_import (line 45) | def test_litellm_import():
function test_text_generation_mock (line 61) | async def test_text_generation_mock():
function test_vision_analysis_mock (line 77) | async def test_vision_analysis_mock():
function test_backward_compatibility (line 93) | def test_backward_compatibility():
function print_usage_guide (line 125) | def print_usage_guide():
function main (line 188) | def main():
FILE: app/services/llm/test_llm_service.py
function test_text_generation (line 25) | async def test_text_generation():
function test_json_generation (line 50) | async def test_json_generation():
function test_narration_script_generation (line 89) | async def test_narration_script_generation():
function test_subtitle_analysis (line 124) | async def test_subtitle_analysis():
function test_config_validation (line 159) | def test_config_validation():
function test_provider_info (line 184) | def test_provider_info():
function run_all_tests (line 205) | async def run_all_tests():
FILE: app/services/llm/unified_service.py
class UnifiedLLMService (line 20) | class UnifiedLLMService:
method analyze_images (line 24) | async def analyze_images(images: List[Union[str, Path, PIL.Image.Image]],
method generate_text (line 65) | async def generate_text(prompt: str,
method generate_narration_script (line 112) | async def generate_narration_script(prompt: str,
method analyze_subtitle (line 162) | async def analyze_subtitle(subtitle_content: str,
method get_provider_info (line 209) | def get_provider_info() -> Dict[str, Any]:
method list_vision_providers (line 219) | def list_vision_providers() -> List[str]:
method list_text_providers (line 229) | def list_text_providers() -> List[str]:
method clear_cache (line 239) | def clear_cache():
function analyze_images_unified (line 246) | async def analyze_images_unified(images: List[Union[str, Path, PIL.Image...
function generate_text_unified (line 254) | async def generate_text_unified(prompt: str,
FILE: app/services/llm/validators.py
class OutputValidator (line 15) | class OutputValidator:
method validate_json_output (line 19) | def validate_json_output(output: str, schema: Optional[Dict[str, Any]]...
method _clean_json_output (line 55) | def _clean_json_output(output: str) -> str:
method _validate_json_schema (line 72) | def _validate_json_schema(data: Dict[str, Any], schema: Dict[str, Any]):
method validate_narration_script (line 90) | def validate_narration_script(output: str) -> List[Dict[str, Any]]:
method _validate_narration_item (line 146) | def _validate_narration_item(item: Dict[str, Any], index: int):
method validate_subtitle_analysis (line 166) | def validate_subtitle_analysis(output: str) -> str:
FILE: app/services/material.py
function get_api_key (line 22) | def get_api_key(cfg_key: str):
function search_videos_pexels (line 39) | def search_videos_pexels(
function search_videos_pixabay (line 93) | def search_videos_pixabay(
function save_video (line 149) | def save_video(video_url: str, save_dir: str = "") -> str:
function download_videos (line 190) | def download_videos(
function time_to_seconds (line 257) | def time_to_seconds(time_str: str) -> float:
function format_timestamp (line 292) | def format_timestamp(seconds: float) -> str:
function _detect_hardware_acceleration (line 311) | def _detect_hardware_acceleration() -> Optional[str]:
function save_clip_video (line 323) | def save_clip_video(timestamp: str, origin_video: str, save_dir: str = "...
function clip_videos (line 492) | def clip_videos(task_id: str, timestamp_terms: List[str], origin_video: ...
function merge_videos (line 525) | def merge_videos(video_paths, ost_list):
FILE: app/services/merger_video.py
class VideoAspect (line 21) | class VideoAspect(Enum):
method to_resolution (line 29) | def to_resolution(self) -> Tuple[int, int]:
function check_ffmpeg_installation (line 45) | def check_ffmpeg_installation() -> bool:
function get_hardware_acceleration_option (line 60) | def get_hardware_acceleration_option() -> Optional[str]:
function check_video_has_audio (line 71) | def check_video_has_audio(video_path: str) -> bool:
function create_ffmpeg_concat_file (line 101) | def create_ffmpeg_concat_file(video_paths: List[str], concat_file_path: ...
function process_single_video (line 130) | def process_single_video(
function combine_clip_videos (line 328) | def combine_clip_videos(
FILE: app/services/prompts/__init__.py
function initialize_prompts (line 56) | def initialize_prompts():
FILE: app/services/prompts/base.py
class ModelType (line 19) | class ModelType(Enum):
class OutputFormat (line 26) | class OutputFormat(Enum):
class PromptMetadata (line 35) | class PromptMetadata:
class BasePrompt (line 50) | class BasePrompt(ABC):
method __init__ (line 53) | def __init__(self, metadata: PromptMetadata):
method name (line 60) | def name(self) -> str:
method category (line 65) | def category(self) -> str:
method version (line 70) | def version(self) -> str:
method model_type (line 75) | def model_type(self) -> ModelType:
method output_format (line 80) | def output_format(self) -> OutputFormat:
method get_template (line 85) | def get_template(self) -> str:
method get_system_prompt (line 89) | def get_system_prompt(self) -> Optional[str]:
method get_examples (line 93) | def get_examples(self) -> List[str]:
method validate_parameters (line 97) | def validate_parameters(self, parameters: Dict[str, Any]) -> bool:
method render (line 112) | def render(self, parameters: Dict[str, Any] = None) -> str:
method to_dict (line 134) | def to_dict(self) -> Dict[str, Any]:
class TextPrompt (line 156) | class TextPrompt(BasePrompt):
method __init__ (line 159) | def __init__(self, metadata: PromptMetadata):
class VisionPrompt (line 165) | class VisionPrompt(BasePrompt):
method __init__ (line 168) | def __init__(self, metadata: PromptMetadata):
class ParameterizedPrompt (line 174) | class ParameterizedPrompt(BasePrompt):
method __init__ (line 177) | def __init__(self, metadata: PromptMetadata, required_parameters: List...
FILE: app/services/prompts/documentary/__init__.py
function register_prompts (line 17) | def register_prompts():
FILE: app/services/prompts/documentary/frame_analysis.py
class FrameAnalysisPrompt (line 15) | class FrameAnalysisPrompt(VisionPrompt):
method __init__ (line 18) | def __init__(self):
method get_template (line 33) | def get_template(self) -> str:
FILE: app/services/prompts/documentary/narration_generation.py
class NarrationGenerationPrompt (line 15) | class NarrationGenerationPrompt(TextPrompt):
method __init__ (line 18) | def __init__(self):
method get_template (line 33) | def get_template(self) -> str:
FILE: app/services/prompts/exceptions.py
class PromptError (line 13) | class PromptError(Exception):
class PromptNotFoundError (line 18) | class PromptNotFoundError(PromptError):
method __init__ (line 21) | def __init__(self, category: str, name: str, version: str = None):
class PromptValidationError (line 34) | class PromptValidationError(PromptError):
method __init__ (line 37) | def __init__(self, message: str, validation_errors: list = None):
class TemplateRenderError (line 42) | class TemplateRenderError(PromptError):
method __init__ (line 45) | def __init__(self, template_name: str, error_message: str, missing_par...
class PromptRegistrationError (line 57) | class PromptRegistrationError(PromptError):
method __init__ (line 60) | def __init__(self, category: str, name: str, reason: str):
class PromptVersionError (line 69) | class PromptVersionError(PromptError):
method __init__ (line 72) | def __init__(self, category: str, name: str, version: str, reason: str):
FILE: app/services/prompts/manager.py
class PromptManager (line 26) | class PromptManager:
method __init__ (line 29) | def __init__(self):
method get_prompt (line 34) | def get_prompt(cls,
method get_prompt_object (line 63) | def get_prompt_object(cls,
method register_prompt (line 82) | def register_prompt(cls, prompt: BasePrompt, is_default: bool = True) ...
method list_categories (line 94) | def list_categories(cls) -> List[str]:
method list_prompts (line 100) | def list_prompts(cls, category: str) -> List[str]:
method list_versions (line 106) | def list_versions(cls, category: str, name: str) -> List[str]:
method exists (line 112) | def exists(cls, category: str, name: str, version: Optional[str] = Non...
method search_prompts (line 118) | def search_prompts(cls,
method get_stats (line 150) | def get_stats(cls) -> Dict[str, Any]:
method validate_output (line 164) | def validate_output(cls,
method get_prompt_info (line 204) | def get_prompt_info(cls, category: str, name: str, version: Optional[s...
method export_prompts (line 240) | def export_prompts(cls, category: Optional[str] = None) -> Dict[str, A...
method _get_current_time (line 273) | def _get_current_time(self) -> str:
function get_prompt (line 280) | def get_prompt(category: str, name: str, version: str = None, **paramete...
function validate_prompt_output (line 285) | def validate_prompt_output(output: Union[str, Dict], category: str, name...
FILE: app/services/prompts/registry.py
class PromptRegistry (line 24) | class PromptRegistry:
method __init__ (line 27) | def __init__(self):
method register (line 35) | def register(self, prompt: BasePrompt, is_default: bool = True) -> None:
method get (line 65) | def get(self, category: str, name: str, version: Optional[str] = None)...
method list_categories (line 94) | def list_categories(self) -> List[str]:
method list_prompts (line 98) | def list_prompts(self, category: str) -> List[str]:
method list_versions (line 104) | def list_versions(self, category: str, name: str) -> List[str]:
method get_default_version (line 110) | def get_default_version(self, category: str, name: str) -> Optional[str]:
method set_default_version (line 114) | def set_default_version(self, category: str, name: str, version: str) ...
method exists (line 124) | def exists(self, category: str, name: str, version: Optional[str] = No...
method remove (line 132) | def remove(self, category: str, name: str, version: Optional[str] = No...
method search (line 158) | def search(self,
method get_stats (line 200) | def get_stats(self) -> Dict[str, int]:
function get_registry (line 221) | def get_registry() -> PromptRegistry:
FILE: app/services/prompts/short_drama_editing/__init__.py
function register_prompts (line 17) | def register_prompts():
FILE: app/services/prompts/short_drama_editing/plot_extraction.py
class PlotExtractionPrompt (line 15) | class PlotExtractionPrompt(TextPrompt):
method __init__ (line 18) | def __init__(self):
method get_template (line 33) | def get_template(self) -> str:
FILE: app/services/prompts/short_drama_editing/subtitle_analysis.py
class SubtitleAnalysisPrompt (line 15) | class SubtitleAnalysisPrompt(TextPrompt):
method __init__ (line 18) | def __init__(self):
method get_template (line 33) | def get_template(self) -> str:
FILE: app/services/prompts/short_drama_narration/__init__.py
function register_prompts (line 17) | def register_prompts():
FILE: app/services/prompts/short_drama_narration/plot_analysis.py
class PlotAnalysisPrompt (line 15) | class PlotAnalysisPrompt(TextPrompt):
method __init__ (line 18) | def __init__(self):
method get_template (line 33) | def get_template(self) -> str:
FILE: app/services/prompts/short_drama_narration/script_generation.py
class ScriptGenerationPrompt (line 15) | class ScriptGenerationPrompt(ParameterizedPrompt):
method __init__ (line 18) | def __init__(self):
method get_template (line 33) | def get_template(self) -> str:
FILE: app/services/prompts/template.py
class TemplateRenderer (line 20) | class TemplateRenderer:
method __init__ (line 23) | def __init__(self):
method register_filter (line 26) | def register_filter(self, name: str, func: callable) -> None:
method render (line 31) | def render(self, template: str, parameters: Dict[str, Any] = None) -> ...
method _apply_filters (line 65) | def _apply_filters(self, text: str, parameters: Dict[str, Any]) -> str:
method extract_variables (line 92) | def extract_variables(self, template: str) -> List[str]:
method validate_template (line 99) | def validate_template(self, template: str, required_params: List[str] ...
function _upper_filter (line 127) | def _upper_filter(value: Any) -> str:
function _lower_filter (line 132) | def _lower_filter(value: Any) -> str:
function _title_filter (line 137) | def _title_filter(value: Any) -> str:
function _strip_filter (line 142) | def _strip_filter(value: Any) -> str:
function _truncate_filter (line 147) | def _truncate_filter(value: Any, length: int = 100) -> str:
function _json_filter (line 155) | def _json_filter(value: Any) -> str:
function get_renderer (line 173) | def get_renderer() -> TemplateRenderer:
function render_template (line 178) | def render_template(template: str, parameters: Dict[str, Any] = None) ->...
FILE: app/services/prompts/validators.py
class PromptOutputValidator (line 21) | class PromptOutputValidator:
method validate_json (line 25) | def validate_json(output: str, schema: Dict[str, Any] = None) -> Dict[...
method validate_narration_script (line 55) | def validate_narration_script(output: Union[str, Dict]) -> Dict[str, A...
method validate_plot_analysis (line 90) | def validate_plot_analysis(output: Union[str, Dict]) -> Dict[str, Any]:
method _clean_json_output (line 123) | def _clean_json_output(output: str) -> str:
method _validate_json_schema (line 140) | def _validate_json_schema(data: Dict[str, Any], schema: Dict[str, Any]...
method _validate_narration_item (line 153) | def _validate_narration_item(item: Dict[str, Any], index: int) -> None:
method _validate_plot_point (line 190) | def _validate_plot_point(point: Dict[str, Any], index: int) -> None:
method validate_by_format (line 217) | def validate_by_format(output: str, format_type: OutputFormat, schema:...
function validate_json_output (line 243) | def validate_json_output(output: str, schema: Dict[str, Any] = None) -> ...
function validate_narration_output (line 248) | def validate_narration_output(output: Union[str, Dict]) -> Dict[str, Any]:
FILE: app/services/script_service.py
class ScriptGenerator (line 15) | class ScriptGenerator:
method __init__ (line 16) | def __init__(self):
method generate_script (line 20) | async def generate_script(
method _extract_keyframes (line 76) | async def _extract_keyframes(
method _process_with_llm (line 120) | async def _process_with_llm(
method _get_batch_files (line 248) | def _get_batch_files(
method _get_batch_timestamps (line 259) | def _get_batch_timestamps(
FILE: app/services/state.py
class BaseState (line 8) | class BaseState(ABC):
method update_task (line 10) | def update_task(self, task_id: str, state: int, progress: int = 0, **k...
method get_task (line 14) | def get_task(self, task_id: str):
class MemoryState (line 19) | class MemoryState(BaseState):
method __init__ (line 20) | def __init__(self):
method update_task (line 23) | def update_task(
method get_task (line 40) | def get_task(self, task_id: str):
method delete_task (line 43) | def delete_task(self, task_id: str):
class RedisState (line 49) | class RedisState(BaseState):
method __init__ (line 50) | def __init__(self, host="localhost", port=6379, db=0, password=None):
method update_task (line 55) | def update_task(
method get_task (line 75) | def get_task(self, task_id: str):
method delete_task (line 86) | def delete_task(self, task_id: str):
method _convert_to_original_type (line 90) | def _convert_to_original_type(value):
FILE: app/services/subtitle.py
function create (line 23) | def create(audio_file, subtitle_file: str = ""):
function file_to_subtitles (line 197) | def file_to_subtitles(filename):
function levenshtein_distance (line 228) | def levenshtein_distance(s1, s2):
function similarity (line 248) | def similarity(a, b):
function correct (line 254) | def correct(subtitle_file, video_script):
function create_with_gemini (line 348) | def create_with_gemini(audio_file: str, subtitle_file: str = "", api_key...
function extract_audio_and_create_subtitle (line 380) | def extract_audio_and_create_subtitle(video_file: str, subtitle_file: st...
FILE: app/services/subtitle_merger.py
function parse_time (line 16) | def parse_time(time_str):
function format_time (line 30) | def format_time(td):
function parse_edited_time_range (line 41) | def parse_edited_time_range(time_range_str):
function merge_subtitle_files (line 62) | def merge_subtitle_files(subtitle_items, output_file=None):
FILE: app/services/subtitle_text.py
class DecodedSubtitle (line 28) | class DecodedSubtitle:
function has_timecodes (line 33) | def has_timecodes(text: str) -> bool:
function normalize_subtitle_text (line 40) | def normalize_subtitle_text(text: str) -> str:
function decode_subtitle_bytes (line 69) | def decode_subtitle_bytes(
function read_subtitle_text (line 114) | def read_subtitle_text(file_path: str) -> DecodedSubtitle:
FILE: app/services/task.py
function start_subclip (line 18) | def start_subclip(task_id: str, params: VideoClipParams, subclip_path_vi...
function start_subclip_unified (line 266) | def start_subclip_unified(task_id: str, params: VideoClipParams):
function validate_params (line 480) | def validate_params(video_path, audio_path, output_file, params):
FILE: app/services/update_script.py
function extract_timestamp_from_video_path (line 16) | def extract_timestamp_from_video_path(video_path: str) -> str:
function calculate_duration (line 48) | def calculate_duration(timestamp: str) -> float:
function update_script_timestamps (line 90) | def update_script_timestamps(
FILE: app/services/upload_validation.py
class InputValidationError (line 16) | class InputValidationError(ValueError):
function ensure_existing_file (line 21) | def ensure_existing_file(
function resolve_subtitle_input (line 63) | def resolve_subtitle_input(
FILE: app/services/video.py
function wrap_text (line 22) | def wrap_text(text, max_width, font, fontsize=60):
function manage_clip (line 92) | def manage_clip(clip):
function resize_video_with_padding (line 108) | def resize_video_with_padding(clip, target_width: int, target_height: int):
function loop_audio_clip (line 145) | def loop_audio_clip(audio_clip: AudioFileClip, target_duration: float) -...
function calculate_subtitle_position (line 170) | def calculate_subtitle_position(position, video_height: int, text_height...
function generate_video_v3 (line 200) | def generate_video_v3(
FILE: app/services/video_service.py
class VideoService (line 9) | class VideoService:
method crop_video (line 11) | async def crop_video(
FILE: app/services/voice.py
function mktimestamp (line 28) | def mktimestamp(time_seconds: float) -> str:
function new_sub_maker (line 44) | def new_sub_maker() -> SubMaker:
function add_subtitle_event (line 54) | def add_subtitle_event(
function get_all_azure_voices (line 80) | def get_all_azure_voices(filter_locals=None) -> list[str]:
function parse_voice_name (line 1083) | def parse_voice_name(name: str):
function is_azure_v2_voice (line 1091) | def is_azure_v2_voice(voice_name: str):
function should_use_azure_speech_services (line 1098) | def should_use_azure_speech_services(voice_name: str) -> bool:
function tts (line 1119) | def tts(
function convert_rate_to_percent (line 1156) | def convert_rate_to_percent(rate: float) -> str:
function convert_pitch_to_percent (line 1166) | def convert_pitch_to_percent(rate: float) -> str:
function get_edge_tts_proxy (line 1176) | def get_edge_tts_proxy() -> str | None:
function azure_tts_v1 (line 1186) | def azure_tts_v1(
function azure_tts_v2 (line 1248) | def azure_tts_v2(text: str, voice_name: str, voice_file: str) -> Union[S...
function _format_text (line 1342) | def _format_text(text: str) -> str:
function create_subtitle_from_multiple (line 1357) | def create_subtitle_from_multiple(text: str, sub_maker_list: List[SubMak...
function create_subtitle (line 1460) | def create_subtitle(sub_maker: submaker.SubMaker, text: str, subtitle_fi...
function get_audio_duration (line 1561) | def get_audio_duration(sub_maker: submaker.SubMaker):
function tts_multiple (line 1570) | def tts_multiple(task_id: str, list_script: list, voice_name: str, voice...
function get_audio_duration_from_file (line 1638) | def get_audio_duration_from_file(audio_file: str) -> float:
function parse_soulvoice_voice (line 1670) | def parse_soulvoice_voice(voice_name: str) -> str:
function parse_tencent_voice (line 1681) | def parse_tencent_voice(voice_name: str) -> str:
function parse_qwen3_voice (line 1691) | def parse_qwen3_voice(voice_name: str) -> str:
function qwen3_tts (line 1700) | def qwen3_tts(text: str, voice_name: str, voice_file: str, speed: float ...
function tencent_tts (line 1796) | def tencent_tts(text: str, voice_name: str, voice_file: str, speed: floa...
function soulvoice_tts (line 1899) | def soulvoice_tts(text: str, voice_name: str, voice_file: str, speed: fl...
function is_soulvoice_voice (line 1990) | def is_soulvoice_voice(voice_name: str) -> bool:
function is_qwen_engine (line 1996) | def is_qwen_engine(tts_engine: str) -> bool:
function parse_soulvoice_voice (line 1999) | def parse_soulvoice_voice(voice_name: str) -> str:
function parse_indextts2_voice (line 2011) | def parse_indextts2_voice(voice_name: str) -> str:
function indextts2_tts (line 2022) | def indextts2_tts(text: str, voice_name: str, voice_file: str, speed: fl...
FILE: app/services/youtube_service.py
class YoutubeService (line 11) | class YoutubeService:
method __init__ (line 12) | def __init__(self):
method _get_video_formats (line 15) | def _get_video_formats(self, url: str) -> List[Dict]:
method _validate_format (line 44) | def _validate_format(self, output_format: str) -> None:
method download_video (line 52) | async def download_video(
FILE: app/utils/check_script.py
function check_format (line 5) | def check_format(script_content: str) -> Dict[str, Any]:
FILE: app/utils/ffmpeg_utils.py
function get_null_input (line 64) | def get_null_input() -> str:
function create_test_video (line 78) | def create_test_video() -> str:
function cleanup_test_video (line 104) | def cleanup_test_video(path: str) -> None:
function check_ffmpeg_installation (line 118) | def check_ffmpeg_installation() -> bool:
function detect_gpu_vendor (line 138) | def detect_gpu_vendor() -> str:
function test_hwaccel_method (line 183) | def test_hwaccel_method(method: str, test_input: str) -> bool:
function detect_hardware_acceleration (line 252) | def detect_hardware_acceleration() -> Dict[str, Union[bool, str, List[st...
function _get_gpu_info (line 358) | def _get_gpu_info() -> str:
function _get_macos_gpu_info (line 377) | def _get_macos_gpu_info() -> str:
function _find_vaapi_device (line 403) | def _find_vaapi_device() -> Optional[str]:
function _detect_macos_acceleration (line 440) | def _detect_macos_acceleration(supported_hwaccels: str) -> None:
function _detect_windows_acceleration (line 470) | def _detect_windows_acceleration(supported_hwaccels: str) -> None:
function _detect_linux_acceleration (line 639) | def _detect_linux_acceleration(supported_hwaccels: str) -> None:
function _get_windows_gpu_info (line 719) | def _get_windows_gpu_info() -> str:
function _get_linux_gpu_info (line 748) | def _get_linux_gpu_info() -> str:
function get_ffmpeg_hwaccel_args (line 778) | def get_ffmpeg_hwaccel_args() -> List[str]:
function get_ffmpeg_hwaccel_type (line 792) | def get_ffmpeg_hwaccel_type() -> Optional[str]:
function get_ffmpeg_hwaccel_encoder (line 806) | def get_ffmpeg_hwaccel_encoder() -> Optional[str]:
function get_ffmpeg_hwaccel_info (line 820) | def get_ffmpeg_hwaccel_info() -> Dict[str, Union[bool, str, List[str], N...
function is_ffmpeg_hwaccel_available (line 834) | def is_ffmpeg_hwaccel_available() -> bool:
function is_dedicated_gpu (line 848) | def is_dedicated_gpu() -> bool:
function get_optimal_ffmpeg_encoder (line 862) | def get_optimal_ffmpeg_encoder() -> str:
function get_ffmpeg_command_with_hwaccel (line 881) | def get_ffmpeg_command_with_hwaccel(input_path: str, output_path: str, *...
function test_ffmpeg_compatibility (line 925) | def test_ffmpeg_compatibility() -> Dict[str, any]:
function force_software_encoding (line 981) | def force_software_encoding() -> None:
function reset_hwaccel_detection (line 1001) | def reset_hwaccel_detection() -> None:
function test_nvenc_directly (line 1028) | def test_nvenc_directly() -> bool:
function force_use_nvenc_pure (line 1059) | def force_use_nvenc_pure() -> None:
function get_hwaccel_status (line 1082) | def get_hwaccel_status() -> Dict[str, any]:
function _auto_reset_on_import (line 1106) | def _auto_reset_on_import():
FILE: app/utils/gemini_analyzer.py
class VisionAnalyzer (line 17) | class VisionAnalyzer:
method __init__ (line 20) | def __init__(self, model_name: str = "gemini-2.0-flash-exp", api_key: ...
method _configure_client (line 32) | def _configure_client(self):
method _generate_content_with_retry (line 43) | async def _generate_content_with_retry(self, prompt, batch):
method _generate_with_gemini_api (line 54) | async def _generate_with_gemini_api(self, prompt, batch):
method analyze_images (line 165) | async def analyze_images(self,
method save_results_to_txt (line 249) | def save_results_to_txt(self, results: List[Dict], output_dir: str):
method load_images (line 290) | def load_images(self, image_paths: List[str]) -> List[PIL.Image.Image]:
FILE: app/utils/gemini_openai_analyzer.py
class GeminiOpenAIAnalyzer (line 22) | class GeminiOpenAIAnalyzer:
method __init__ (line 25) | def __init__(self, model_name: str = "gemini-2.0-flash-exp", api_key: ...
method _configure_client (line 40) | def _configure_client(self):
method _generate_content_with_retry (line 54) | async def _generate_content_with_retry(self, prompt, batch):
method _generate_with_openai_api (line 62) | async def _generate_with_openai_api(self, prompt, batch):
method analyze_images (line 108) | async def analyze_images(self,
method analyze_images_sync (line 170) | def analyze_images_sync(self,
FILE: app/utils/qwenvl_analyzer.py
class QwenAnalyzer (line 16) | class QwenAnalyzer:
method __init__ (line 19) | def __init__(self, model_name: str = "qwen-vl-max-latest", api_key: st...
method _configure_client (line 38) | def _configure_client(self):
method _image_to_base64 (line 52) | def _image_to_base64(self, image: PIL.Image.Image) -> str:
method _generate_content_with_retry (line 64) | async def _generate_content_with_retry(self, prompt: str, batch: List[...
method analyze_images (line 102) | async def analyze_images(self,
method save_results_to_txt (line 208) | def save_results_to_txt(self, results: List[Dict], output_dir: str):
method load_images (line 233) | def load_images(self, image_paths: List[str]) -> List[PIL.Image.Image]:
FILE: app/utils/script_generator.py
class BaseGenerator (line 13) | class BaseGenerator:
method __init__ (line 14) | def __init__(self, model_name: str, api_key: str, prompt: str):
method _try_generate (line 29) | def _try_generate(self, messages: list, params: dict = None) -> str:
method _generate (line 44) | def _generate(self, messages: list, params: dict) -> any:
method _process_response (line 47) | def _process_response(self, response: any) -> str:
method generate_script (line 50) | def generate_script(self, scene_description: str, word_count: int) -> ...
class OpenAIGenerator (line 82) | class OpenAIGenerator(BaseGenerator):
method __init__ (line 84) | def __init__(self, model_name: str, api_key: str, prompt: str, base_ur...
method _generate (line 104) | def _generate(self, messages: list, params: dict) -> any:
method _process_response (line 117) | def _process_response(self, response: any) -> str:
method _count_tokens (line 123) | def _count_tokens(self, messages: list) -> int:
class GeminiGenerator (line 136) | class GeminiGenerator(BaseGenerator):
method __init__ (line 138) | def __init__(self, model_name: str, api_key: str, prompt: str, base_ur...
class GeminiOpenAIGenerator (line 155) | class GeminiOpenAIGenerator(BaseGenerator):
method __init__ (line 157) | def __init__(self, model_name: str, api_key: str, prompt: str, base_ur...
method _generate (line 179) | def _generate(self, messages: list, params: dict) -> any:
method _process_response (line 192) | def _process_response(self, response: any) -> str:
method _generate (line 198) | def _generate(self, messages: list, params: dict) -> any:
method _process_response (line 309) | def _process_response(self, response: any) -> str:
class QwenGenerator (line 316) | class QwenGenerator(BaseGenerator):
method __init__ (line 318) | def __init__(self, model_name: str, api_key: str, prompt: str, base_ur...
method _generate (line 332) | def _generate(self, messages: list, params: dict) -> any:
method _process_response (line 345) | def _process_response(self, response: any) -> str:
class MoonshotGenerator (line 352) | class MoonshotGenerator(BaseGenerator):
method __init__ (line 354) | def __init__(self, model_name: str, api_key: str, prompt: str, base_ur...
method _generate (line 370) | def _generate(self, messages: list, params: dict) -> any:
method _process_response (line 390) | def _process_response(self, response: any) -> str:
class DeepSeekGenerator (line 397) | class DeepSeekGenerator(BaseGenerator):
method __init__ (line 399) | def __init__(self, model_name: str, api_key: str, prompt: str, base_ur...
method _generate (line 413) | def _generate(self, messages: list, params: dict) -> any:
method _process_response (line 426) | def _process_response(self, response: any) -> str:
class ScriptProcessor (line 433) | class ScriptProcessor:
method __init__ (line 434) | def __init__(self, model_name: str, api_key: str = None, base_url: str...
method _get_default_prompt (line 454) | def _get_default_prompt(self) -> str:
method calculate_duration_and_word_count (line 499) | def calculate_duration_and_word_count(self, time_range: str) -> int:
method process_frames (line 560) | def process_frames(self, frame_content_list: List[Dict]) -> List[Dict]:
method _save_results (line 572) | def _save_results(self, frame_content_list: List[Dict]):
FILE: app/utils/utils.py
function get_response (line 22) | def get_response(status: int, data: Any = None, message: str = ""):
function to_json (line 33) | def to_json(obj):
function get_uuid (line 65) | def get_uuid(remove_hyphen: bool = False):
function root_dir (line 72) | def root_dir():
function storage_dir (line 76) | def storage_dir(sub_dir: str = "", create: bool = False):
function resource_dir (line 86) | def resource_dir(sub_dir: str = ""):
function task_dir (line 93) | def task_dir(sub_dir: str = ""):
function font_dir (line 102) | def font_dir(sub_dir: str = ""):
function song_dir (line 111) | def song_dir(sub_dir: str = ""):
function get_bgm_file (line 120) | def get_bgm_file(bgm_type: str = "random", bgm_file: str = ""):
function public_dir (line 161) | def public_dir(sub_dir: str = ""):
function srt_dir (line 170) | def srt_dir(sub_dir: str = ""):
function run_in_background (line 179) | def run_in_background(func, *args, **kwargs):
function time_convert_seconds_to_hmsm (line 191) | def time_convert_seconds_to_hmsm(seconds) -> str:
function format_time (line 200) | def format_time(seconds: float) -> str:
function text_to_srt (line 222) | def text_to_srt(idx: int, msg: str, start_time: float, end_time: float) ...
function str_contains_punctuation (line 237) | def str_contains_punctuation(word):
function split_string_by_punctuations (line 244) | def split_string_by_punctuations(s):
function md5 (line 278) | def md5(text):
function get_system_locale (line 284) | def get_system_locale():
function load_locales (line 295) | def load_locales(i18n_dir):
function parse_extension (line 306) | def parse_extension(filename):
function script_dir (line 310) | def script_dir(sub_dir: str = ""):
function video_dir (line 319) | def video_dir(sub_dir: str = ""):
function subtitle_dir (line 328) | def subtitle_dir(sub_dir: str = ""):
function split_timestamp (line 337) | def split_timestamp(timestamp):
function reduce_video_time (line 351) | def reduce_video_time(txt: str, duration: float = 0.21531):
function get_current_country (line 361) | def get_current_country():
function time_to_seconds (line 385) | def time_to_seconds(time_str: str) -> float:
function seconds_to_time (line 431) | def seconds_to_time(seconds: float) -> str:
function calculate_total_duration (line 437) | def calculate_total_duration(scenes):
function add_new_timestamps (line 461) | def add_new_timestamps(scenes):
function clean_model_output (line 503) | def clean_model_output(output):
function cut_video (line 511) | def cut_video(params, progress_callback=None):
function temp_dir (line 557) | def temp_dir(sub_dir: str = ""):
function clear_keyframes_cache (line 573) | def clear_keyframes_cache(video_path: str = None):
function init_resources (line 602) | def init_resources():
function download_font (line 640) | def download_font(url: str, font_path: str):
function init_imagemagick (line 658) | def init_imagemagick():
FILE: app/utils/video_processor.py
class VideoProcessor (line 26) | class VideoProcessor:
method __init__ (line 27) | def __init__(self, video_path: str):
method _get_video_info (line 45) | def _get_video_info(self) -> Dict[str, str]:
method extract_frames_by_interval (line 89) | def extract_frames_by_interval(self, output_dir: str, interval_seconds...
method _extract_single_frame_optimized (line 188) | def _extract_single_frame_optimized(self, timestamp: float, output_pat...
method _try_extract_with_software_decode (line 221) | def _try_extract_with_software_decode(self, timestamp: float, output_p...
method _try_extract_with_hwaccel (line 249) | def _try_extract_with_hwaccel(self, timestamp: float, output_path: str...
method _try_extract_with_software (line 282) | def _try_extract_with_software(self, timestamp: float, output_path: st...
method _try_extract_with_ultra_compatibility (line 311) | def _try_extract_with_ultra_compatibility(self, timestamp: float, outp...
method _execute_ffmpeg_command (line 409) | def _execute_ffmpeg_command(self, cmd: List[str], description: str) ->...
method _detect_hw_accelerator (line 452) | def _detect_hw_accelerator(self) -> List[str]:
method process_video_pipeline (line 464) | def process_video_pipeline(self,
method extract_frames_by_interval_ultra_compatible (line 495) | def extract_frames_by_interval_ultra_compatible(self, output_dir: str,...
method _extract_frame_ultra_compatible (line 586) | def _extract_frame_ultra_compatible(self, timestamp: float, output_pat...
FILE: webui.py
function init_log (line 34) | def init_log():
function init_global_state (line 111) | def init_global_state():
function tr (line 122) | def tr(key):
function render_generate_button (line 130) | def render_generate_button():
function main (line 225) | def main():
FILE: webui/components/audio_settings.py
function get_soulvoice_voices (line 11) | def get_soulvoice_voices():
function get_tts_engine_options (line 22) | def get_tts_engine_options():
function get_tts_engine_descriptions (line 33) | def get_tts_engine_descriptions():
function is_valid_azure_voice_name (line 69) | def is_valid_azure_voice_name(voice_name: str) -> bool:
function render_audio_panel (line 83) | def render_audio_panel(tr):
function render_tts_settings (line 95) | def render_tts_settings(tr):
function render_edge_tts_settings (line 155) | def render_edge_tts_settings(tr):
function render_azure_speech_settings (line 257) | def render_azure_speech_settings(tr):
function render_tencent_tts_settings (line 381) | def render_tencent_tts_settings(tr):
function render_qwen3_tts_settings (line 495) | def render_qwen3_tts_settings(tr):
function render_indextts2_tts_settings (line 574) | def render_indextts2_tts_settings(tr):
function render_voice_preview_new (line 706) | def render_voice_preview_new(tr, selected_engine):
function render_azure_v2_settings (line 784) | def render_azure_v2_settings(tr):
function render_voice_parameters (line 803) | def render_voice_parameters(tr, voice_name):
function render_voice_preview (line 853) | def render_voice_preview(tr, voice_name):
function render_bgm_settings (line 894) | def render_bgm_settings(tr):
function get_audio_params (line 932) | def get_audio_params():
FILE: webui/components/basic_settings.py
function build_base_url_help (line 19) | def build_base_url_help(provider: str, model_type: str) -> tuple[str, bo...
function validate_api_key (line 43) | def validate_api_key(api_key: str, provider: str) -> tuple[bool, str]:
function validate_base_url (line 55) | def validate_base_url(base_url: str, provider: str) -> tuple[bool, str]:
function validate_model_name (line 67) | def validate_model_name(model_name: str, provider: str) -> tuple[bool, s...
function validate_litellm_model_name (line 75) | def validate_litellm_model_name(model_name: str, model_type: str) -> tup...
function show_config_validation_errors (line 118) | def show_config_validation_errors(errors: list):
function render_basic_settings (line 125) | def render_basic_settings(tr):
function render_language_settings (line 144) | def render_language_settings(tr):
function render_proxy_settings (line 171) | def render_proxy_settings(tr):
function test_vision_model_connection (line 203) | def test_vision_model_connection(api_key, base_url, model_name, provider...
function test_litellm_vision_model (line 315) | def test_litellm_vision_model(api_key: str, base_url: str, model_name: s...
function test_litellm_text_model (line 437) | def test_litellm_text_model(api_key: str, base_url: str, model_name: str...
function render_vision_llm_settings (line 541) | def render_vision_llm_settings(tr):
function test_text_model_connection (line 710) | def test_text_model_connection(api_key, base_url, model_name, provider, ...
function render_text_llm_settings (line 813) | def render_text_llm_settings(tr):
FILE: webui/components/ffmpeg_diagnostics.py
function show_ffmpeg_diagnostics (line 20) | def show_ffmpeg_diagnostics():
function show_ffmpeg_settings (line 110) | def show_ffmpeg_settings():
function show_troubleshooting_guide (line 201) | def show_troubleshooting_guide():
function render_ffmpeg_diagnostics_page (line 262) | def render_ffmpeg_diagnostics_page():
FILE: webui/components/script_settings.py
function render_script_panel (line 18) | def render_script_panel(tr):
function render_script_file (line 51) | def render_script_file(tr, params):
function render_video_file (line 225) | def render_video_file(tr, params):
function render_short_generate_options (line 273) | def render_short_generate_options(tr):
function render_video_details (line 291) | def render_video_details(tr):
function short_drama_summary (line 325) | def short_drama_summary(tr):
function render_script_buttons (line 404) | def render_script_buttons(tr, params):
function load_script (line 450) | def load_script(tr, script_path):
function save_script_with_validation (line 464) | def save_script_with_validation(tr, video_clip_json_details):
function get_script_params (line 543) | def get_script_params():
FILE: webui/components/subtitle_settings.py
function render_subtitle_panel (line 9) | def render_subtitle_panel(tr):
function render_font_settings (line 47) | def render_font_settings(tr):
function is_disabled_subtitle_settings (line 91) | def is_disabled_subtitle_settings(tts_engine:str)->bool:
function render_position_settings (line 95) | def render_position_settings(tr):
function render_style_settings (line 130) | def render_style_settings(tr):
function get_subtitle_params (line 152) | def get_subtitle_params():
FILE: webui/components/system_settings.py
function clear_directory (line 9) | def clear_directory(dir_path, tr):
function render_system_panel (line 30) | def render_system_panel(tr):
FILE: webui/components/video_settings.py
function render_video_panel (line 5) | def render_video_panel(tr):
function render_video_config (line 13) | def render_video_config(tr, params):
function get_video_params (line 56) | def get_video_params():
FILE: webui/config/settings.py
function get_version_from_file (line 7) | def get_version_from_file():
class WebUIConfig (line 23) | class WebUIConfig:
method __post_init__ (line 44) | def __post_init__(self):
function load_config (line 52) | def load_config(config_path: Optional[str] = None) -> WebUIConfig:
function save_config (line 98) | def save_config(config: WebUIConfig, config_path: Optional[str] = None) ...
function get_config (line 137) | def get_config() -> WebUIConfig:
function update_config (line 146) | def update_config(config_dict: Dict[str, Any]) -> bool:
FILE: webui/tools/base.py
function create_vision_analyzer (line 16) | def create_vision_analyzer(provider, api_key, model, base_url):
function get_batch_timestamps (line 50) | def get_batch_timestamps(batch_files, prev_batch_files=None):
function get_batch_files (line 140) | def get_batch_files(keyframe_files, result, batch_size=5):
FILE: webui/tools/generate_script_docu.py
function generate_script_docu (line 16) | def generate_script_docu(params):
FILE: webui/tools/generate_script_short.py
function generate_script_short (line 13) | def generate_script_short(tr, params, custom_clips=5):
FILE: webui/tools/generate_short_summary.py
function parse_and_fix_json (line 26) | def parse_and_fix_json(json_string):
function generate_script_short_sunmmary (line 138) | def generate_script_short_sunmmary(params, subtitle_path, video_theme, t...
FILE: webui/utils/cache.py
function get_fonts_cache (line 6) | def get_fonts_cache(font_dir):
function get_video_files_cache (line 18) | def get_video_files_cache():
function get_songs_cache (line 26) | def get_songs_cache(song_dir):
FILE: webui/utils/file_utils.py
function open_task_folder (line 10) | def open_task_folder(root_dir, task_id):
function cleanup_temp_files (line 29) | def cleanup_temp_files(temp_dir, max_age=3600):
function get_file_list (line 48) | def get_file_list(directory, file_types=None, sort_by='ctime', reverse=T...
function save_uploaded_file (line 89) | def save_uploaded_file(uploaded_file, save_dir, allowed_types=None):
function create_temp_file (line 127) | def create_temp_file(prefix='tmp', suffix='', directory=None):
function get_file_size (line 150) | def get_file_size(file_path, format='MB'):
function ensure_directory (line 176) | def ensure_directory(directory):
function create_zip (line 191) | def create_zip(files: list, zip_path: str, base_dir: str = None, folder_...
FILE: webui/utils/vision_analyzer.py
class VisionAnalyzer (line 7) | class VisionAnalyzer:
method __init__ (line 8) | def __init__(self):
method initialize_gemini (line 15) | def initialize_gemini(self, api_key: str, model: str, base_url: str) -...
method initialize_qwenvl (line 33) | def initialize_qwenvl(self, api_key: str, model: str, base_url: str) -...
method analyze_images (line 51) | async def analyze_images(self, images: List[str], prompt: str, batch_s...
function create_vision_analyzer (line 72) | def create_vision_analyzer(provider: str, **kwargs) -> VisionAnalyzer:
Condensed preview: 118 files, each showing path, character count, and a content snippet (full structured content: 918K chars).
[
{
"path": ".dockerignore",
"chars": 802,
"preview": "# Git related\n.git/\n.gitignore\n.gitattributes\n.svn/\n\n# Python related\n__pycache__/\n*.py[cod]\n*$py.class\n*.so\n.Python\nbuild/\ndevelo"
},
{
"path": ".github/pull_request_template.md",
"chars": 600,
"preview": "## PR 类型\n请选择一个适当的标签(必选其一):\n\n- [ ] 破坏性变更 (breaking)\n- [ ] 安全修复 (security)\n- [ ] 新功能 (feature)\n- [ ] Bug修复 (bug)\n- [ ] 代码重"
},
{
"path": ".github/release-drafter.yml",
"chars": 697,
"preview": "name-template: 'v$RESOLVED_VERSION'\ntag-template: 'v$RESOLVED_VERSION'\ncategories:\n - title: '🚀 新功能'\n labels:\n "
},
{
"path": ".github/workflows/auto-release-generator.yml",
"chars": 5617,
"preview": "name: Auto Release Generator\n\non:\n push:\n branches:\n - main\n paths:\n - 'project_version' # 确保路径准确,不使用通"
},
{
"path": ".github/workflows/codeReview.yml",
"chars": 535,
"preview": "name: Code Review\n\npermissions:\n contents: read\n pull-requests: write\n\non:\n # Triggered when a pull request is opened\n pull_request:\n types: "
},
{
"path": ".github/workflows/discord-release-notification.yml",
"chars": 8086,
"preview": "name: Discord Release Notification\n\non:\n release:\n types: [published]\n\njobs:\n notify-discord:\n runs-on: ubuntu-l"
},
{
"path": ".gitignore",
"chars": 776,
"preview": ".DS_Store\n/config.toml\n/storage/\n/.idea/\n/app/services/__pycache__\n/app/__pycache__/\n/app/config/__pycache__/\n/app/model"
},
{
"path": "Dockerfile",
"chars": 2280,
"preview": "# Multi-stage build - build stage\nFROM python:3.12-slim-bookworm AS builder\n\n# Set build arguments\nARG DEBIAN_FRONTEND=noninteractive\n\n# Set working directory\nWORKDIR "
},
{
"path": "LICENSE",
"chars": 1655,
"preview": "Modified MIT License - Non-Commercial Use Only\n\nCopyright (c) 2024 linyq\n\nPermission is hereby granted, free of charge, "
},
{
"path": "Makefile",
"chars": 1288,
"preview": "# NarratoAI Docker Makefile\n\n.PHONY: help build up down restart logs shell clean deploy\n\n# Default target\n.DEFAULT_GOAL := help\n\n#"
},
{
"path": "README-en.md",
"chars": 6216,
"preview": "<div align=\"center\">\n<h1 align=\"center\" style=\"font-size: 2cm;\"> NarratoAI 😎📽️ </h1>\n<h3 align=\"center\">An all-in-one AI"
},
{
"path": "README.md",
"chars": 6145,
"preview": "\n<div align=\"center\">\n<h1 align=\"center\" style=\"font-size: 2cm;\"> NarratoAI 😎📽️ </h1>\n<h3 align=\"center\">一站式 AI 影视解说+自动化"
},
{
"path": "app/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "app/config/__init__.py",
"chars": 1813,
"preview": "import os\nimport sys\n\nfrom loguru import logger\n\nfrom app.config import config\nfrom app.utils import utils\n\n\ndef __init_"
},
{
"path": "app/config/audio_config.py",
"chars": 5952,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : audio_config\n@Author : Viccy同学\n@Date "
},
{
"path": "app/config/config.py",
"chars": 2970,
"preview": "import os\nimport socket\nimport toml\nimport shutil\nfrom loguru import logger\n\nroot_dir = os.path.dirname(os.path.dirname("
},
{
"path": "app/config/ffmpeg_config.py",
"chars": 8626,
"preview": "\"\"\"\nFFmpeg configuration management module\nDedicated to managing FFmpeg compatibility settings and optimization parameters\n\"\"\"\n\nimport os\nimport platform\nfrom typing import Dict, List, Optional\nfrom d"
},
{
"path": "app/models/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "app/models/const.py",
"chars": 338,
"preview": "PUNCTUATIONS = [\n \"?\",\n \",\",\n \".\",\n \"、\",\n \";\",\n \":\",\n \"!\",\n \"…\",\n \"?\",\n \",\",\n \"。\",\n "
},
{
"path": "app/models/exception.py",
"chars": 750,
"preview": "import traceback\nfrom typing import Any\n\nfrom loguru import logger\n\n\nclass HttpException(Exception):\n def __init__(\n "
},
{
"path": "app/models/schema.py",
"chars": 6319,
"preview": "import warnings\nfrom enum import Enum\nfrom typing import Any, List, Optional, Union\n\nimport pydantic\nfrom pydantic impor"
},
{
"path": "app/services/SDE/short_drama_explanation.py",
"chars": 26316,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : 短剧解说\n@Author : 小林同学\n@Date : 2025/5/9 "
},
{
"path": "app/services/SDP/generate_script_short.py",
"chars": 3586,
"preview": "\"\"\"\nVideo script generation pipeline, chaining the processing steps together\n\"\"\"\nfrom typing import Any, Dict, Optional\nfrom loguru import logger\n\nfrom .utils.step1_subt"
},
{
"path": "app/services/SDP/utils/short_schema.py",
"chars": 893,
"preview": "\"\"\"\n定义项目中使用的数据类型\n\"\"\"\nfrom typing import List, Dict, Optional\nfrom dataclasses import dataclass\n\n\n@dataclass\nclass PlotPo"
},
{
"path": "app/services/SDP/utils/step1_subtitle_analyzer_openai.py",
"chars": 5758,
"preview": "\"\"\"\n使用统一LLM服务,分析字幕文件,返回剧情梗概和爆点\n\"\"\"\nimport traceback\nimport json\nfrom loguru import logger\n\nfrom app.services.subtitle_te"
},
{
"path": "app/services/SDP/utils/step5_merge_script.py",
"chars": 1118,
"preview": "\"\"\"\n合并生成最终脚本\n\"\"\"\nimport os\nimport json\nfrom typing import Dict, List\n\n\ndef merge_script(\n plot_points: List[Dict]"
},
{
"path": "app/services/SDP/utils/utils.py",
"chars": 3041,
"preview": "# 公共方法\nimport json\nimport requests # 新增\nimport pysrt\nfrom loguru import logger\nfrom typing import List, Dict\n\n\ndef load"
},
{
"path": "app/services/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "app/services/audio_merger.py",
"chars": 5776,
"preview": "import os\nimport json\nimport subprocess\nimport edge_tts\nfrom edge_tts import submaker\nfrom pydub import AudioSegment\nfro"
},
{
"path": "app/services/audio_normalizer.py",
"chars": 9680,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : audio_normalizer\n@Author : Viccy同学\n@Dat"
},
{
"path": "app/services/clip_video.py",
"chars": 34316,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : clip_video\n@Author : Viccy同学\n@Date : "
},
{
"path": "app/services/generate_narration_script.py",
"chars": 6878,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : 生成介绍文案\n@Author : Viccy同学\n@Date : 2025"
},
{
"path": "app/services/generate_video.py",
"chars": 16982,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : generate_video\n@Author : Viccy同学\n@Date "
},
{
"path": "app/services/llm/__init__.py",
"chars": 927,
"preview": "\"\"\"\nNarratoAI 大模型服务模块\n\n统一的大模型服务抽象层,支持多供应商切换和严格的输出格式验证\n包含视觉模型和文本生成模型的统一接口\n\n主要组件:\n- BaseLLMProvider: 大模型服务提供商基类\n- VisionMo"
},
{
"path": "app/services/llm/base.py",
"chars": 5460,
"preview": "\"\"\"\n大模型服务提供商基类定义\n\n定义了统一的大模型服务接口,包括视觉模型和文本生成模型的抽象基类\n\"\"\"\n\nfrom abc import ABC, abstractmethod\nfrom typing import List, Dic"
},
{
"path": "app/services/llm/config_validator.py",
"chars": 10286,
"preview": "\"\"\"\nLLM服务配置验证器\n\n验证大模型服务的配置是否正确,并提供配置建议\n\"\"\"\n\nfrom typing import Dict, List, Any, Optional\nfrom loguru import logger\n\nfrom"
},
{
"path": "app/services/llm/exceptions.py",
"chars": 3244,
"preview": "\"\"\"\n大模型服务异常类定义\n\n定义了大模型服务中可能出现的各种异常类型,\n提供统一的错误处理机制\n\"\"\"\n\nfrom typing import Optional, Dict, Any\n\n\nclass LLMServiceError(Ex"
},
{
"path": "app/services/llm/litellm_provider.py",
"chars": 16605,
"preview": "\"\"\"\nLiteLLM 统一提供商实现\n\n使用 LiteLLM 库提供统一的 LLM 接口,支持 100+ providers\n包括 OpenAI, Anthropic, Gemini, Qwen, DeepSeek, SiliconFlo"
},
{
"path": "app/services/llm/manager.py",
"chars": 7734,
"preview": "\"\"\"\n大模型服务管理器\n\n统一管理所有大模型服务提供商,提供简单的工厂方法来创建和获取服务实例\n\"\"\"\n\nfrom typing import Dict, Type, Optional\nfrom loguru import logger\n"
},
{
"path": "app/services/llm/migration_adapter.py",
"chars": 10186,
"preview": "\"\"\"\n迁移适配器\n\n为现有代码提供向后兼容的接口,方便逐步迁移到新的LLM服务架构\n\"\"\"\n\nimport asyncio\nimport json\nfrom typing import List, Dict, Any, Optional,"
},
{
"path": "app/services/llm/providers/__init__.py",
"chars": 1057,
"preview": "\"\"\"\n大模型服务提供商实现\n\n包含各种大模型服务提供商的具体实现\n推荐使用 LiteLLM 统一接口(支持 100+ providers)\n\"\"\"\n\n# 不在模块顶部导入 provider 类,避免循环依赖\n# 所有导入都在 regist"
},
{
"path": "app/services/llm/test_litellm_integration.py",
"chars": 5985,
"preview": "\"\"\"\nLiteLLM 集成测试脚本\n\n测试 LiteLLM provider 是否正确集成到系统中\n\"\"\"\n\nimport asyncio\nimport sys\nfrom pathlib import Path\n\n# 添加项目根目录到 P"
},
{
"path": "app/services/llm/test_llm_service.py",
"chars": 6229,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\nLLM服务测试脚本\n\n测试新的LLM服务架构是否正常工作\n\"\"\"\n\nimport asyncio\nimport sys\nimport os"
},
{
"path": "app/services/llm/unified_service.py",
"chars": 7950,
"preview": "\"\"\"\n统一的大模型服务接口\n\n提供简化的API接口,方便现有代码迁移到新的架构\n\"\"\"\n\nfrom typing import List, Dict, Any, Optional, Union\nfrom pathlib import Pa"
},
{
"path": "app/services/llm/validators.py",
"chars": 6680,
"preview": "\"\"\"\n输出格式验证器\n\n提供严格的输出格式验证机制,确保大模型输出符合预期格式\n\"\"\"\n\nimport json\nimport re\nfrom typing import Any, Dict, List, Optional, Union\n"
},
{
"path": "app/services/llm.py",
"chars": 27914,
"preview": "import os\nimport re\nimport json\nimport traceback\nimport streamlit as st\nfrom typing import List\nfrom loguru import logge"
},
{
"path": "app/services/material.py",
"chars": 18840,
"preview": "import os\nimport subprocess\nimport random\nimport traceback\nfrom urllib.parse import urlencode\nfrom datetime import datet"
},
{
"path": "app/services/merger_video.py",
"chars": 23599,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : merger_video\n@Author : Viccy同学\n@Date "
},
{
"path": "app/services/prompts/__init__.py",
"chars": 1328,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : __init__.py\n@Author : viccy同学\n@Date :"
},
{
"path": "app/services/prompts/base.py",
"chars": 5641,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : base.py\n@Author : viccy同学\n@Date : 202"
},
{
"path": "app/services/prompts/documentary/__init__.py",
"chars": 742,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : __init__.py\n@Author : viccy同学\n@Date :"
},
{
"path": "app/services/prompts/documentary/frame_analysis.py",
"chars": 1493,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: 逐帧解说-画面分析\n@File : frame_analysis.py\n@Author : viccy同学\n@Da"
},
{
"path": "app/services/prompts/documentary/narration_generation.py",
"chars": 2554,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: 逐帧解说-文案生成\n@File : narration_generation.py\n@Author : viccy"
},
{
"path": "app/services/prompts/exceptions.py",
"chars": 2078,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : exceptions.py\n@Author : viccy同学\n@Date "
},
{
"path": "app/services/prompts/manager.py",
"chars": 9128,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : manager.py\n@Author : viccy同学\n@Date : "
},
{
"path": "app/services/prompts/registry.py",
"chars": 7594,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : registry.py\n@Author : viccy同学\n@Date :"
},
{
"path": "app/services/prompts/short_drama_editing/__init__.py",
"chars": 748,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : __init__.py\n@Author : viccy同学\n@Date :"
},
{
"path": "app/services/prompts/short_drama_editing/plot_extraction.py",
"chars": 2727,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: 短剧混剪-画面匹配\n@File : plot_extraction.py\n@Author : viccy同学\n@D"
},
{
"path": "app/services/prompts/short_drama_editing/subtitle_analysis.py",
"chars": 2445,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: 短剧混剪-剧情分析\n@File : subtitle_analysis.py\n@Author : viccy同学\n"
},
{
"path": "app/services/prompts/short_drama_narration/__init__.py",
"chars": 737,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : __init__.py\n@Author : viccy同学\n@Date :"
},
{
"path": "app/services/prompts/short_drama_narration/plot_analysis.py",
"chars": 1923,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: 短剧解说-剧情分析\n@File : plot_analysis.py\n@Author : viccy同学\n@Dat"
},
{
"path": "app/services/prompts/short_drama_narration/script_generation.py",
"chars": 6658,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: 短剧解说-文案画面匹配\n@File : script_generation.py\n@Author : viccy同"
},
{
"path": "app/services/prompts/template.py",
"chars": 5215,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : template.py\n@Author : viccy同学\n@Date :"
},
{
"path": "app/services/prompts/validators.py",
"chars": 8240,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : validators.py\n@Author : viccy同学\n@Date "
},
{
"path": "app/services/script_service.py",
"chars": 11206,
"preview": "import os\nimport json\nimport time\nimport asyncio\nimport requests\nfrom app.utils import video_processor\nfrom loguru impor"
},
{
"path": "app/services/state.py",
"chars": 3138,
"preview": "import ast\nfrom abc import ABC, abstractmethod\nfrom app.config import config\nfrom app.models import const\n\n\n# Base class"
},
{
"path": "app/services/subtitle.py",
"chars": 14469,
"preview": "import json\nimport os.path\nimport re\nimport traceback\nfrom typing import Optional\n\n# from faster_whisper import WhisperM"
},
{
"path": "app/services/subtitle_merger.py",
"chars": 8509,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : subtitle_merger\n@Author : viccy\n@Date "
},
{
"path": "app/services/subtitle_text.py",
"chars": 3531,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\nSubtitle text utilities.\n\nThis module provides a shared, cross-platfo"
},
{
"path": "app/services/task.py",
"chars": 19027,
"preview": "import math\nimport json\nimport os.path\nimport re\nimport traceback\nfrom os import path\nfrom loguru import logger\n\nfrom ap"
},
{
"path": "app/services/update_script.py",
"chars": 10544,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : update_script\n@Author : Viccy同学\n@Date "
},
{
"path": "app/services/upload_validation.py",
"chars": 2695,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n\"\"\"\n@Project: NarratoAI\n@File : upload_validation.py\n@Author : AI Assis"
},
{
"path": "app/services/video.py",
"chars": 12350,
"preview": "import traceback\n\n# import pysrt\nfrom typing import Optional\nfrom typing import List\nfrom loguru import logger\nfrom movi"
},
{
"path": "app/services/video_service.py",
"chars": 1595,
"preview": "import os\nfrom uuid import uuid4\nfrom loguru import logger\nfrom typing import Dict, List, Optional, Tuple\n\nfrom app.serv"
},
{
"path": "app/services/voice.py",
"chars": 51295,
"preview": "import os\nimport re\nimport json\nimport traceback\nimport edge_tts\nimport asyncio\nimport requests\nimport uuid\nfrom loguru "
},
{
"path": "app/services/youtube_service.py",
"chars": 5067,
"preview": "import yt_dlp\nimport os\nfrom typing import List, Dict, Optional, Tuple\nfrom loguru import logger\nfrom uuid import uuid4\n"
},
{
"path": "app/utils/check_script.py",
"chars": 3756,
"preview": "import json\nimport re\nfrom typing import Dict, Any\n\ndef check_format(script_content: str) -> Dict[str, Any]:\n \"\"\"检查脚本"
},
{
"path": "app/utils/ffmpeg_utils.py",
"chars": 36464,
"preview": "\"\"\"\nFFmpeg 工具模块 - 提供 FFmpeg 相关的工具函数,特别是硬件加速检测\n优化多平台兼容性,支持渐进式降级和智能错误处理\n\"\"\"\nimport os\nimport platform\nimport subprocess\nim"
},
{
"path": "app/utils/gemini_analyzer.py",
"chars": 11395,
"preview": "import json\nfrom typing import List, Union, Dict\nimport os\nfrom pathlib import Path\nfrom loguru import logger\nfrom tqdm "
},
{
"path": "app/utils/gemini_openai_analyzer.py",
"chars": 5603,
"preview": "\"\"\"\nOpenAI兼容的Gemini视觉分析器\n使用标准OpenAI格式调用Gemini代理服务\n\"\"\"\n\nimport json\nfrom typing import List, Union, Dict\nimport os\nfrom p"
},
{
"path": "app/utils/qwenvl_analyzer.py",
"chars": 9008,
"preview": "import json\nfrom typing import List, Union, Dict\nimport os\nfrom pathlib import Path\nfrom loguru import logger\nfrom tqdm "
},
{
"path": "app/utils/script_generator.py",
"chars": 22602,
"preview": "import os\nimport json\nimport traceback\nfrom loguru import logger\n# import tiktoken\nfrom typing import List, Dict\nfrom da"
},
{
"path": "app/utils/utils.py",
"chars": 17342,
"preview": "import locale\nimport os\nimport traceback\n\nimport requests\nimport threading\nfrom typing import Any\nfrom loguru import log"
},
{
"path": "app/utils/video_processor.py",
"chars": 20955,
"preview": "\"\"\"\n视频帧提取工具\n\n这个模块提供了简单高效的视频帧提取功能。主要特点:\n1. 使用ffmpeg进行视频处理,支持硬件加速\n2. 按指定时间间隔提取视频关键帧\n3. 支持多种视频格式\n4. 支持高清视频帧输出\n5. 直接从原视频提取高质"
},
{
"path": "config.example.toml",
"chars": 5285,
"preview": "[app]\n project_version=\"0.7.6\"\n\n # LLM API 超时配置(秒)\n llm_vision_timeout = 120 # 视觉模型基础超时时间\n llm_text_timeout"
},
{
"path": "docker-compose.yml",
"chars": 602,
"preview": "services:\n narratoai-webui:\n build:\n context: .\n dockerfile: Dockerfile\n image: narratoai:latest\n co"
},
{
"path": "docker-deploy.sh",
"chars": 3320,
"preview": "#!/bin/bash\n\n# NarratoAI Docker 一键部署脚本\n\nset -e\n\n# 颜色定义\nGREEN='\\033[0;32m'\nYELLOW='\\033[1;33m'\nRED='\\033[0;31m'\nNC='\\033["
},
{
"path": "docker-entrypoint.sh",
"chars": 3654,
"preview": "#!/bin/bash\nset -e\n\n# 函数:打印日志\nlog() {\n echo \"[$(date +'%Y-%m-%d %H:%M:%S')] $1\"\n}\n\n# 函数:安装运行时依赖\ninstall_runtime_depen"
},
{
"path": "docs/voice-list.txt",
"chars": 12539,
"preview": "Name: af-ZA-AdriNeural\nGender: Female\n\nName: af-ZA-WillemNeural\nGender: Male\n\nName: am-ET-AmehaNeural\nGender: Male\n\nName"
},
{
"path": "project_version",
"chars": 5,
"preview": "0.7.6"
},
{
"path": "requirements.txt",
"chars": 642,
"preview": "# 核心依赖\nrequests>=2.32.0\nmoviepy==2.1.1\nedge-tts==7.2.7\nstreamlit>=1.45.0\nwatchdog==6.0.0\nloguru>=0.7.3\ntomli>=2.2.1\ntoml"
},
{
"path": "resource/fonts/fonts_in_here.txt",
"chars": 7,
"preview": "此处放字体文件"
},
{
"path": "resource/public/index.html",
"chars": 732,
"preview": "<!DOCTYPE html>\n<html lang=\"en\">\n<head>\n <meta charset=\"UTF-8\">\n <title>NarratoAI</title>\n</head>\n<body>\n<h1>Narra"
},
{
"path": "resource/scripts/script_in_here.txt",
"chars": 0,
"preview": ""
},
{
"path": "resource/songs/song_in_here.txt",
"chars": 0,
"preview": ""
},
{
"path": "resource/srt/srt_in_here.txt",
"chars": 0,
"preview": ""
},
{
"path": "resource/videos/video_in_here.txt",
"chars": 0,
"preview": ""
},
{
"path": "webui/__init__.py",
"chars": 380,
"preview": "\"\"\"\nNarratoAI WebUI Package\n\"\"\"\nfrom webui.config.settings import config\nfrom webui.components import (\n basic_settin"
},
{
"path": "webui/components/__init__.py",
"chars": 396,
"preview": "from .basic_settings import render_basic_settings\nfrom .script_settings import render_script_panel\nfrom .video_settings "
},
{
"path": "webui/components/audio_settings.py",
"chars": 29810,
"preview": "import streamlit as st\nimport os\nfrom uuid import uuid4\nfrom app.config import config\nfrom app.services import voice\nfro"
},
{
"path": "webui/components/basic_settings.py",
"chars": 33580,
"preview": "import traceback\n\nimport streamlit as st\nimport os\nfrom app.config import config\nfrom app.utils import utils\nfrom loguru"
},
{
"path": "webui/components/ffmpeg_diagnostics.py",
"chars": 8148,
"preview": "\"\"\"\nFFmpeg 诊断和配置组件\n为用户提供 FFmpeg 兼容性诊断和配置选项\n\"\"\"\n\nimport streamlit as st\nimport platform\nfrom typing import Dict, Any\nfrom"
},
{
"path": "webui/components/script_settings.py",
"chars": 19343,
"preview": "import os\nimport glob\nimport json\nimport time\nimport traceback\nimport streamlit as st\nfrom loguru import logger\n\nfrom ap"
},
{
"path": "webui/components/subtitle_settings.py",
"chars": 5231,
"preview": "\nfrom loguru import logger\nimport streamlit as st\nfrom app.config import config\nfrom webui.utils.cache import get_fonts_"
},
{
"path": "webui/components/system_settings.py",
"chars": 1701,
"preview": "import streamlit as st\nimport os\nimport shutil\nfrom loguru import logger\n\nfrom app.utils.utils import storage_dir\n\n\ndef "
},
{
"path": "webui/components/video_settings.py",
"chars": 2032,
"preview": "import streamlit as st\nfrom app.models.schema import VideoClipParams, VideoAspect, AudioVolumeDefaults\n\n\ndef render_vide"
},
{
"path": "webui/config/settings.py",
"chars": 4797,
"preview": "import os\nimport tomli\nfrom loguru import logger\nfrom typing import Dict, Any, Optional\nfrom dataclasses import dataclas"
},
{
"path": "webui/i18n/__init__.py",
"chars": 12,
"preview": "# 空文件,用于标记包 "
},
{
"path": "webui/i18n/en.json",
"chars": 6314,
"preview": "{\n \"Language\": \"English\",\n \"Translation\": {\n \"Video Script Configuration\": \"**Video Script Configuration**\",\n \"V"
},
{
"path": "webui/i18n/zh.json",
"chars": 7265,
"preview": "{\n \"Language\": \"简体中文\",\n \"Translation\": {\n \"Video Script Configuration\": \"**视频脚本配置**\",\n \"Generate Video Script\": "
},
{
"path": "webui/tools/base.py",
"chars": 4725,
"preview": "import os\nimport requests\nimport streamlit as st\nfrom loguru import logger\nfrom requests.adapters import HTTPAdapter\nfro"
},
{
"path": "webui/tools/generate_script_docu.py",
"chars": 21473,
"preview": "# 纪录片脚本生成\nimport os\nimport json\nimport time\nimport asyncio\nimport traceback\nimport streamlit as st\nfrom loguru import lo"
},
{
"path": "webui/tools/generate_script_short.py",
"chars": 4746,
"preview": "import os\nimport json\nimport time\nimport traceback\nimport streamlit as st\nfrom loguru import logger\n\nfrom app.config imp"
},
{
"path": "webui/tools/generate_short_summary.py",
"chars": 9295,
"preview": "#!/usr/bin/env python\n# -*- coding: UTF-8 -*-\n\n'''\n@Project: NarratoAI\n@File : 短剧解说脚本生成\n@Author : 小林同学\n@Date : 2025/"
},
{
"path": "webui/utils/cache.py",
"chars": 1229,
"preview": "import streamlit as st\nimport os\nimport glob\nfrom app.utils import utils\n\ndef get_fonts_cache(font_dir):\n if 'fonts_c"
},
{
"path": "webui/utils/file_utils.py",
"chars": 6822,
"preview": "import os\nimport glob\nimport time\nimport platform\nimport shutil\nfrom uuid import uuid4\nfrom loguru import logger\nfrom ap"
},
{
"path": "webui/utils/vision_analyzer.py",
"chars": 2678,
"preview": "import logging\nfrom typing import List, Dict, Any, Optional\nfrom app.utils import gemini_analyzer, qwenvl_analyzer\n\nlogg"
},
{
"path": "webui.py",
"chars": 9333,
"preview": "import streamlit as st\nimport os\nimport sys\nfrom loguru import logger\nfrom app.config import config\nfrom webui.component"
}
]
About this extraction
This page contains the full source code of the linyqh/NarratoAI GitHub repository, extracted and formatted as plain text. The extraction includes 118 files (844.1 KB), approximately 242.4k tokens, and a symbol index with 683 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract. Built by Nikandr Surkov.