[
  {
    "path": ".dockerignore",
    "content": "# Git and local metadata\n.git\n.github\n.gitignore\n\n# Python caches and virtual environments\n__pycache__/\n*.py[cod]\n*.so\nvenv/\n.venv/\n\n# Runtime/generated artifacts\noutputs/\n.cache/\n.gradio/\n*.log\n\n# Local model artifacts (download at runtime in container)\nvoices/\n*.pth\n\n# Editor/OS files\n.vscode/\n.idea/\n.DS_Store\n"
  },
  {
    "path": ".github/workflows/claude.yml",
    "content": "name: Claude PR Assistant\n\non:\n  issue_comment:\n    types: [created]\n  pull_request_review_comment:\n    types: [created]\n  issues:\n    types: [opened, assigned]\n  pull_request_review:\n    types: [submitted]\n\njobs:\n  claude-code-action:\n    if: |\n      (github.event_name == 'issue_comment' && contains(github.event.comment.body, '@claude')) ||\n      (github.event_name == 'pull_request_review_comment' && contains(github.event.comment.body, '@claude')) ||\n      (github.event_name == 'pull_request_review' && contains(github.event.review.body, '@claude')) ||\n      (github.event_name == 'issues' && contains(github.event.issue.body, '@claude'))\n    runs-on: ubuntu-latest\n    permissions:\n      contents: read\n      pull-requests: read\n      issues: read\n      id-token: write\n    steps:\n      - name: Checkout repository\n        uses: actions/checkout@v4\n        with:\n          fetch-depth: 1\n\n      - name: Run Claude PR Action\n        uses: anthropics/claude-code-action@beta\n        with:\n          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}\n          # Or use OAuth token instead:\n          # claude_code_oauth_token: ${{ secrets.CLAUDE_CODE_OAUTH_TOKEN }}\n          timeout_minutes: \"60\"\n          # Optional: Restrict network access to specific domains only\n          # experimental_allowed_domains: |\n          #   .anthropic.com\n          #   .github.com\n          #   api.github.com\n          #   .githubusercontent.com\n          #   bun.sh\n          #   registry.npmjs.org\n          #   .blob.core.windows.net"
  },
  {
    "path": ".gitignore",
    "content": "# Python\n__pycache__/\n*.py[cod]\n*$py.class\n*.so\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\n*.egg-info/\n.installed.cfg\n*.egg\n\n# Virtual Environment\nvenv/\nENV/\n\n# IDE\n.idea/\n.vscode/\n*.swp\n*.swo\n\n# Project specific\noutput*.wav\n*.pth\n*.onnx\nvoices/\nvoices/*.pt\nvoices/**/*.pt\nconfig.json\n"
  },
  {
    "path": ".gradio/certificate.pem",
    "content": "-----BEGIN CERTIFICATE-----\nMIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw\nTzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh\ncmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4\nWhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu\nZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY\nMTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc\nh77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+\n0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U\nA5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW\nT8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH\nB5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC\nB5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv\nKBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn\nOlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn\njh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw\nqHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI\nrU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV\nHRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq\nhkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL\nubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ\n3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK\nNFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5\nORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur\nTkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC\njNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc\noyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq\n4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA\nmRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d\nemyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=\n-----END CERTIFICATE-----\n"
  },
  {
    "path": "CHINESE_TTS_GUIDE.md",
    "content": "# Kokoro Chinese TTS Guide\n## 科克罗中文文本转语音指南\n\nComplete guide for setting up and using the Kokoro-82M-v1.1-zh Chinese TTS model locally.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Installation](#installation)\n3. [Quick Start](#quick-start)\n4. [Available Chinese Voices](#available-chinese-voices)\n5. [Troubleshooting](#troubleshooting)\n6. [Advanced Usage](#advanced-usage)\n7. [FAQ](#faq)\n8. [Additional Resources](#additional-resources)\n\n---\n\n## Overview\n\n**Kokoro-82M-v1.1-zh** is a fine-tuned Mandarin Chinese TTS model for high-quality speech synthesis.\n\n### Key Features\n\n- 8 Chinese voices (4 female + 4 male)\n- Natural Mandarin pronunciation\n- Adjustable speech speed (0.5x - 2.0x)\n- Automatic text normalization\n- Offline operation after setup\n- Cross-platform support\n\n---\n\n## Installation\n\n### Prerequisites\n\n- Python 3.8+\n- ~1GB free disk space\n- Internet connection (for initial download)\n\n### Automated Setup (Recommended)\n\n```bash\npython setup_chinese_tts.py\n```\n\nThis script automatically downloads the model and all voice files.\n\n### Manual Setup\n\n1. **Download the model:**\n   ```bash\n   # From Hugging Face\n   git clone https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh\n   # Then copy kokoro-v1_1-zh.pth from the cloned folder into the project root\n   ```\n\n2. **Download the voice files** into the `voices/` directory:\n   - Female: `zf_xiaobei.pt`, `zf_xiaoni.pt`, `zf_xiaoxiao.pt`, `zf_xiaoyi.pt`\n   - Male: `zm_yunjian.pt`, `zm_yunxi.pt`, `zm_yunxia.pt`, `zm_yunyang.pt`\n\n3. **Install dependencies:**\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n---\n\n## Quick Start\n\n### Interactive CLI\n\n```bash\npython chinese_tts_demo.py\n```\n\nThe interactive menu provides:\n1. List available voices\n2. Generate speech from custom text\n3. Generate from sample texts\n4. Help information\n5. 
Exit\n\n### Python API\n\n```python\nfrom chinese_tts_demo import load_chinese_model, generate_chinese_speech, save_audio\nimport torch\n\n# Load model\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\nmodel = load_chinese_model('kokoro-v1_1-zh.pth', device)\n\n# Generate speech\ntext = \"你好，世界！这是一个测试。\"\naudio, _ = generate_chinese_speech(model, text, 'zf_xiaobei', device, speed=1.0)\n\n# Save audio\nif audio is not None:\n    save_audio(audio, 'output.wav')\n```\n\n---\n\n## Available Chinese Voices\n\n### Female Voices (女性声音)\n\n| Voice ID | Name | Description | Quality |\n|----------|------|-------------|---------| \n| `zf_xiaobei` | 晓蓓 | Young, energetic | B |\n| `zf_xiaoni` | 晓妮 | Clear, friendly | B+ |\n| `zf_xiaoxiao` | 晓晓 | Soft, gentle | B |\n| `zf_xiaoyi` | 晓艺 | Professional, articulate | A- |\n\n### Male Voices (男性声音)\n\n| Voice ID | Name | Description | Quality |\n|----------|------|-------------|---------| \n| `zm_yunjian` | 云健 | Strong, confident | B- |\n| `zm_yunxi` | 云析 | Warm, professional | B+ |\n| `zm_yunxia` | 云夏 | Calm, steady | B |\n| `zm_yunyang` | 云阳 | Resonant, deep | B- |\n\n### Recommendations\n\n- **Natural speech**: `zf_xiaoyi` (female) or `zm_yunxi` (male)\n- **Energetic content**: `zf_xiaobei` (female) or `zm_yunjian` (male)\n- **Gentle/soft content**: `zf_xiaoxiao` (female) or `zm_yunxia` (male)\n\n---\n\n## Troubleshooting\n\n### \"WARNING - words count mismatch\"\n\n**Cause**: Wrong phonemizer language configuration.\n\n**Solution**: Use `chinese_tts_demo.py` (not `tts_demo.py`). 
The code automatically initializes the Chinese phonemizer.\n\n### \"Model file not found\"\n\n**Solution**: Run `python setup_chinese_tts.py` or download the model manually:\n```bash\npython -c \"from huggingface_hub import hf_hub_download; hf_hub_download('hexgrad/Kokoro-82M-v1.1-zh', 'kokoro-v1_1-zh.pth', local_dir='.')\"\n```\n\n### \"Voice file not found\"\n\n**Solution**: Run `python setup_chinese_tts.py` to download all voice files automatically.\n\n### \"No Chinese phonemizer support\"\n\n**Solution**: TTS still works without a phonemizer; only phoneme visualization is unavailable. To enable phoneme output, install:\n```bash\npip install phonemizer espeakng-loader\n# Then install espeak-ng for your platform\n```\n\n### Out of memory errors\n\n**Solution**:\n- The system automatically falls back to CPU\n- Use shorter text segments\n- Close other applications\n- Reuse already-loaded voices instead of loading new ones\n\n---\n\n## Advanced Usage\n\n### Text Processing\n\nThe system automatically handles Chinese character validation, normalization, punctuation, and text segmentation. 
To process text manually, use the `ChineseTextProcessor` utilities:\n\n```python\nfrom chinese_config import ChineseTextProcessor\n\n# Check if text is Chinese\nis_chinese = ChineseTextProcessor.is_chinese(\"你好\")\n\n# Normalize text\nnormalized = ChineseTextProcessor.normalize_chinese_text(\"你好  ，  世界  ！\")\n\n# Split long text\nsegments = ChineseTextProcessor.split_chinese_text(\"长文本...\", max_length=100)\n```\n\n### Batch Processing\n\n```python\n# Reuses model, device, generate_chinese_speech and save_audio from the Python API example\ntexts = [\"你好，世界\", \"欢迎使用中文文本转语音\", \"这是一个测试\"]\n\nfor i, text in enumerate(texts):\n    audio, _ = generate_chinese_speech(model, text, 'zf_xiaobei', device)\n    if audio is not None:\n        save_audio(audio, f'output_{i}.wav')\n```\n\n### Performance Tips\n\n- **First run**: Slower due to model loading\n- **Voice caching**: Faster subsequent generations\n- **GPU**: ~3x faster with CUDA\n- **Memory**: ~400MB when loaded\n\n**Typical generation times (with GPU):**\n- Short text (< 30 chars): ~0.5s\n- Medium text (30-100 chars): ~1-2s\n- Long text (100+ chars): ~2-5s\n\n### Offline Usage\n\nAfter the initial setup, you can run fully offline:\n\n```bash\n# Linux/macOS\nexport HF_HUB_OFFLINE=1\n\n# Windows PowerShell\n$env:HF_HUB_OFFLINE=\"1\"\n\n# Windows CMD\nset HF_HUB_OFFLINE=1\n\npython chinese_tts_demo.py\n```\n\n## FAQ\n\n**Q: Can I use this with English TTS?**  \nA: Yes, but use different scripts: `tts_demo.py` for English, `chinese_tts_demo.py` for Chinese.\n\n**Q: Can I mix Chinese and English text?**  \nA: The system is optimized for pure Chinese text. 
Mixed-language text may produce lower-quality output.\n\n**Q: How do I improve audio quality?**  \nA: Try different voices and adjust the speed. Use a GPU if available for faster generation.\n\n**Q: Is there a REST API?**  \nA: Not yet, but you can modify `gradio_interface.py` to support Chinese.\n\n## Additional Resources\n\n- **Kokoro Project**: https://github.com/hexgrad/kokoro\n- **Model Repository**: https://huggingface.co/hexgrad/Kokoro-82M-v1.1-zh\n- **Main README**: See [README.md](README.md) for general project information\n\n---\n\n**Version**: 1.0 | **Last Updated**: 2024\n\n"
  },
  {
    "path": "Dockerfile",
    "content": "FROM python:3.11-slim\n\nENV PYTHONDONTWRITEBYTECODE=1 \\\n    PYTHONUNBUFFERED=1 \\\n    HF_HOME=/app/.cache/huggingface\n\nWORKDIR /app\n\nRUN apt-get update \\\n    && apt-get install -y --no-install-recommends \\\n        build-essential \\\n        cmake \\\n        curl \\\n        ffmpeg \\\n        espeak-ng \\\n        libsndfile1 \\\n    && rm -rf /var/lib/apt/lists/*\n\nCOPY requirements.txt ./\n# NOTE: requirements.txt uses unpinned dependencies for flexibility.\n# For fully reproducible builds, generate a lock file:\n#   pip install -r requirements.txt && pip freeze > requirements-lock.txt\n# Then replace \"requirements.txt\" below with \"requirements-lock.txt\".\n\nRUN pip install --no-cache-dir --upgrade pip setuptools wheel \\\n    && pip install --no-cache-dir -r requirements.txt \\\n    && python -m spacy download en_core_web_sm \\\n    && python -c \"import spacy; spacy.load('en_core_web_sm'); print('spaCy model OK')\"\n\nCOPY . .\n\nRUN useradd --create-home --uid 10001 appuser \\\n    && mkdir -p /app/outputs /app/voices /app/.cache \\\n    && chown -R appuser:appuser /app\n\nUSER appuser\n\nEXPOSE 7860\n\n# Allow extra time on first start for model/voice downloads from Hugging Face\nHEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \\\n    CMD curl -f http://localhost:7860/ || exit 1\n\nCMD [\"python\", \"gradio_interface.py\", \"--host\", \"0.0.0.0\", \"--port\", \"7860\"]\n"
  },
  {
    "path": "IMPROVEMENTS.md",
    "content": "# Kokoro TTS Local - Code Improvements Summary\n\nThis document summarizes all the improvements made to fix the issues identified in the codebase analysis.\n\n## ✅ Completed Improvements\n\n### 1. Replace Monkey Patching with Proper Subclassing\n**Files Modified:** `models.py`, `gradio_interface.py`\n\n- **Issue:** The code was monkey patching `KPipeline.load_voice` and `json.load` functions, which could lead to unexpected behavior.\n- **Solution:** Created `EnhancedKPipeline` class that properly inherits from `KPipeline` and overrides the `load_voice` method.\n- **Benefits:** \n  - More maintainable and predictable code\n  - Better error handling and logging\n  - Eliminates potential conflicts with library updates\n\n### 2. Standardize File Path Handling\n**Files Modified:** `models.py`, `gradio_interface.py`, `tts_demo.py`\n\n- **Issue:** Inconsistent use of `os.path` vs `pathlib.Path` across the codebase.\n- **Solution:** Standardized on using `pathlib.Path` throughout with `.resolve()` for consistent path handling.\n- **Benefits:**\n  - Better cross-platform compatibility\n  - More readable and maintainable code\n  - Consistent path resolution\n\n### 3. Create Centralized Configuration System\n**Files Created:** `config.py`\n\n- **Issue:** Hardcoded constants scattered across multiple files with inconsistent values.\n- **Solution:** Created `TTSConfig` class with centralized configuration management.\n- **Features:**\n  - JSON-based configuration with defaults\n  - Dot notation access (e.g., `config.get(\"audio.sample_rate\")`)\n  - Validation methods for common settings\n  - Easy configuration persistence\n- **Benefits:**\n  - Single source of truth for all settings\n  - Easy customization without code changes\n  - Consistent validation across components\n\n### 4. 
Fix Format Discrepancy\n**Files Modified:** `speed_dial.py`\n\n- **Issue:** `speed_dial.py` supported \"ogg\" format while `gradio_interface.py` supported \"aac\" format.\n- **Solution:** Standardized on supporting \"wav\", \"mp3\", and \"aac\" formats across all components.\n- **Benefits:** Consistent format support throughout the application\n\n### 5. Improve Error Handling and Logging\n**Files Modified:** `models.py`, `gradio_interface.py`, `tts_demo.py`\n\n- **Issue:** Inconsistent error messages and reliance on print statements.\n- **Solution:** \n  - Implemented proper logging with the `logging` module\n  - Added structured error handling with context\n  - Improved user-friendly error messages\n- **Benefits:**\n  - Better debugging capabilities\n  - Consistent error reporting\n  - Configurable logging levels\n\n### 6. Enhance Voice Download Mechanism\n**Files Modified:** `models.py`\n\n- **Issue:** Sequential downloads with basic retry logic and no progress indication.\n- **Solution:**\n  - Implemented parallel downloads with `ThreadPoolExecutor`\n  - Added progress bars with `tqdm`\n  - Enhanced retry logic with exponential backoff\n  - Better file integrity checking\n- **Benefits:**\n  - Faster download times\n  - Better user experience with progress indication\n  - More robust download handling\n\n### 7. Add Dependency Version Checks\n**Files Created:** `dependency_checker.py`\n**Files Modified:** `requirements.txt`\n\n- **Issue:** No validation of dependency versions or availability.\n- **Solution:**\n  - Created comprehensive dependency checker\n  - Added version validation for all dependencies\n  - CUDA availability detection\n  - Clear installation instructions for missing dependencies\n- **Benefits:**\n  - Early detection of compatibility issues\n  - Better user guidance for setup\n  - Proactive problem prevention\n\n### 8. 
Improve Thread Safety\n**Files Modified:** `models.py`\n\n- **Issue:** Potential race conditions in multi-threaded environments (Gradio web interface).\n- **Solution:**\n  - Added separate locks for different operations (`_voice_cache_lock`, `_download_lock`)\n  - Enhanced thread-safe resource management\n  - Better synchronization for shared resources\n- **Benefits:**\n  - Safer concurrent operations\n  - Reduced risk of race conditions\n  - Better stability in multi-user scenarios\n\n### 9. Enhance Memory Management\n**Files Modified:** `gradio_interface.py`, `tts_demo.py`\n\n- **Issue:** No memory monitoring or management for large inputs.\n- **Solution:**\n  - Added memory monitoring with `psutil`\n  - Dynamic text length limits based on available memory\n  - Proactive garbage collection and CUDA cache clearing\n  - Memory warnings for low-memory situations\n- **Benefits:**\n  - Better handling of resource-constrained environments\n  - Reduced risk of out-of-memory errors\n  - Improved user experience with appropriate warnings\n\n## 📁 New Files Created\n\n1. **`config.py`** - Centralized configuration management system\n2. **`dependency_checker.py`** - Comprehensive dependency validation\n3. **`IMPROVEMENTS.md`** - This summary document\n\n## 🔧 Files Modified\n\n1. **`models.py`** - Core improvements to pipeline handling, logging, thread safety\n2. **`gradio_interface.py`** - Memory management, path standardization, enhanced pipeline usage\n3. **`tts_demo.py`** - Memory management, path standardization, improved error handling\n4. **`speed_dial.py`** - Format consistency fix\n5. 
**`requirements.txt`** - Added version constraints and new dependencies\n\n## 🚀 Key Benefits\n\n- **Maintainability:** Cleaner, more organized code structure\n- **Reliability:** Better error handling and resource management\n- **Performance:** Parallel downloads, memory optimization, thread safety\n- **User Experience:** Progress indicators, better error messages, memory warnings\n- **Compatibility:** Standardized paths, dependency validation, version checking\n- **Configurability:** Centralized settings management\n\n### 10. Security and Code Quality Improvements\n**Files Modified:** `models.py`, `gradio_interface.py`, `speed_dial.py`, `tts_demo.py`\n\n- **Issue:** Security vulnerabilities and code quality issues including unsafe torch.load usage, public Gradio exposure, and insufficient input validation.\n- **Solution:**\n  - **Security Fixes:**\n    - Fixed critical `torch.load` security vulnerability by using `weights_only=True`\n    - Removed public exposure of Gradio interface (`share=False`)\n    - Added comprehensive input validation for speed dial presets with regex patterns\n    - Enhanced resource management and cleanup with proper warnings\n  - **Code Quality Improvements:**\n    - Replaced hardcoded values with named constants (`MAX_TEXT_LENGTH`, `DEFAULT_SAMPLE_RATE`, etc.)\n    - Added missing type hints for better code safety and IDE support\n    - Enhanced race condition protection with proper locking mechanisms\n    - Improved error handling consistency with specific error types\n    - Added proper warning suppression for model-related deprecation warnings\n- **Benefits:**\n  - **Security:** Protection against arbitrary code execution via malicious model files\n  - **Privacy:** Prevents accidental public exposure of the interface\n  - **Reliability:** Better input validation prevents crashes and unexpected behavior\n  - **Maintainability:** Named constants and type hints improve code readability\n  - **Stability:** Enhanced thread safety and error 
handling\n\n## 📋 Usage Notes\n\n### Running Dependency Check\n```bash\npython dependency_checker.py\n```\n\n### Using Configuration System\n```python\nfrom config import config\nsample_rate = config.get(\"audio.sample_rate\")\nconfig.set(\"audio.sample_rate\", 48000)\nconfig.save()\n```\n\n### Memory Monitoring\nThe system now automatically monitors memory usage and adjusts behavior accordingly:\n- Reduces text limits on low memory systems\n- Provides warnings when memory is low\n- Automatically triggers garbage collection when needed\n\nAll improvements maintain backward compatibility while significantly enhancing the robustness and maintainability of the codebase.\n\n## 📈 Recent Updates (July 2025)\n\n### Latest Commits Summary\n\n#### v1.0.3 - Enhanced Audio Processing Support\n**Commit:** `ca106b3` - feat(deps): add torchaudio for enhanced audio processing  \n**Date:** July 19, 2025\n\n- **Added:** `torchaudio` dependency to requirements.txt\n- **Purpose:** Provides comprehensive PyTorch audio processing capabilities\n- **Benefits:** Enhanced audio handling, better format support, improved compatibility with PyTorch ecosystem\n\n#### v1.0.2 - Comprehensive System Improvements  \n**Commit:** `14fc956` - feat: add comprehensive system improvements and documentation  \n**Date:** July 19, 2025\n\nMajor improvements including all the fixes documented above:\n- Centralized configuration management system (`config.py`)\n- Dependency validation and system checks (`dependency_checker.py`) \n- Enhanced security with proper torch.load usage and input validation\n- Improved code quality with type hints and named constants\n- Memory management and monitoring capabilities\n- Enhanced pipeline with better error handling\n- Parallel downloads with progress tracking\n- Standardized path handling across all components\n\n#### v1.0.1 - Dependency Flexibility\n**Commit:** `41c8da8` - remove version constraints from requirements.txt  \n**Date:** July 19, 2025\n\n- **Changed:** 
Removed strict version constraints from all dependencies\n- **Benefits:** Better compatibility with different Python environments, reduced conflicts, easier installation\n\n### Windows Host Resolution Fix\n**Issue:** Empty UI on Windows due to `0.0.0.0` host resolution problems  \n**Solution:** Added flexible command-line argument parsing\n\n- **Added:** `argparse` support for `--host` and `--port` arguments  \n- **Changed:** Default host from `0.0.0.0` to `127.0.0.1`\n- **Usage:** \n  ```bash\n  python gradio_interface.py --port 8000          # Custom port\n  python gradio_interface.py --host 0.0.0.0      # Custom host\n  ```\n- **Benefits:** Resolves Windows issues, provides deployment flexibility, enables multiple instances"
  },
  {
    "path": "LICENSE",
    "content": "                                 Apache License\n                           Version 2.0, January 2004\n                        http://www.apache.org/licenses/\n\n   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n   1. Definitions.\n\n      \"License\" shall mean the terms and conditions for use, reproduction,\n      and distribution as defined by Sections 1 through 9 of this document.\n\n      \"Licensor\" shall mean the copyright owner or entity authorized by\n      the copyright owner that is granting the License.\n\n      \"Legal Entity\" shall mean the union of the acting entity and all\n      other entities that control, are controlled by, or are under common\n      control with that entity. For the purposes of this definition,\n      \"control\" means (i) the power, direct or indirect, to cause the\n      direction or management of such entity, whether by contract or\n      otherwise, or (ii) ownership of fifty percent (50%) or more of the\n      outstanding shares, or (iii) beneficial ownership of such entity.\n\n      \"You\" (or \"Your\") shall mean an individual or Legal Entity\n      exercising permissions granted by this License.\n\n      \"Source\" form shall mean the preferred form for making modifications,\n      including but not limited to software source code, documentation\n      source, and configuration files.\n\n      \"Object\" form shall mean any form resulting from mechanical\n      transformation or translation of a Source form, including but\n      not limited to compiled object code, generated documentation,\n      and conversions to other media types.\n\n      \"Work\" shall mean the work of authorship, whether in Source or\n      Object form, made available under the License, as indicated by a\n      copyright notice that is included in or attached to the work\n      (an example is provided in the Appendix below).\n\n      \"Derivative Works\" shall mean any work, whether in Source or Object\n      
form, that is based on (or derived from) the Work and for which the\n      editorial revisions, annotations, elaborations, or other modifications\n      represent, as a whole, an original work of authorship. For the purposes\n      of this License, Derivative Works shall not include works that remain\n      separable from, or merely link (or bind by name) to the interfaces of,\n      the Work and Derivative Works thereof.\n\n      \"Contribution\" shall mean any work of authorship, including\n      the original version of the Work and any modifications or additions\n      to that Work or Derivative Works thereof, that is intentionally\n      submitted to Licensor for inclusion in the Work by the copyright owner\n      or by an individual or Legal Entity authorized to submit on behalf of\n      the copyright owner. For the purposes of this definition, \"submitted\"\n      means any form of electronic, verbal, or written communication sent\n      to the Licensor or its representatives, including but not limited to\n      communication on electronic mailing lists, source code control systems,\n      and issue tracking systems that are managed by, or on behalf of, the\n      Licensor for the purpose of discussing and improving the Work, but\n      excluding communication that is conspicuously marked or otherwise\n      designated in writing by the copyright owner as \"Not a Contribution.\"\n\n      \"Contributor\" shall mean Licensor and any individual or Legal Entity\n      on behalf of whom a Contribution has been received by Licensor and\n      subsequently incorporated within the Work.\n\n   2. Grant of Copyright License. 
Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      copyright license to reproduce, prepare Derivative Works of,\n      publicly display, publicly perform, sublicense, and distribute the\n      Work and such Derivative Works in Source or Object form.\n\n   3. Grant of Patent License. Subject to the terms and conditions of\n      this License, each Contributor hereby grants to You a perpetual,\n      worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n      (except as stated in this section) patent license to make, have made,\n      use, offer to sell, sell, import, and otherwise transfer the Work,\n      where such license applies only to those patent claims licensable\n      by such Contributor that are necessarily infringed by their\n      Contribution(s) alone or by combination of their Contribution(s)\n      with the Work to which such Contribution(s) was submitted. If You\n      institute patent litigation against any entity (including a\n      cross-claim or counterclaim in a lawsuit) alleging that the Work\n      or a Contribution incorporated within the Work constitutes direct\n      or contributory patent infringement, then any patent licenses\n      granted to You under this License for that Work shall terminate\n      as of the date such litigation is filed.\n\n   4. Redistribution. 
You may reproduce and distribute copies of the\n      Work or Derivative Works thereof in any medium, with or without\n      modifications, and in Source or Object form, provided that You\n      meet the following conditions:\n\n      (a) You must give any other recipients of the Work or\n          Derivative Works a copy of this License; and\n\n      (b) You must cause any modified files to carry prominent notices\n          stating that You changed the files; and\n\n      (c) You must retain, in the Source form of any Derivative Works\n          that You distribute, all copyright, patent, trademark, and\n          attribution notices from the Source form of the Work,\n          excluding those notices that do not pertain to any part of\n          the Derivative Works; and\n\n      (d) If the Work includes a \"NOTICE\" text file as part of its\n          distribution, then any Derivative Works that You distribute must\n          include a readable copy of the attribution notices contained\n          within such NOTICE file, excluding those notices that do not\n          pertain to any part of the Derivative Works, in at least one\n          of the following places: within a NOTICE text file distributed\n          as part of the Derivative Works; within the Source form or\n          documentation, if provided along with the Derivative Works; or,\n          within a display generated by the Derivative Works, if and\n          wherever such third-party notices normally appear. The contents\n          of the NOTICE file are for informational purposes only and\n          do not modify the License. 
You may add Your own attribution\n          notices within Derivative Works that You distribute, alongside\n          or as an addendum to the NOTICE text from the Work, provided\n          that such additional attribution notices cannot be construed\n          as modifying the License.\n\n      You may add Your own copyright statement to Your modifications and\n      may provide additional or different license terms and conditions\n      for use, reproduction, or distribution of Your modifications, or\n      for any such Derivative Works as a whole, provided Your use,\n      reproduction, and distribution of the Work otherwise complies with\n      the conditions stated in this License.\n\n   5. Submission of Contributions. Unless You explicitly state otherwise,\n      any Contribution intentionally submitted for inclusion in the Work\n      by You to the Licensor shall be under the terms and conditions of\n      this License, without any additional terms or conditions.\n      Notwithstanding the above, nothing herein shall supersede or modify\n      the terms of any separate license agreement you may have executed\n      with Licensor regarding such Contributions.\n\n   6. Trademarks. This License does not grant permission to use the trade\n      names, trademarks, service marks, or product names of the Licensor,\n      except as required for reasonable and customary use in describing the\n      origin of the Work and reproducing the content of the NOTICE file.\n\n   7. Disclaimer of Warranty. Unless required by applicable law or\n      agreed to in writing, Licensor provides the Work (and each\n      Contributor provides its Contributions) on an \"AS IS\" BASIS,\n      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n      implied, including, without limitation, any warranties or conditions\n      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n      PARTICULAR PURPOSE. 
You are solely responsible for determining the\n      appropriateness of using or redistributing the Work and assume any\n      risks associated with Your exercise of permissions under this License.\n\n   8. Limitation of Liability. In no event and under no legal theory,\n      whether in tort (including negligence), contract, or otherwise,\n      unless required by applicable law (such as deliberate and grossly\n      negligent acts) or agreed to in writing, shall any Contributor be\n      liable to You for damages, including any direct, indirect, special,\n      incidental, or consequential damages of any character arising as a\n      result of this License or out of the use or inability to use the\n      Work (including but not limited to damages for loss of goodwill,\n      work stoppage, computer failure or malfunction, or any and all\n      other commercial damages or losses), even if such Contributor\n      has been advised of the possibility of such damages.\n\n   9. Accepting Warranty or Additional Liability. While redistributing\n      the Work or Derivative Works thereof, You may choose to offer,\n      and charge a fee for, acceptance of support, warranty, indemnity,\n      or other liability obligations and/or rights consistent with this\n      License. 
However, in accepting such obligations, You may act only\n      on Your own behalf and on Your sole responsibility, not on behalf\n      of any other Contributor, and only if You agree to indemnify,\n      defend, and hold each Contributor harmless for any liability\n      incurred by, or claims asserted against, such Contributor by reason\n      of your accepting any such warranty or additional liability.\n\n   END OF TERMS AND CONDITIONS\n\n   Copyright 2025 PierrunoYT (Kokoro TTS Local)\n\n   Licensed under the Apache License, Version 2.0 (the \"License\");\n   you may not use this file except in compliance with the License.\n   You may obtain a copy of the License at\n\n       http://www.apache.org/licenses/LICENSE-2.0\n\n   Unless required by applicable law or agreed to in writing, software\n   distributed under the License is distributed on an \"AS IS\" BASIS,\n   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n   See the License for the specific language governing permissions and\n   limitations under the License. "
  },
  {
    "path": "README.md",
    "content": "# Kokoro TTS Local\n\nA local implementation of the Kokoro Text-to-Speech model, featuring dynamic module loading, automatic dependency management, and a web interface.\n\n## Features\n\n- Local text-to-speech synthesis using the Kokoro-82M model\n- Multiple voice support with easy voice selection (54 voices available across 8 languages)\n- Automatic model and voice downloading from Hugging Face\n- **Offline mode support** - Run completely offline after initial setup\n- Phoneme output support and visualization\n- Interactive CLI and web interface\n- Voice listing functionality\n- Cross-platform support (Windows, Linux, macOS)\n- Real-time generation progress display\n- Multiple output formats (WAV, MP3, AAC)\n- Enhanced security and code quality features\n- Centralized configuration management\n- Comprehensive dependency validation\n- Memory management and optimization\n- Thread-safe operations for multi-user scenarios\n\n## Recent Improvements\n\nThis project has been significantly enhanced with security and code quality improvements:\n\n### 🔒 Security Enhancements\n- **Fixed critical security vulnerability** in model loading by using `weights_only=True` for `torch.load`\n- **Removed public exposure** of Gradio interface (`share=False`) to prevent accidental public access\n- **Added comprehensive input validation** for all user inputs with regex pattern matching\n- **Enhanced resource management** with proper cleanup and warning systems\n\n### 🛠️ Code Quality Improvements\n- **Replaced hardcoded values** with named constants for better maintainability\n- **Added comprehensive type hints** throughout the codebase for better IDE support and safety\n- **Enhanced thread safety** with proper locking mechanisms for concurrent operations\n- **Improved error handling** with specific error types and consistent messaging\n- **Added proper warning suppression** for model-related deprecation warnings\n\n### 📁 New Components\n- **`config.py`** - Centralized 
configuration management system\n- **`dependency_checker.py`** - Comprehensive dependency validation and CUDA detection\n- **`IMPROVEMENTS.md`** - Detailed documentation of all enhancements\n\nFor complete details, see [`IMPROVEMENTS.md`](IMPROVEMENTS.md).\n\n## Prerequisites\n\n- Python 3.8 or higher\n- FFmpeg (optional, for MP3/AAC conversion)\n- CUDA-compatible GPU (optional, for faster generation)\n- Git (for cloning the repository)\n\n## Installation\n\n1. Clone the repository and create a Python virtual environment:\n```bash\n# Windows\npython -m venv venv\n.\\venv\\Scripts\\activate\n\n# Linux/macOS\npython3 -m venv venv\nsource venv/bin/activate\n```\n\n2. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n**Alternative Installation (Simplified):**\nFor a simpler setup, you can also install the official Kokoro package directly:\n```bash\npip install kokoro soundfile\nsudo apt-get install espeak-ng  # On Debian/Ubuntu Linux\n# or: brew install espeak  # On macOS\n```\n\n3. 
(Optional) For GPU acceleration, install PyTorch with CUDA support:\n```bash\n# For CUDA 11.8\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n\n# For CUDA 12.1\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121\n\n# For CUDA 12.6\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126\n\n# For CUDA 12.8 (for RTX 50-series cards)\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128\n```\n\nYou can verify CUDA support is enabled with:\n```python\nimport torch\nprint(torch.cuda.is_available())  # Should print True if CUDA is available\n```\n\nThe system will automatically download required models and voice files on first run.\n\n## Docker Quick Start\n\nThis project can be run in a CPU-first Docker setup with runtime model and voice downloads.\n\n### Build and Run with Docker\n\n**Linux/macOS (bash/zsh):**\n```bash\ndocker build -t kokoro-tts-local:cpu .\ndocker run --rm -it \\\n   -p 7860:7860 \\\n   -v \"$(pwd)/outputs:/app/outputs\" \\\n   -v \"$(pwd)/voices:/app/voices\" \\\n   -v \"$(pwd)/.cache:/app/.cache\" \\\n   kokoro-tts-local:cpu\n```\n\n**Windows (PowerShell):**\n```powershell\ndocker build -t kokoro-tts-local:cpu .\ndocker run --rm -it `\n   -p 7860:7860 `\n   -v \"${PWD}/outputs:/app/outputs\" `\n   -v \"${PWD}/voices:/app/voices\" `\n   -v \"${PWD}/.cache:/app/.cache\" `\n   kokoro-tts-local:cpu\n```\n\nOpen `http://localhost:7860` in your browser.\n\n### Run with Docker Compose\n\n```bash\ndocker compose up --build\n```\n\n### Docker Notes\n\n- First startup can take longer because model and voice files are downloaded from Hugging Face.\n- Volumes for `outputs`, `voices`, and `.cache` are recommended so downloads and generated audio persist across restarts.\n- The Docker image pre-installs `en_core_web_sm` during build to avoid non-root runtime initialization errors.\n- This initial Docker 
support is CPU-first. GPU and pre-baked model image variants are intentionally out of scope for this first implementation.\n- To force offline mode after assets are downloaded, set `HF_HUB_OFFLINE=1` in your Docker environment.\n\n## Offline Mode\n\nAfter the initial setup, you can run Kokoro-TTS-Local without an internet connection.\n\n### Quick Start - Offline Mode\n\n**Linux/macOS:**\n```bash\nexport HF_HUB_OFFLINE=1\npython tts_demo.py\n```\n\n**Windows (PowerShell):**\n```powershell\n$env:HF_HUB_OFFLINE=\"1\"\npython tts_demo.py\n```\n\n**Windows (Command Prompt):**\n```cmd\nset HF_HUB_OFFLINE=1\npython tts_demo.py\n```\n\n### Requirements for Offline Mode\n\nBefore enabling offline mode, ensure you have:\n1. Run the application at least once with an internet connection\n2. Downloaded the model file (`kokoro-v1_0.pth`)\n3. Downloaded the config file (`config.json`)\n4. Downloaded at least one voice file in the `voices/` directory\n\n### Testing Offline Mode\n\nUse the provided test script to verify your offline setup:\n\n```bash\nexport HF_HUB_OFFLINE=1  # Enable offline mode\npython test_offline.py   # Run the test\n```\n\nThe script checks:\n- Offline mode environment variables are set\n- Required files exist (`kokoro-v1_0.pth`, `config.json`, `voices/`)\n- All required Python packages are installed\n- Model initializes correctly\n- Voices can be listed\n- Speech can be generated and saved\n\n## Usage\n\nYou can use either the command-line interface or the web interface:\n\n### Command Line Interface\n\nRun the interactive CLI:\n```bash\npython tts_demo.py\n```\n\nThe CLI provides an interactive menu with the following options:\n1. List available voices - Shows all available voice options\n2. 
Generate speech - Interactive process to:\n   - Select a voice from the numbered list\n   - Enter text to convert to speech\n   - Adjust speech speed (0.5-2.0)\n3. Exit - Quit the program\n\nExample session:\n```\n=== Kokoro TTS Menu ===\n1. List available voices\n2. Generate speech\n3. Exit\nSelect an option (1-3): 2\n\nAvailable voices:\n1. af_alloy\n2. af_aoede\n3. af_bella\n...\n\nSelect a voice number (or press Enter for default 'af_bella'): 3\n\nEnter the text you want to convert to speech\n(or press Enter for default text)\n> Hello, world!\n\nEnter speech speed (0.1-3.0, default 1.0): 1.2\n\nGenerating speech for: 'Hello, world!'\nUsing voice: af_bella\nSpeed: 1.2x\n...\n```\n\n### Web Interface\n\nFor a more user-friendly experience, launch the web interface:\n\n```bash\npython gradio_interface.py\n```\n\nThen open your browser to the URL shown in the console (typically http://localhost:7860).\n\nThe web interface provides:\n- Easy voice selection from a dropdown menu\n- Text input field with examples\n- Speed control slider (0.5–2.0x)\n- Output format selection (WAV, MP3, AAC)\n- Real-time generation progress\n- Audio playback in the browser\n- Download options for generated audio\n- **Speed Dial presets** — save, load, and delete frequently used voice/text/speed combinations\n\n### Dependency Validation\n\nBefore running the application, you can validate your system setup:\n\n```bash\npython dependency_checker.py\n```\n\nThis will check:\n- Python version compatibility\n- All required dependencies and their versions\n- CUDA availability and GPU detection\n- System memory and disk space\n- Audio system functionality\n\n### Configuration Management\n\nThe project provides centralized configuration management through `config.py`:\n\n```python\nfrom config import config\n\n# Get configuration values\nsample_rate = config.get(\"audio.sample_rate\")\nmax_text_length = config.get(\"limits.max_text_length\")\n\n# Set configuration values\nconfig.set(\"audio.sample_rate\", 
48000)\nconfig.set(\"interface.auto_play\", True)\n\n# Save configuration\nconfig.save()\n```\n\nConfiguration files are automatically created with sensible defaults.\n\n## Available Voices\n\nThe system includes 54 different voices across 8 languages:\n\n### 🇺🇸 American English (20 voices)\n**Language code: 'a'**\n\n**Female voices (af_*):**\n- af_heart: ❤️ Premium quality voice (Grade A)\n- af_alloy: Clear and professional (Grade C)\n- af_aoede: Smooth and melodic (Grade C+)\n- af_bella: 🔥 Warm and friendly (Grade A-)\n- af_jessica: Natural and engaging (Grade D)\n- af_kore: Bright and energetic (Grade C+)\n- af_nicole: 🎧 Professional and articulate (Grade B-)\n- af_nova: Modern and dynamic (Grade C)\n- af_river: Soft and flowing (Grade D)\n- af_sarah: Casual and approachable (Grade C+)\n- af_sky: Light and airy (Grade C-)\n\n**Male voices (am_*):**\n- am_adam: Strong and confident (Grade F+)\n- am_echo: Resonant and clear (Grade D)\n- am_eric: Professional and authoritative (Grade D)\n- am_fenrir: Deep and powerful (Grade C+)\n- am_liam: Friendly and conversational (Grade D)\n- am_michael: Warm and trustworthy (Grade C+)\n- am_onyx: Rich and sophisticated (Grade D)\n- am_puck: Playful and energetic (Grade C+)\n- am_santa: Holiday-themed voice (Grade D-)\n\n### 🇬🇧 British English (8 voices)\n**Language code: 'b'**\n\n**Female voices (bf_*):**\n- bf_alice: Refined and elegant (Grade D)\n- bf_emma: Warm and professional (Grade B-)\n- bf_isabella: Sophisticated and clear (Grade C)\n- bf_lily: Sweet and gentle (Grade D)\n\n**Male voices (bm_*):**\n- bm_daniel: Polished and professional (Grade D)\n- bm_fable: Storytelling and engaging (Grade C)\n- bm_george: Classic British accent (Grade C)\n- bm_lewis: Modern British accent (Grade D+)\n\n### 🇯🇵 Japanese (5 voices)\n**Language code: 'j'**\n\n**Female voices (jf_*):**\n- jf_alpha: Standard Japanese female (Grade C+)\n- jf_gongitsune: Based on classic tale (Grade C)\n- jf_nezumi: Mouse bride tale voice (Grade C-)\n- 
jf_tebukuro: Glove story voice (Grade C)\n\n**Male voices (jm_*):**\n- jm_kumo: Spider thread tale voice (Grade C-)\n\n### 🇨🇳 Mandarin Chinese (8 voices)\n**Language code: 'z'**\n\n**Female voices (zf_*):**\n- zf_xiaobei: Chinese female voice (Grade D)\n- zf_xiaoni: Chinese female voice (Grade D)\n- zf_xiaoxiao: Chinese female voice (Grade D)\n- zf_xiaoyi: Chinese female voice (Grade D)\n\n**Male voices (zm_*):**\n- zm_yunjian: Chinese male voice (Grade D)\n- zm_yunxi: Chinese male voice (Grade D)\n- zm_yunxia: Chinese male voice (Grade D)\n- zm_yunyang: Chinese male voice (Grade D)\n\n**Note:** Run `python setup_chinese_tts.py` to download the Chinese model and voice files automatically. For full usage details see [CHINESE_TTS_GUIDE.md](CHINESE_TTS_GUIDE.md) or [README_CHINESE_TTS.md](README_CHINESE_TTS.md).\n\n### 🇪🇸 Spanish (3 voices)\n**Language code: 'e'**\n\n**Female voices (ef_*):**\n- ef_dora: Spanish female voice\n\n**Male voices (em_*):**\n- em_alex: Spanish male voice\n- em_santa: Spanish holiday voice\n\n### 🇫🇷 French (1 voice)\n**Language code: 'f'**\n\n**Female voices (ff_*):**\n- ff_siwis: French female voice (Grade B-)\n\n### 🇮🇳 Hindi (4 voices)\n**Language code: 'h'**\n\n**Female voices (hf_*):**\n- hf_alpha: Hindi female voice (Grade C)\n- hf_beta: Hindi female voice (Grade C)\n\n**Male voices (hm_*):**\n- hm_omega: Hindi male voice (Grade C)\n- hm_psi: Hindi male voice (Grade C)\n\n### 🇮🇹 Italian (2 voices)\n**Language code: 'i'**\n\n**Female voices (if_*):**\n- if_sara: Italian female voice (Grade C)\n\n**Male voices (im_*):**\n- im_nicola: Italian male voice (Grade C)\n\n### 🇧🇷 Brazilian Portuguese (3 voices)\n**Language code: 'p'**\n\n**Female voices (pf_*):**\n- pf_dora: Portuguese female voice\n\n**Male voices (pm_*):**\n- pm_alex: Portuguese male voice\n- pm_santa: Portuguese holiday voice\n\n**Note:** Quality grades (A to F) indicate the overall quality based on training data quality and duration. 
Higher grades generally produce better speech quality.\n\n## Project Structure\n\n```\n.\n├── .cache/               # Cache directory for downloaded models\n│   └── huggingface/      # Hugging Face model cache\n├── .git/                 # Git repository data\n├── .gitignore            # Git ignore rules\n├── __pycache__/          # Python cache files\n├── voices/               # Voice model files (downloaded on demand)\n│   └── *.pt              # Individual voice files\n├── venv/                 # Python virtual environment\n├── outputs/              # Generated audio files directory\n├── LICENSE               # Apache 2.0 License file\n├── README.md             # Project documentation\n├── README_CHINESE_TTS.md # Chinese TTS quick reference\n├── CHINESE_TTS_GUIDE.md  # Complete Chinese TTS guide\n├── IMPROVEMENTS.md       # Detailed improvement documentation\n├── models.py             # Core TTS model implementation\n├── gradio_interface.py   # Web interface implementation\n├── tts_demo.py           # CLI implementation (English)\n├── chinese_tts_demo.py   # CLI implementation (Chinese, 5-option menu)\n├── chinese_config.py     # Chinese text processing and voice configuration\n├── setup_chinese_tts.py  # Downloads Chinese model and voice files\n├── config.py             # Centralized configuration management\n├── dependency_checker.py # Dependency validation and system checks\n├── speed_dial.py         # Speed Dial preset management (save/load/delete)\n├── test_offline.py       # Offline mode verification script\n└── requirements.txt      # Python dependencies (no version constraints)\n```\n\n## Model Information\n\nThe project uses the Kokoro v1.0 model from Hugging Face:\n- Repository: [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M)\n- Model file: `kokoro-v1_0.pth` (downloaded automatically)\n- Sample rate: 24 kHz\n- Voice files: Located in the `voices/` directory (downloaded automatically)\n- Available voices: 54 voices across 8 
languages\n- Languages: American English ('a'), British English ('b'), Japanese ('j'), Mandarin Chinese ('z'), Spanish ('e'), French ('f'), Hindi ('h'), Italian ('i'), Brazilian Portuguese ('p')\n- Model size: 82M parameters\n\n## Troubleshooting\n\nCommon issues and solutions:\n\n### Quick System Check\n\nFirst, run the dependency checker to identify potential issues:\n```bash\npython dependency_checker.py\n```\n\nThis will automatically detect and report:\n- Missing or incompatible dependencies\n- CUDA/GPU configuration issues\n- System resource problems\n- Audio system issues\n\n### Common Issues\n\n1. **Offline Mode / Network Connection Issues**\n   - **Problem:** Getting \"Failed to resolve 'huggingface.co'\" errors even with cached files\n   - **Solution:** Enable offline mode with `export HF_HUB_OFFLINE=1` (Linux/macOS) or `$env:HF_HUB_OFFLINE=\"1\"` (Windows PowerShell)\n   - **Verify:** Run `python test_offline.py` to confirm your offline setup is working\n\n2. **Model Download Issues**\n   - Ensure a stable internet connection\n   - Check that Hugging Face is accessible\n   - Verify sufficient disk space\n   - Try clearing the `.cache/huggingface` directory\n\n3. 
**CUDA/GPU Issues**\n   - Verify that your GPU and NVIDIA driver are detected with `nvidia-smi`\n   - Update GPU drivers\n   - Install PyTorch with CUDA support using the appropriate command:\n     ```bash\n     # For CUDA 11.8\n     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n\n     # For CUDA 12.1\n     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121\n\n     # For CUDA 12.6\n     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126\n\n     # For CUDA 12.8 (for RTX 50-series cards)\n     pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128\n     ```\n   - Verify CUDA is available in PyTorch:\n     ```python\n     import torch\n     print(torch.cuda.is_available())  # Should print True\n     ```\n   - Fall back to CPU if needed\n\n4. **Audio Output Issues**\n   - Check system audio settings\n   - Verify output directory permissions\n   - Install FFmpeg for MP3/AAC support\n   - Try different output formats\n\n5. **Voice File Issues**\n   - Delete the voice files and let the system re-download them\n   - Check `voices/` directory permissions\n   - Verify voice file integrity\n   - Try using a different voice\n\n6. **Web Interface Issues**\n   - Check that port 7860 is available\n   - Try a different browser\n   - Clear the browser cache\n   - Check network firewall settings\n\nFor any other issues:\n1. Check the console output for error messages\n2. Verify all prerequisites are installed\n3. Ensure the virtual environment is activated\n4. Check system resource usage\n5. Try reinstalling dependencies\n\n## Contributing\n\nFeel free to contribute by:\n1. Opening issues for bugs or feature requests\n2. Submitting pull requests with improvements\n3. Helping with documentation\n4. Testing different voices and reporting issues\n5. Suggesting new features or optimizations\n6. 
Testing on different platforms and reporting results\n\n## License\n\nApache 2.0 - See LICENSE file for details"
  },
  {
    "path": "README_CHINESE_TTS.md",
    "content": "# Kokoro Chinese TTS - Quick Reference\n\nQuick start guide for Chinese TTS. For complete documentation, see [CHINESE_TTS_GUIDE.md](CHINESE_TTS_GUIDE.md).\n\n## Quick Start\n\n```bash\n# 1. Setup (downloads model and voices)\npython setup_chinese_tts.py\n\n# 2. Run interactive demo\npython chinese_tts_demo.py\n```\n\n## Python API\n\n```python\nfrom chinese_tts_demo import load_chinese_model, generate_chinese_speech, save_audio\nimport torch\n\n# Load model\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\nmodel = load_chinese_model('kokoro-v1_1-zh.pth', device)\n\n# Generate speech\naudio, _ = generate_chinese_speech(\n    model, \n    \"你好，世界！\",  # Your Chinese text\n    'zf_xiaobei',    # Voice ID\n    device,\n    speed=1.0\n)\n\n# Save audio\nif audio is not None:\n    save_audio(audio, 'output.wav')\n```\n\n## Available Voices\n\n**Female (女性)**: `zf_xiaobei`, `zf_xiaoni`, `zf_xiaoxiao`, `zf_xiaoyi`  \n**Male (男性)**: `zm_yunjian`, `zm_yunxi`, `zm_yunxia`, `zm_yunyang`\n\n**Recommended**: `zf_xiaoyi` (female) or `zm_yunxi` (male) for natural speech.\n\n## Key Features\n\n- ✅ 8 Chinese voices (4 female + 4 male)\n- ✅ Natural Mandarin pronunciation\n- ✅ Adjustable speed (0.5x - 2.0x)\n- ✅ Offline operation after setup\n- ✅ Cross-platform support\n\n## Troubleshooting\n\n| Issue | Solution |\n|-------|----------|\n| Model not found | Run `python setup_chinese_tts.py` |\n| Voice files missing | Run `python setup_chinese_tts.py` |\n| \"words count mismatch\" warning | Use `chinese_tts_demo.py` (not `tts_demo.py`) |\n| Out of memory | System auto-falls back to CPU |\n\n## Documentation\n\n- **Complete Guide**: [CHINESE_TTS_GUIDE.md](CHINESE_TTS_GUIDE.md)\n- **Main README**: [README.md](README.md)\n\n---\n\n**Version**: 1.0 | **Model**: Kokoro-82M-v1.1_zh\n"
  },
  {
    "path": "chinese_config.py",
    "content": "\"\"\"\r\nChinese TTS Configuration Module for Kokoro-v1.1-zh\r\n====================================================\r\n\r\nThis module provides specialized configuration and utilities for the Kokoro Chinese TTS model.\r\nIt handles Chinese-specific phonemization, text processing, and voice management.\r\n\"\"\"\r\n\r\nimport os\r\nimport json\r\nfrom pathlib import Path\r\nfrom typing import Dict, Any, Optional, List\r\nimport logging\r\n\r\nlogger = logging.getLogger(__name__)\r\n\r\n# Chinese language code\r\nCHINESE_LANG_CODE = 'z'\r\n\r\n# Chinese Model Configuration\r\nCHINESE_MODEL_CONFIG = {\r\n    \"model_name\": \"Kokoro-v1.1-zh\",\r\n    \"model_file\": \"kokoro-v1_1-zh.pth\",\r\n    \"repo_id\": \"hexgrad/Kokoro-82M-v1.1-zh\",\r\n    \"language_code\": 'z',\r\n    \"description\": \"Kokoro 82M Chinese (Mandarin) TTS Model v1.1\",\r\n    \"phonemizer\": \"espeak-zh\",  # Specialized Chinese phonemizer\r\n    \"sample_rate\": 24000,\r\n    \"voice_prefix\": [\"zf_\", \"zm_\"],  # Chinese female (zf_) and male (zm_) voices\r\n}\r\n\r\n# Chinese Voice Files - 8 voices total (4 female + 4 male)\r\nCHINESE_VOICES = {\r\n    # Female voices\r\n    \"zf_xiaobei\": {\r\n        \"name\": \"晓蓓\",\r\n        \"gender\": \"Female\",\r\n        \"description\": \"Young, energetic female voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zf_xiaobei.pt\"\r\n    },\r\n    \"zf_xiaoni\": {\r\n        \"name\": \"晓妮\",\r\n        \"gender\": \"Female\",\r\n        \"description\": \"Clear, friendly female voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zf_xiaoni.pt\"\r\n    },\r\n    \"zf_xiaoxiao\": {\r\n        \"name\": \"晓晓\",\r\n        \"gender\": \"Female\",\r\n        \"description\": \"Soft, gentle female voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zf_xiaoxiao.pt\"\r\n    },\r\n    \"zf_xiaoyi\": {\r\n        \"name\": \"晓艺\",\r\n        \"gender\": 
\"Female\",\r\n        \"description\": \"Professional, articulate female voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zf_xiaoyi.pt\"\r\n    },\r\n    # Male voices\r\n    \"zm_yunjian\": {\r\n        \"name\": \"云健\",\r\n        \"gender\": \"Male\",\r\n        \"description\": \"Strong, confident male voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zm_yunjian.pt\"\r\n    },\r\n    \"zm_yunxi\": {\r\n        \"name\": \"云析\",\r\n        \"gender\": \"Male\",\r\n        \"description\": \"Warm, professional male voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zm_yunxi.pt\"\r\n    },\r\n    \"zm_yunxia\": {\r\n        \"name\": \"云夏\",\r\n        \"gender\": \"Male\",\r\n        \"description\": \"Calm, steady male voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zm_yunxia.pt\"\r\n    },\r\n    \"zm_yunyang\": {\r\n        \"name\": \"云阳\",\r\n        \"gender\": \"Male\",\r\n        \"description\": \"Resonant, deep male voice\",\r\n        \"language\": \"Mandarin Chinese\",\r\n        \"file\": \"zm_yunyang.pt\"\r\n    }\r\n}\r\n\r\nclass ChineseTextProcessor:\r\n    \"\"\"Handle Chinese-specific text processing and normalization\"\"\"\r\n    \r\n    @staticmethod\r\n    def is_chinese(text: str) -> bool:\r\n        \"\"\"Check if text contains Chinese characters\"\"\"\r\n        for char in text:\r\n            if '\\u4e00' <= char <= '\\u9fff':  # Unicode range for CJK unified ideographs\r\n                return True\r\n        return False\r\n    \r\n    @staticmethod\r\n    def normalize_chinese_text(text: str) -> str:\r\n        \"\"\"Normalize Chinese text for TTS processing\"\"\"\r\n        # Remove extra whitespace\r\n        text = ' '.join(text.split())\r\n        \r\n        # Ensure proper spacing around punctuation\r\n        import re\r\n        # Add space after sentence punctuation, removing any existing spaces first.\r\n  
      # Keep brackets/quotes untouched to avoid introducing awkward spaces.\r\n        text = re.sub(r\"\\s*([。，！？；：])\\s*\", r\"\\1 \", text)\r\n        # Clean up any double spaces that may have been created\r\n        text = ' '.join(text.split())\r\n        \r\n        return text.strip()\r\n    \r\n    @staticmethod\r\n    def split_chinese_text(text: str, max_length: int = 100) -> List[str]:\r\n        \"\"\"Split Chinese text into proper segments for TTS processing\r\n        \r\n        Args:\r\n            text: Chinese text to split\r\n            max_length: Maximum characters per segment\r\n            \r\n        Returns:\r\n            List of text segments\r\n        \"\"\"\r\n        segments = []\r\n        current_segment = \"\"\r\n        \r\n        for char in text:\r\n            current_segment += char\r\n            \r\n            # Split on punctuation or max length\r\n            if char in '。！？；，\\n' or len(current_segment) >= max_length:\r\n                if current_segment.strip():\r\n                    segments.append(current_segment.strip())\r\n                current_segment = \"\"\r\n        \r\n        # Add remaining text\r\n        if current_segment.strip():\r\n            segments.append(current_segment.strip())\r\n        \r\n        return segments\r\n\r\n\r\nclass ChineseTTSConfig:\r\n    \"\"\"Specialized configuration manager for Chinese TTS\"\"\"\r\n    \r\n    def __init__(self, config_file: Optional[str] = None):\r\n        self.config_file = Path(config_file or \"chinese_tts_config.json\").resolve()\r\n        self.chinese_voices_dir = Path(\"voices\").resolve()\r\n        self._config = self._load_default_config()\r\n        self._load_config_file()\r\n    \r\n    def _load_default_config(self) -> Dict[str, Any]:\r\n        \"\"\"Load default configuration values for Chinese TTS\"\"\"\r\n        # Copy the module-level dicts so _merge_config() and set() never mutate them in place\r\n        return {\r\n            \"model\": dict(CHINESE_MODEL_CONFIG),\r\n            \"voices\": {name: dict(info) for name, info in CHINESE_VOICES.items()},\r\n            
\"phonemizer\": {\r\n                \"backend\": \"espeak-ng\",\r\n                \"language\": \"zh\",  # Chinese language code for espeak\r\n                \"preserve_punctuation\": True,\r\n                \"strip\": False\r\n            },\r\n            \"text_processing\": {\r\n                \"normalize\": True,\r\n                \"split_long_text\": True,\r\n                \"max_segment_length\": 100,\r\n                \"min_segment_length\": 10\r\n            },\r\n            \"audio\": {\r\n                \"sample_rate\": 24000,\r\n                \"default_speed\": 1.0,\r\n                \"min_speed\": 0.5,\r\n                \"max_speed\": 2.0\r\n            },\r\n            \"paths\": {\r\n                \"voices_dir\": \"voices\",\r\n                \"models_dir\": \".\",\r\n                \"output_dir\": \"outputs\"\r\n            }\r\n        }\r\n    \r\n    def _load_config_file(self):\r\n        \"\"\"Load configuration from file if it exists\"\"\"\r\n        if self.config_file.exists():\r\n            try:\r\n                with open(self.config_file, 'r', encoding='utf-8') as f:\r\n                    file_config = json.load(f)\r\n                self._merge_config(file_config)\r\n                logger.info(f\"Loaded Chinese TTS configuration from {self.config_file}\")\r\n            except (json.JSONDecodeError, IOError) as e:\r\n                logger.warning(f\"Failed to load Chinese config file {self.config_file}: {e}\")\r\n    \r\n    def _merge_config(self, file_config: Dict[str, Any]):\r\n        \"\"\"Merge file configuration with default configuration\"\"\"\r\n        def merge_dict(default: Dict, override: Dict):\r\n            for key, value in override.items():\r\n                if key in default and isinstance(default[key], dict) and isinstance(value, dict):\r\n                    merge_dict(default[key], value)\r\n                else:\r\n                    default[key] = value\r\n        \r\n        
merge_dict(self._config, file_config)\r\n    \r\n    def get(self, key: str, default: Any = None) -> Any:\r\n        \"\"\"Get configuration value using dot notation\"\"\"\r\n        keys = key.split('.')\r\n        value = self._config\r\n        \r\n        for k in keys:\r\n            if isinstance(value, dict) and k in value:\r\n                value = value[k]\r\n            else:\r\n                return default\r\n        \r\n        return value\r\n    \r\n    def set(self, key: str, value: Any):\r\n        \"\"\"Set configuration value using dot notation\"\"\"\r\n        keys = key.split('.')\r\n        config = self._config\r\n        \r\n        for k in keys[:-1]:\r\n            if k not in config:\r\n                config[k] = {}\r\n            config = config[k]\r\n        \r\n        config[keys[-1]] = value\r\n    \r\n    def save(self):\r\n        \"\"\"Save current configuration to file\"\"\"\r\n        try:\r\n            self.config_file.parent.mkdir(parents=True, exist_ok=True)\r\n            with open(self.config_file, 'w', encoding='utf-8') as f:\r\n                json.dump(self._config, f, indent=2, ensure_ascii=False)\r\n            logger.info(f\"Chinese TTS configuration saved to {self.config_file}\")\r\n        except IOError as e:\r\n            logger.error(f\"Failed to save Chinese TTS configuration: {e}\")\r\n    \r\n    def get_voices_list(self) -> List[str]:\r\n        \"\"\"Get the names of all configured Chinese voices, whether or not their files are downloaded\"\"\"\r\n        return list(CHINESE_VOICES.keys())\r\n    \r\n    def get_voice_info(self, voice_name: str) -> Optional[Dict[str, Any]]:\r\n        \"\"\"Get information about a specific voice\"\"\"\r\n        return CHINESE_VOICES.get(voice_name)\r\n    \r\n    def ensure_voices_directory(self):\r\n        \"\"\"Ensure Chinese voices directory exists\"\"\"\r\n        self.chinese_voices_dir.mkdir(parents=True, exist_ok=True)\r\n        logger.info(f\"Chinese voices directory ready: {self.chinese_voices_dir}\")\r\n    
\r\n    def validate_chinese_model(self, model_path: str) -> bool:\r\n        \"\"\"Validate Chinese model file\"\"\"\r\n        model_file = Path(model_path).resolve()\r\n        if not model_file.exists():\r\n            logger.error(f\"Chinese model file not found: {model_file}\")\r\n            return False\r\n        \r\n        # Basic file size check (should be > 100MB)\r\n        if model_file.stat().st_size < 100 * 1024 * 1024:\r\n            logger.warning(f\"Model file size seems too small: {model_file.stat().st_size}\")\r\n        \r\n        return True\r\n\r\n\r\n# Global configuration instance for Chinese TTS\r\nchinese_config = ChineseTTSConfig()\r\n\r\n\r\n# Convenience functions\r\ndef get_chinese_config(key: str, default: Any = None) -> Any:\r\n    \"\"\"Get Chinese TTS configuration value\"\"\"\r\n    return chinese_config.get(key, default)\r\n\r\n\r\ndef get_chinese_voices() -> List[str]:\r\n    \"\"\"Get list of available Chinese voices\"\"\"\r\n    voices_dir = chinese_config.chinese_voices_dir\r\n    if not voices_dir.exists():\r\n        return []\r\n\r\n    available_voice_names = {voice_path.stem for voice_path in voices_dir.glob(\"*.pt\")}\r\n    return [voice_name for voice_name in CHINESE_VOICES if voice_name in available_voice_names]\r\n\r\n\r\ndef get_chinese_voice_info(voice_name: str) -> Optional[Dict[str, Any]]:\r\n    \"\"\"Get information about a specific Chinese voice\"\"\"\r\n    return CHINESE_VOICES.get(voice_name)\r\n\r\n\r\ndef is_chinese_text(text: str) -> bool:\r\n    \"\"\"Check if text is in Chinese\"\"\"\r\n    return ChineseTextProcessor.is_chinese(text)\r\n\r\n\r\ndef normalize_chinese(text: str) -> str:\r\n    \"\"\"Normalize Chinese text\"\"\"\r\n    return ChineseTextProcessor.normalize_chinese_text(text)\r\n\r\n\r\ndef split_chinese_text(text: str, max_length: int = 100) -> List[str]:\r\n    \"\"\"Split Chinese text into segments\"\"\"\r\n    return ChineseTextProcessor.split_chinese_text(text, 
max_length)\r\n\r\n"
  },
  {
    "path": "chinese_tts_demo.py",
    "content": "\"\"\"\r\nChinese TTS Demo - Interactive CLI for Kokoro Chinese TTS Model\r\n================================================================\r\n\r\nThis script provides an interactive command-line interface for the Kokoro-v1.1-zh\r\nChinese TTS model. It handles Chinese-specific text processing and voice selection.\r\n\r\nUsage:\r\n    python chinese_tts_demo.py\r\n\r\nRequirements:\r\n    - kokoro-v1_1-zh.pth model file\r\n    - Chinese voice files in voices/ directory\r\n    - All dependencies from requirements.txt\r\n\"\"\"\r\n\r\nimport torch\r\nimport os\r\nimport sys\r\nimport time\r\nimport logging\r\nfrom pathlib import Path\r\nfrom typing import Optional, List, Tuple, Union\r\nimport soundfile as sf\r\nimport numpy as np\r\n\r\n# Set up logging\r\nlogging.basicConfig(\r\n    level=logging.INFO,\r\n    format='%(asctime)s - %(levelname)s - %(message)s'\r\n)\r\nlogger = logging.getLogger(__name__)\r\n\r\n# Import from local modules\r\nfrom models import build_model, generate_speech, EnhancedKPipeline\r\nfrom chinese_config import (\r\n    ChineseTextProcessor,\r\n    ChineseTTSConfig,\r\n    CHINESE_VOICES,\r\n    get_chinese_voices,\r\n    get_chinese_voice_info\r\n)\r\nfrom config import TTSConfig\r\n\r\n# Constants\r\nDEFAULT_CHINESE_MODEL = \"kokoro-v1_1-zh.pth\"\r\nDEFAULT_CHINESE_OUTPUT = \"output_chinese.wav\"\r\nSAMPLE_RATE = 24000\r\nMIN_SPEED = 0.5\r\nMAX_SPEED = 2.0\r\nDEFAULT_SPEED = 1.0\r\n\r\n# Sample Chinese texts for testing\r\nSAMPLE_CHINESE_TEXTS = {\r\n    \"1\": {\r\n        \"title\": \"北风与太阳 (The North Wind and the Sun)\",\r\n        \"text\": \"当一位旅行者裹着温暖的斗篷走来时，北风和太阳正在争论谁更强大。他们商定，谁能先让旅行者脱下斗篷，谁就算更强大。于是，北风使出全力猛吹，但它吹得越猛烈，旅行者就把斗篷裹得越紧。最后，北风只好放弃了。接着，太阳暖暖地照耀着，旅行者立刻脱下了斗篷。因此，北风不得不承认，太阳才是两者中更强大的一方。\"\r\n    },\r\n    \"2\": {\r\n        \"title\": \"简短测试 (Short Test)\",\r\n        \"text\": \"你好，这是一个中文文本转语音测试。\"\r\n    },\r\n    \"3\": {\r\n        \"title\": \"自定义输入 (Custom Input)\",\r\n        \"text\": None  # Will be 
entered by user\r\n    }\r\n}\r\n\r\n\r\ndef print_chinese_header():\r\n    \"\"\"Print application header\"\"\"\r\n    print(\"\\n\" + \"=\"*60)\r\n    print(\"  Kokoro-82M-v1.1 Chinese TTS Demo\")\r\n    print(\"  科克罗中文文本转语音演示\")\r\n    print(\"=\"*60 + \"\\n\")\r\n\r\n\r\ndef print_menu():\r\n    \"\"\"Print the main menu options in Chinese\"\"\"\r\n    print(\"\\n\" + \"-\"*40)\r\n    print(\"  主菜单 (Main Menu)\")\r\n    print(\"-\"*40)\r\n    print(\"1. 列出可用声音 (List available voices)\")\r\n    print(\"2. 生成语音 (Generate speech)\")\r\n    print(\"3. 从样本文本生成 (Generate from sample text)\")\r\n    print(\"4. 帮助 (Help)\")\r\n    print(\"5. 退出 (Exit)\")\r\n    print(\"-\"*40)\r\n    return input(\"请选择一个选项 (Select an option) (1-5): \").strip()\r\n\r\n\r\ndef print_help():\r\n    \"\"\"Print help information\"\"\"\r\n    print(\"\\n\" + \"=\"*60)\r\n    print(\"帮助信息 (Help Information)\")\r\n    print(\"=\"*60)\r\n    print(\"\"\"\r\n关于本程序 (About this program):\r\n  这是一个中文TTS演示程序，使用Kokoro-82M-v1.1中文模型。\r\n  This is a Chinese TTS demo using the Kokoro-82M-v1.1 Chinese model.\r\n\r\n功能 (Features):\r\n  - 支持8个中文女性和男性声音 (Supports 8 Chinese female and male voices)\r\n  - 可调节语速 (Adjustable speech speed)\r\n  - 支持自定义和预设文本 (Supports custom and preset texts)\r\n  - 自动文本处理和分割 (Automatic text processing and segmentation)\r\n\r\n声音列表 (Voice List):\r\n  女性声音 (Female voices):\r\n    zf_xiaobei  - 晓蓓 (Young, energetic)\r\n    zf_xiaoni   - 晓妮 (Clear, friendly)\r\n    zf_xiaoxiao - 晓晓 (Soft, gentle)\r\n    zf_xiaoyi   - 晓艺 (Professional, articulate)\r\n  \r\n  男性声音 (Male voices):\r\n    zm_yunjian  - 云健 (Strong, confident)\r\n    zm_yunxi    - 云析 (Warm, professional)\r\n    zm_yunxia   - 云夏 (Calm, steady)\r\n    zm_yunyang  - 云阳 (Resonant, deep)\r\n\r\n常见问题 (FAQ):\r\n  Q: 提示\"字数不匹配\" (Word count mismatch warning)?\r\n  A: 这通常是因为英文音素化器被用于中文文本。\r\n     请确保使用正确的中文模型和配置。\r\n     \r\n  Q: 生成的音频质量不好?\r\n  A: 尝试调整语速，使用不同的声音。\r\n     确保模型和声音文件完整。\r\n\"\"\")\r\n    print(\"=\"*60 + 
\"\\n\")\r\n\r\n\r\ndef list_chinese_voices():\r\n    \"\"\"List all available Chinese voices with details\"\"\"\r\n    print(\"\\n\" + \"-\"*60)\r\n    print(\"可用声音 (Available Chinese Voices)\")\r\n    print(\"-\"*60)\r\n    \r\n    voices = get_chinese_voices()\r\n    \r\n    # Organize by gender\r\n    female_voices = [v for v in voices if v.startswith('zf_')]\r\n    male_voices = [v for v in voices if v.startswith('zm_')]\r\n    \r\n    print(\"\\n女性声音 (Female Voices):\")\r\n    for i, voice in enumerate(female_voices, 1):\r\n        info = get_chinese_voice_info(voice)\r\n        print(f\"  {i}. {voice} - {info['name']} ({info['description']})\")\r\n    \r\n    print(\"\\n男性声音 (Male Voices):\")\r\n    for i, voice in enumerate(male_voices, 1):\r\n        info = get_chinese_voice_info(voice)\r\n        print(f\"  {i+len(female_voices)}. {voice} - {info['name']} ({info['description']})\")\r\n    \r\n    print(\"-\"*60 + \"\\n\")\r\n\r\n\r\ndef select_voice(voices: List[str]) -> str:\r\n    \"\"\"Interactive voice selection\r\n    \r\n    Pressing Enter selects 'zf_xiaobei' if its voice file is available,\r\n    otherwise the first available voice. Assumes voices is non-empty.\r\n    \"\"\"\r\n    default_voice = \"zf_xiaobei\" if \"zf_xiaobei\" in voices else voices[0]\r\n    print(\"\\n可用声音 (Available voices):\")\r\n    for i, voice in enumerate(voices, 1):\r\n        info = get_chinese_voice_info(voice)\r\n        print(f\"{i}. {voice} - {info['name']} ({info['description']})\")\r\n\r\n    while True:\r\n        try:\r\n            choice = input(f\"\\n请选择一个声音编号 (Select a voice number) (or press Enter for '{default_voice}'): \").strip()\r\n            if not choice:\r\n                return default_voice\r\n            choice = int(choice)\r\n            if 1 <= choice <= len(voices):\r\n                return voices[choice - 1]\r\n            print(f\"无效选择。请输入1到{len(voices)}之间的数字。(Invalid choice. 
Please try again.)\")\r\n        except ValueError:\r\n            print(\"请输入有效的数字。(Please enter a valid number.)\")\r\n\r\n\r\ndef get_chinese_text_input() -> str:\r\n    \"\"\"Get Chinese text input from user\"\"\"\r\n    print(\"\\n请输入要转换为语音的中文文本\")\r\n    print(\"(Enter the Chinese text you want to convert to speech)\")\r\n    print(\"(or press Enter to exit)\")\r\n    text = input(\"> \").strip()\r\n    return text\r\n\r\n\r\ndef get_speech_speed() -> float:\r\n    \"\"\"Get speech speed from user\"\"\"\r\n    while True:\r\n        try:\r\n            speed = input(f\"\\n请输入语速 (Enter speech speed) ({MIN_SPEED}-{MAX_SPEED}, default {DEFAULT_SPEED}): \").strip()\r\n            if not speed:\r\n                return DEFAULT_SPEED\r\n            speed = float(speed)\r\n            if MIN_SPEED <= speed <= MAX_SPEED:\r\n                return speed\r\n            print(f\"语速必须在 {MIN_SPEED} 和 {MAX_SPEED} 之间。(Speed must be between {MIN_SPEED} and {MAX_SPEED})\")\r\n        except ValueError:\r\n            print(\"请输入有效的数字。(Please enter a valid number.)\")\r\n\r\n\r\ndef select_sample_text() -> Optional[str]:\r\n    \"\"\"Select from predefined sample texts\"\"\"\r\n    print(\"\\n选择样本文本 (Select sample text):\")\r\n    for key, sample in SAMPLE_CHINESE_TEXTS.items():\r\n        print(f\"{key}. 
{sample['title']}\")\r\n        if sample[\"text\"]:\r\n            print(f\"   {sample['text'][:50]}...\")\r\n    \r\n    choice = input(\"\\n请选择 (Select): \").strip()\r\n    \r\n    if choice in SAMPLE_CHINESE_TEXTS:\r\n        if SAMPLE_CHINESE_TEXTS[choice][\"text\"]:\r\n            return SAMPLE_CHINESE_TEXTS[choice][\"text\"]\r\n        else:\r\n            # Custom input option\r\n            return get_chinese_text_input()\r\n    \r\n    return None\r\n\r\n\r\ndef load_chinese_model(model_path: str, device: str) -> EnhancedKPipeline:\r\n    \"\"\"Load the Chinese TTS model\r\n    \r\n    Args:\r\n        model_path: Path to the Chinese model file\r\n        device: Device to use ('cuda' or 'cpu')\r\n        \r\n    Returns:\r\n        EnhancedKPipeline instance configured for Chinese\r\n    \"\"\"\r\n    try:\r\n        # Check if model file exists\r\n        model_file = Path(model_path).resolve()\r\n        if not model_file.exists():\r\n            print(f\"错误: 找不到模型文件 (Error: Model file not found): {model_file}\")\r\n            print(f\"请确保您已下载 {DEFAULT_CHINESE_MODEL} (Please make sure you have downloaded {DEFAULT_CHINESE_MODEL})\")\r\n            raise FileNotFoundError(f\"Chinese model not found: {model_file}\")\r\n        \r\n        logger.info(f\"加载中文模型 (Loading Chinese model): {model_path}\")\r\n        \r\n        # Build the pipeline with language code 'z' (Mandarin Chinese)\r\n        pipeline = build_model(model_path, device, repo_version=\"main\", lang_code='z')\r\n        \r\n        logger.info(\"中文模型加载成功 (Chinese model loaded successfully)\")\r\n        return pipeline\r\n        \r\n    except Exception as e:\r\n        logger.error(f\"加载中文模型时出错 (Error loading Chinese model): {e}\")\r\n        raise\r\n\r\n\r\ndef generate_chinese_speech(\r\n    model: EnhancedKPipeline,\r\n    text: str,\r\n    
voice: str,\r\n    device: str = 'cpu',\r\n    speed: float = 1.0\r\n) -> Tuple[Optional[np.ndarray], Optional[str]]:\r\n    \"\"\"Generate speech for Chinese text\r\n    \r\n    Args:\r\n        model: EnhancedKPipeline instance\r\n        text: Chinese text to synthesize\r\n        voice: Voice name (e.g., 'zf_xiaobei')\r\n        device: Device to use\r\n        speed: Speech speed multiplier\r\n        \r\n    Returns:\r\n        Tuple of (audio_data, phonemes) or (None, None) on error\r\n    \"\"\"\r\n    try:\r\n        # Check if text contains Chinese characters\r\n        if not ChineseTextProcessor.is_chinese(text):\r\n            print(\"警告: 文本可能不是中文 (Warning: Text may not be Chinese)\")\r\n        \r\n        # Normalize Chinese text\r\n        text = ChineseTextProcessor.normalize_chinese_text(text)\r\n        logger.info(f\"已规范化文本 (Normalized text): {text[:50]}...\")\r\n        \r\n        # Generate speech\r\n        logger.info(f\"生成语音... (Generating speech...)\")\r\n        print(f\"  文本: {text[:100]}{'...' 
if len(text) > 100 else ''}\")\r\n        print(f\"  声音: {voice}\")\r\n        print(f\"  语速: {speed}x\")\r\n        \r\n        # Load voice file\r\n        voice_path = Path(\"voices\").resolve() / f\"{voice}.pt\"\r\n        if not voice_path.exists():\r\n            print(f\"错误: 找不到声音文件 (Error: Voice file not found): {voice_path}\")\r\n            return None, None\r\n        \r\n        # Generate using the model\r\n        audio_segments = []\r\n        all_phonemes = []\r\n        \r\n        try:\r\n            generator = model(\r\n                text,\r\n                voice=str(voice_path),\r\n                speed=speed,\r\n                split_pattern=r'\\n+'\r\n            )\r\n            \r\n            for gs, ps, audio in generator:\r\n                if audio is not None:\r\n                    # Convert to numpy if needed\r\n                    if isinstance(audio, torch.Tensor):\r\n                        audio = audio.detach().cpu().numpy()\r\n                    audio_segments.append(audio)\r\n                    all_phonemes.append(ps)\r\n                    logger.info(f\"生成了句段: {gs} (Generated segment: {gs})\")\r\n            \r\n            # Concatenate all audio segments\r\n            if audio_segments:\r\n                if len(audio_segments) == 1:\r\n                    final_audio = audio_segments[0]\r\n                else:\r\n                    final_audio = np.concatenate(audio_segments, axis=0)\r\n                \r\n                all_phonemes_str = \" \".join(all_phonemes) if all_phonemes else \"\"\r\n                return final_audio, all_phonemes_str\r\n            else:\r\n                print(\"错误: 没有生成音频 (Error: No audio was generated)\")\r\n                return None, None\r\n                \r\n        except Exception as e:\r\n            logger.error(f\"生成过程中出错 (Error during generation): {e}\")\r\n            import traceback\r\n            traceback.print_exc()\r\n            return None, None\r\n            
\r\n    except Exception as e:\r\n        logger.error(f\"生成语音时出错 (Error generating speech): {e}\")\r\n        import traceback\r\n        traceback.print_exc()\r\n        return None, None\r\n\r\n\r\ndef save_audio(audio_data: np.ndarray, output_path: str = DEFAULT_CHINESE_OUTPUT) -> bool:\r\n    \"\"\"Save generated audio to file\r\n    \r\n    Args:\r\n        audio_data: Audio data as numpy array\r\n        output_path: Path to save the audio file\r\n        \r\n    Returns:\r\n        True if successful, False otherwise\r\n    \"\"\"\r\n    try:\r\n        output_path = Path(output_path).resolve()\r\n        output_path.parent.mkdir(parents=True, exist_ok=True)\r\n        \r\n        # Remove existing file if it exists\r\n        if output_path.exists():\r\n            output_path.unlink()\r\n        \r\n        logger.info(f\"保存音频到 (Saving audio to): {output_path}\")\r\n        sf.write(str(output_path), audio_data, SAMPLE_RATE)\r\n        print(f\"✓ 音频已保存 (Audio saved to): {output_path}\")\r\n        return True\r\n        \r\n    except Exception as e:\r\n        logger.error(f\"保存音频时出错 (Error saving audio): {e}\")\r\n        print(f\"✗ 无法保存音频 (Failed to save audio): {e}\")\r\n        return False\r\n\r\n\r\ndef main():\r\n    \"\"\"Main application loop\"\"\"\r\n    print_chinese_header()\r\n    \r\n    try:\r\n        # Set up device\r\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\r\n        print(f\"使用设备 (Using device): {device}\\n\")\r\n        \r\n        # Load model\r\n        print(\"初始化模型 (Initializing model)...\")\r\n        model = load_chinese_model(DEFAULT_CHINESE_MODEL, device)\r\n        print(\"✓ 模型已加载 (Model loaded)\\n\")\r\n        \r\n        # Get available voices\r\n        voices = get_chinese_voices()\r\n        if not voices:\r\n            print(\"错误: 找不到中文声音文件 (Error: No Chinese voices found)\")\r\n            print(f\"请确保中文声音文件在 voices/ 目录中\")\r\n            return\r\n        \r\n        # Main loop\r\n        
while True:\r\n            choice = print_menu()\r\n            \r\n            if choice == \"1\":\r\n                # List voices\r\n                list_chinese_voices()\r\n                \r\n            elif choice == \"2\":\r\n                # Generate speech from user input\r\n                voice = select_voice(voices)\r\n                text = get_chinese_text_input()\r\n                \r\n                if not text:\r\n                    print(\"已取消 (Cancelled)\")\r\n                    continue\r\n                \r\n                speed = get_speech_speed()\r\n                \r\n                print(\"\\n生成中... (Generating...)\")\r\n                audio, phonemes = generate_chinese_speech(model, text, voice, device, speed)\r\n                \r\n                if audio is not None:\r\n                    if save_audio(audio):\r\n                        print(\"✓ 完成 (Done)\")\r\n                    else:\r\n                        print(\"✗ 保存失败 (Save failed)\")\r\n                else:\r\n                    print(\"✗ 生成失败 (Generation failed)\")\r\n                    \r\n            elif choice == \"3\":\r\n                # Generate from sample text\r\n                text = select_sample_text()\r\n                if text:\r\n                    voice = select_voice(voices)\r\n                    speed = get_speech_speed()\r\n                    \r\n                    print(\"\\n生成中... 
(Generating...)\")\r\n                    audio, phonemes = generate_chinese_speech(model, text, voice, device, speed)\r\n                    \r\n                    if audio is not None:\r\n                        if save_audio(audio):\r\n                            print(\"✓ 完成 (Done)\")\r\n                        else:\r\n                            print(\"✗ 保存失败 (Save failed)\")\r\n                    else:\r\n                        print(\"✗ 生成失败 (Generation failed)\")\r\n                        \r\n            elif choice == \"4\":\r\n                # Help\r\n                print_help()\r\n                \r\n            elif choice == \"5\":\r\n                # Exit\r\n                print(\"\\n再见！(Goodbye!)\")\r\n                break\r\n                \r\n            else:\r\n                print(\"无效选择。请重试。(Invalid choice. Please try again.)\")\r\n                \r\n    except KeyboardInterrupt:\r\n        print(\"\\n\\n用户中断 (User interrupted)\")\r\n    except Exception as e:\r\n        logger.error(f\"应用程序错误 (Application error): {e}\")\r\n        import traceback\r\n        traceback.print_exc()\r\n    finally:\r\n        print(\"\\n程序结束 (Program ended)\")\r\n        if torch.cuda.is_available():\r\n            torch.cuda.empty_cache()\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    main()\r\n\r\n"
  },
  {
    "path": "config.py",
    "content": "\"\"\"\nCentralized Configuration System for Kokoro TTS Local\n----------------------------------------------------\nThis module provides centralized configuration management for all components\nof the Kokoro TTS Local application.\n\"\"\"\n\nimport os\nimport json\nfrom pathlib import Path\nfrom typing import Dict, Any, Optional\nimport logging\n\nlogger = logging.getLogger(__name__)\n\nclass TTSConfig:\n    \"\"\"Centralized configuration manager for TTS application\"\"\"\n    \n    def __init__(self, config_file: Optional[str] = None):\n        self.config_file = Path(config_file or \"tts_config.json\").resolve()\n        self._config = self._load_default_config()\n        self._load_config_file()\n    \n    def _load_default_config(self) -> Dict[str, Any]:\n        \"\"\"Load default configuration values\"\"\"\n        return {\n            # Audio settings\n            \"audio\": {\n                \"sample_rate\": 24000,\n                \"max_text_length_cli\": 10000,\n                \"max_text_length_web\": 5000,\n                \"min_speed\": 0.1,\n                \"max_speed\": 3.0,\n                \"default_speed\": 1.0,\n                \"supported_formats\": [\"wav\", \"mp3\", \"aac\"]\n            },\n            \n            # Model settings\n            \"model\": {\n                \"default_model_path\": \"kokoro-v1_0.pth\",\n                \"repo_id\": \"hexgrad/Kokoro-82M\",\n                \"repo_version\": \"main\",\n                \"default_language\": \"a\",\n                \"max_generation_time\": 300,\n                \"min_generation_time\": 60,\n                \"max_retries\": 3,\n                \"retry_delay\": 2\n            },\n            \n            # Path settings\n            \"paths\": {\n                \"voices_dir\": \"voices\",\n                \"outputs_dir\": \"outputs\",\n                \"cache_dir\": \".cache\",\n                \"config_file\": \"config.json\",\n                
\"speed_dial_file\": \"speed_dial.json\"\n            },\n            \n            # Web interface settings\n            \"web\": {\n                \"server_name\": \"0.0.0.0\",\n                \"server_port\": 7860,\n                \"share\": False\n            },\n            \n            # CLI settings\n            \"cli\": {\n                \"default_output_file\": \"output.wav\"\n            },\n            \n            # Language codes mapping\n            \"language_codes\": {\n                'a': 'American English',\n                'b': 'British English',\n                'j': 'Japanese',\n                'z': 'Mandarin Chinese',\n                'e': 'Spanish',\n                'f': 'French',\n                'h': 'Hindi',\n                'i': 'Italian',\n                'p': 'Brazilian Portuguese'\n            },\n            \n            # Voice files list\n            \"voice_files\": [\n                # American English Female voices (11 voices)\n                \"af_heart.pt\", \"af_alloy.pt\", \"af_aoede.pt\", \"af_bella.pt\", \"af_jessica.pt\",\n                \"af_kore.pt\", \"af_nicole.pt\", \"af_nova.pt\", \"af_river.pt\", \"af_sarah.pt\", \"af_sky.pt\",\n                \n                # American English Male voices (9 voices)\n                \"am_adam.pt\", \"am_echo.pt\", \"am_eric.pt\", \"am_fenrir.pt\", \"am_liam.pt\",\n                \"am_michael.pt\", \"am_onyx.pt\", \"am_puck.pt\", \"am_santa.pt\",\n                \n                # British English Female voices (4 voices)\n                \"bf_alice.pt\", \"bf_emma.pt\", \"bf_isabella.pt\", \"bf_lily.pt\",\n                \n                # British English Male voices (4 voices)\n                \"bm_daniel.pt\", \"bm_fable.pt\", \"bm_george.pt\", \"bm_lewis.pt\",\n                \n                # Japanese voices (5 voices)\n                \"jf_alpha.pt\", \"jf_gongitsune.pt\", \"jf_nezumi.pt\", \"jf_tebukuro.pt\", \"jm_kumo.pt\",\n                \n              
  # Mandarin Chinese voices (8 voices)\n                \"zf_xiaobei.pt\", \"zf_xiaoni.pt\", \"zf_xiaoxiao.pt\", \"zf_xiaoyi.pt\",\n                \"zm_yunjian.pt\", \"zm_yunxi.pt\", \"zm_yunxia.pt\", \"zm_yunyang.pt\",\n                \n                # Spanish voices (3 voices)\n                \"ef_dora.pt\", \"em_alex.pt\", \"em_santa.pt\",\n                \n                # French voices (1 voice)\n                \"ff_siwis.pt\",\n                \n                # Hindi voices (4 voices)\n                \"hf_alpha.pt\", \"hf_beta.pt\", \"hm_omega.pt\", \"hm_psi.pt\",\n                \n                # Italian voices (2 voices)\n                \"if_sara.pt\", \"im_nicola.pt\",\n                \n                # Brazilian Portuguese voices (3 voices)\n                \"pf_dora.pt\", \"pm_alex.pt\", \"pm_santa.pt\"\n            ]\n        }\n    \n    def _load_config_file(self):\n        \"\"\"Load configuration from file if it exists\"\"\"\n        if self.config_file.exists():\n            try:\n                with open(self.config_file, 'r', encoding='utf-8') as f:\n                    file_config = json.load(f)\n                self._merge_config(file_config)\n                logger.info(f\"Loaded configuration from {self.config_file}\")\n            except (json.JSONDecodeError, IOError) as e:\n                logger.warning(f\"Failed to load config file {self.config_file}: {e}\")\n    \n    def _merge_config(self, file_config: Dict[str, Any]):\n        \"\"\"Merge file configuration with default configuration\"\"\"\n        def merge_dict(default: Dict, override: Dict):\n            for key, value in override.items():\n                if key in default and isinstance(default[key], dict) and isinstance(value, dict):\n                    merge_dict(default[key], value)\n                else:\n                    default[key] = value\n        \n        merge_dict(self._config, file_config)\n    \n    def get(self, key: str, default: Any = None) 
-> Any:\n        \"\"\"Get configuration value using dot notation (e.g., 'audio.sample_rate')\"\"\"\n        keys = key.split('.')\n        value = self._config\n        \n        for k in keys:\n            if isinstance(value, dict) and k in value:\n                value = value[k]\n            else:\n                return default\n        \n        return value\n    \n    def set(self, key: str, value: Any):\n        \"\"\"Set configuration value using dot notation\"\"\"\n        keys = key.split('.')\n        config = self._config\n        \n        for k in keys[:-1]:\n            if k not in config:\n                config[k] = {}\n            config = config[k]\n        \n        config[keys[-1]] = value\n    \n    def save(self):\n        \"\"\"Save current configuration to file\"\"\"\n        try:\n            self.config_file.parent.mkdir(parents=True, exist_ok=True)\n            with open(self.config_file, 'w', encoding='utf-8') as f:\n                json.dump(self._config, f, indent=2, ensure_ascii=False)\n            logger.info(f\"Configuration saved to {self.config_file}\")\n        except IOError as e:\n            logger.error(f\"Failed to save configuration: {e}\")\n    \n    def get_path(self, path_key: str) -> Path:\n        \"\"\"Get a path from configuration and return as resolved Path object\"\"\"\n        path_str = self.get(f\"paths.{path_key}\")\n        if path_str:\n            return Path(path_str).resolve()\n        raise ValueError(f\"Path key '{path_key}' not found in configuration\")\n    \n    def validate_sample_rate(self, rate: int) -> int:\n        \"\"\"Validate and normalize sample rate to acceptable values\n        \n        Returns the rate if valid, otherwise returns the default sample rate.\n        \"\"\"\n        valid_rates = [16000, 22050, 24000, 44100, 48000]\n        if rate not in valid_rates:\n            default_rate = self.get(\"audio.sample_rate\", 24000)\n            logger.warning(\n                
f\"Invalid sample rate {rate}. Valid rates are {valid_rates}. \"\n                f\"Using default rate: {default_rate}\"\n            )\n            return default_rate\n        return rate\n    \n    def validate_language(self, lang: str) -> str:\n        \"\"\"Validate language code\"\"\"\n        valid_langs = list(self.get(\"language_codes\", {}).keys())\n        if lang not in valid_langs:\n            logger.warning(f\"Invalid language code '{lang}'. Using default.\")\n            logger.info(f\"Supported language codes: {', '.join(valid_langs)}\")\n            return self.get(\"model.default_language\", \"a\")\n        return lang\n    \n    def validate_speed(self, speed: float) -> float:\n        \"\"\"Validate speech speed is within acceptable range\"\"\"\n        min_speed = self.get(\"audio.min_speed\", 0.1)\n        max_speed = self.get(\"audio.max_speed\", 3.0)\n        \n        if speed < min_speed:\n            logger.warning(f\"Speed {speed} too low, using minimum {min_speed}\")\n            return min_speed\n        elif speed > max_speed:\n            logger.warning(f\"Speed {speed} too high, using maximum {max_speed}\")\n            return max_speed\n        \n        return speed\n\n# Global configuration instance\nconfig = TTSConfig()\n\n# Convenience functions for backward compatibility\ndef get_config(key: str, default: Any = None) -> Any:\n    \"\"\"Get configuration value\"\"\"\n    return config.get(key, default)\n\ndef set_config(key: str, value: Any):\n    \"\"\"Set configuration value\"\"\"\n    config.set(key, value)\n\ndef save_config():\n    \"\"\"Save configuration to file\"\"\"\n    config.save()\n\ndef get_path(path_key: str) -> Path:\n    \"\"\"Get a path from configuration\"\"\"\n    return config.get_path(path_key)"
  },
  {
    "path": "dependency_checker.py",
    "content": "\"\"\"\nDependency Version Checker for Kokoro TTS Local\n----------------------------------------------\nThis module checks if all required dependencies are installed and compatible.\n\"\"\"\n\nimport sys\nimport importlib\nimport subprocess\nfrom typing import Any, Dict, List, Tuple, Optional\nfrom packaging import version\nimport logging\n\nlogger = logging.getLogger(__name__)\n\n# Required dependencies with minimum versions\nREQUIRED_DEPENDENCIES = {\n    'torch': '1.9.0',\n    'kokoro': '0.9.2',\n    'gradio': '3.0.0',\n    'soundfile': '0.10.0',\n    'huggingface_hub': '0.10.0',\n    'pydub': '0.25.0',\n    'numpy': '1.19.0',\n    'pathlib': None,  # Built-in module\n    'tqdm': '4.60.0'\n}\n\n# Optional dependencies\nOPTIONAL_DEPENDENCIES = {\n    'espeakng_loader': '0.1.0',\n    'phonemizer': '3.0.0',\n    'misaki': '0.1.0',\n    'spacy': '3.0.0',\n    'num2words': '0.5.0'\n}\n\nclass DependencyChecker:\n    \"\"\"Check and validate dependencies\"\"\"\n    \n    def __init__(self):\n        self.missing_required = []\n        self.missing_optional = []\n        self.version_conflicts = []\n        self.warnings = []\n    \n    def check_python_version(self) -> bool:\n        \"\"\"Check if Python version is compatible\"\"\"\n        min_python = (3, 8)\n        current_python = sys.version_info[:2]\n        \n        if current_python < min_python:\n            logger.error(f\"Python {min_python[0]}.{min_python[1]}+ required, but {current_python[0]}.{current_python[1]} found\")\n            return False\n        \n        logger.info(f\"Python version {current_python[0]}.{current_python[1]} is compatible\")\n        return True\n    \n    def get_package_version(self, package_name: str) -> Optional[str]:\n        \"\"\"Get installed version of a package\"\"\"\n        try:\n            module = importlib.import_module(package_name)\n            # Try different version attributes\n            for attr in ['__version__', 'version', 
'VERSION']:\n                if hasattr(module, attr):\n                    return getattr(module, attr)\n            \n            # For some packages, try getting version via pip\n            try:\n                result = subprocess.run(\n                    [sys.executable, '-m', 'pip', 'show', package_name],\n                    capture_output=True, text=True, timeout=10\n                )\n                if result.returncode == 0:\n                    for line in result.stdout.split('\\n'):\n                        if line.startswith('Version:'):\n                            return line.split(':', 1)[1].strip()\n            except (subprocess.TimeoutExpired, subprocess.SubprocessError):\n                pass\n            \n            return \"unknown\"\n            \n        except ImportError:\n            return None\n    \n    def check_dependency(self, package_name: str, min_version: Optional[str]) -> Tuple[bool, str]:\n        \"\"\"Check if a dependency is installed and meets version requirements\"\"\"\n        installed_version = self.get_package_version(package_name)\n        \n        if installed_version is None:\n            return False, f\"{package_name} is not installed\"\n        \n        if min_version is None:\n            return True, f\"{package_name} is installed (version: {installed_version})\"\n        \n        try:\n            if installed_version == \"unknown\":\n                self.warnings.append(f\"Could not determine version of {package_name}\")\n                return True, f\"{package_name} is installed (version: unknown)\"\n            \n            if version.parse(installed_version) >= version.parse(min_version):\n                return True, f\"{package_name} {installed_version} meets requirement (>= {min_version})\"\n            else:\n                return False, f\"{package_name} {installed_version} is too old (>= {min_version} required)\"\n                \n        except Exception as e:\n            
self.warnings.append(f\"Error checking version of {package_name}: {e}\")\n            return True, f\"{package_name} is installed but version check failed\"\n    \n    def check_all_dependencies(self) -> bool:\n        \"\"\"Check all required and optional dependencies\"\"\"\n        logger.info(\"Checking dependencies...\")\n        \n        # Check Python version first\n        if not self.check_python_version():\n            return False\n        \n        all_good = True\n        \n        # Check required dependencies\n        logger.info(\"Checking required dependencies...\")\n        for package, min_ver in REQUIRED_DEPENDENCIES.items():\n            is_ok, message = self.check_dependency(package, min_ver)\n            \n            if is_ok:\n                logger.info(f\"✓ {message}\")\n            else:\n                logger.error(f\"✗ {message}\")\n                self.missing_required.append(package)\n                all_good = False\n        \n        # Check optional dependencies\n        logger.info(\"Checking optional dependencies...\")\n        for package, min_ver in OPTIONAL_DEPENDENCIES.items():\n            is_ok, message = self.check_dependency(package, min_ver)\n            \n            if is_ok:\n                logger.info(f\"✓ {message}\")\n            else:\n                logger.warning(f\"○ {message} (optional)\")\n                self.missing_optional.append(package)\n        \n        # Report warnings\n        for warning in self.warnings:\n            logger.warning(warning)\n        \n        return all_good\n    \n    def get_installation_commands(self) -> List[str]:\n        \"\"\"Get pip install commands for missing dependencies\"\"\"\n        commands = []\n        \n        if self.missing_required:\n            required_packages = []\n            for package in self.missing_required:\n                min_ver = REQUIRED_DEPENDENCIES.get(package)\n                if min_ver:\n                    
required_packages.append(f\"{package}>={min_ver}\")\n                else:\n                    required_packages.append(package)\n            \n            if required_packages:\n                commands.append(f\"pip install {' '.join(required_packages)}\")\n        \n        if self.missing_optional:\n            optional_packages = []\n            for package in self.missing_optional:\n                min_ver = OPTIONAL_DEPENDENCIES.get(package)\n                if min_ver:\n                    optional_packages.append(f\"{package}>={min_ver}\")\n                else:\n                    optional_packages.append(package)\n            \n            if optional_packages:\n                commands.append(f\"pip install {' '.join(optional_packages)}  # Optional\")\n        \n        return commands\n    \n    def check_cuda_availability(self) -> Dict[str, Any]:\n        \"\"\"Check CUDA availability and provide information\"\"\"\n        cuda_info = {\n            'available': False,\n            'version': None,\n            'device_count': 0,\n            'devices': []\n        }\n        \n        try:\n            import torch\n            cuda_info['available'] = torch.cuda.is_available()\n            \n            if cuda_info['available']:\n                cuda_info['version'] = torch.version.cuda\n                cuda_info['device_count'] = torch.cuda.device_count()\n                \n                for i in range(cuda_info['device_count']):\n                    device_props = torch.cuda.get_device_properties(i)\n                    cuda_info['devices'].append({\n                        'id': i,\n                        'name': device_props.name,\n                        'memory': device_props.total_memory // (1024**3)  # GB\n                    })\n                \n                logger.info(f\"CUDA {cuda_info['version']} available with {cuda_info['device_count']} device(s)\")\n                for device in cuda_info['devices']:\n                    
logger.info(f\"  Device {device['id']}: {device['name']} ({device['memory']}GB)\")\n            else:\n                logger.info(\"CUDA not available, will use CPU\")\n                \n        except Exception as e:\n            logger.warning(f\"Error checking CUDA availability: {e}\")\n        \n        return cuda_info\n\ndef check_dependencies() -> bool:\n    \"\"\"Main function to check all dependencies\"\"\"\n    checker = DependencyChecker()\n    \n    # Check dependencies\n    all_good = checker.check_all_dependencies()\n    \n    # Check CUDA\n    cuda_info = checker.check_cuda_availability()\n    \n    # Print summary\n    if not all_good:\n        logger.error(\"Some required dependencies are missing or incompatible!\")\n        logger.info(\"To install missing dependencies, run:\")\n        for cmd in checker.get_installation_commands():\n            logger.info(f\"  {cmd}\")\n        return False\n    \n    if checker.missing_optional:\n        logger.info(\"Some optional dependencies are missing. The application will work but some features may be disabled.\")\n        logger.info(\"To install optional dependencies, run:\")\n        for cmd in checker.get_installation_commands():\n            if \"Optional\" in cmd:\n                logger.info(f\"  {cmd}\")\n    \n    logger.info(\"All required dependencies are satisfied!\")\n    return True\n\nif __name__ == \"__main__\":\n    # Configure logging for standalone execution\n    logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')\n    \n    success = check_dependencies()\n    sys.exit(0 if success else 1)\n"
  },
  {
    "path": "docker-compose.yml",
    "content": "services:\n  kokoro-tts:\n    build:\n      context: .\n      dockerfile: Dockerfile\n    container_name: kokoro-tts-local\n    ports:\n      - \"7860:7860\"\n    environment:\n      # Uncomment the line below to run fully offline after initial model download\n      # - HF_HUB_OFFLINE=1\n    volumes:\n      - ./outputs:/app/outputs\n      - ./voices:/app/voices\n      - ./.cache:/app/.cache\n    restart: unless-stopped\n"
  },
  {
    "path": "gradio_interface.py",
    "content": "\"\"\"\r\nKokoro-TTS Local Generator\r\n-------------------------\r\nA Gradio interface for the Kokoro-TTS-Local text-to-speech system.\r\nSupports multiple voices and audio formats, with cross-platform compatibility.\r\n\r\nKey Features:\r\n- Multiple voice models support (54 voices across 8 languages)\r\n- Real-time generation with progress logging\r\n- WAV, MP3, and AAC output formats\r\n- Network sharing capabilities\r\n- Cross-platform compatibility (Windows, macOS, Linux)\r\n\r\nDependencies:\r\n- kokoro: Official Kokoro TTS library\r\n- gradio: Web interface framework\r\n- soundfile: Audio file handling\r\n- pydub: Audio format conversion\r\n\"\"\"\r\n\r\nimport gradio as gr\r\nimport os\r\nimport sys\r\nimport platform\r\nfrom datetime import datetime\r\nimport shutil\r\nfrom pathlib import Path\r\nimport soundfile as sf\r\nfrom pydub import AudioSegment\r\nimport torch\r\nimport numpy as np\r\nimport argparse\r\nfrom typing import Union, List, Optional, Tuple, Dict, Any\r\nfrom models import (\r\n    list_available_voices, build_model,\r\n    generate_speech, download_voice_files, EnhancedKPipeline\r\n)\r\nimport speed_dial\r\n\r\n# Constants\r\nMAX_TEXT_LENGTH = 5000\r\nDEFAULT_SAMPLE_RATE = 24000\r\nMIN_SPEED = 0.1\r\nMAX_SPEED = 3.0\r\nDEFAULT_SPEED = 1.0\r\n\r\n# Define path type for consistent handling\r\nPathLike = Union[str, Path]\r\n\r\n# Configuration validation\r\ndef validate_sample_rate(rate: int) -> int:\r\n    \"\"\"Validate sample rate is within acceptable range\"\"\"\r\n    valid_rates = [16000, 22050, 24000, 44100, 48000]\r\n    if rate not in valid_rates:\r\n        print(f\"Warning: Unusual sample rate {rate}. 
Valid rates are {valid_rates}\")\r\n        return 24000  # Default to safe value\r\n    return rate\r\n\r\n# Global configuration\r\nCONFIG_FILE = Path(\"tts_config.json\")  # Stores user preferences and paths\r\nDEFAULT_OUTPUT_DIR = Path(\"outputs\")    # Directory for generated audio files\r\nSAMPLE_RATE = validate_sample_rate(24000)  # Validated sample rate\r\n\r\n# Initialize model globally\r\ndevice = 'cuda' if torch.cuda.is_available() else 'cpu'\r\nmodel = None\r\n\r\nLANG_MAP = {\r\n    \"af_\": \"a\", \"am_\": \"a\",\r\n    \"bf_\": \"b\", \"bm_\": \"b\",\r\n    \"jf_\": \"j\", \"jm_\": \"j\",\r\n    \"zf_\": \"z\", \"zm_\": \"z\",\r\n    \"ef_\": \"e\", \"em_\": \"e\",\r\n    \"ff_\": \"f\",\r\n    \"hf_\": \"h\", \"hm_\": \"h\",\r\n    \"if_\": \"i\", \"im_\": \"i\",\r\n    \"pf_\": \"p\", \"pm_\": \"p\",\r\n}\r\npipelines = {}\r\n\r\ndef get_available_voices():\r\n    \"\"\"Get list of available voice models.\"\"\"\r\n    try:\r\n        # Initialize model to trigger voice downloads\r\n        global model\r\n        if model is None:\r\n            print(\"Initializing model and downloading voices...\")\r\n            model = build_model(None, device)\r\n\r\n        voices = list_available_voices()\r\n        if not voices:\r\n            print(\"No voices found after initialization. 
Attempting to download...\")\r\n            download_voice_files()  # Try downloading again\r\n            voices = list_available_voices()\r\n\r\n        print(\"Available voices:\", voices)\r\n        return voices\r\n    except Exception as e:\r\n        print(f\"Error getting voices: {e}\")\r\n        return []\r\n\r\ndef get_pipeline_for_voice(voice_name: str) -> EnhancedKPipeline:\n    \"\"\"\n    Determine the language code from the voice prefix and return the associated pipeline.\n    \"\"\"\n    prefix = voice_name[:3].lower()\n    lang_code = LANG_MAP.get(prefix, \"a\")\n    if lang_code not in pipelines:\n        print(f\"[INFO] Creating pipeline for lang_code='{lang_code}'\")\n        pipelines[lang_code] = build_model(None, device, lang_code=lang_code)\n    pipelines[lang_code].device = device\n    return pipelines[lang_code]\n\r\ndef convert_audio(input_path: PathLike, output_path: PathLike, format: str) -> Optional[PathLike]:\r\n    \"\"\"Convert audio to specified format.\r\n\r\n    Args:\r\n        input_path: Path to input audio file\r\n        output_path: Path to output audio file\r\n        format: Output format ('wav', 'mp3', or 'aac')\r\n\r\n    Returns:\r\n        Path to output file or None on error\r\n    \"\"\"\r\n    try:\r\n        # Normalize paths\r\n        input_path = Path(input_path).resolve()\r\n        output_path = Path(output_path).resolve()\r\n\r\n        # Validate input file\r\n        if not input_path.exists():\r\n            raise FileNotFoundError(f\"Input file not found: {input_path}\")\r\n\r\n        # For WAV format, just return the input path\r\n        if format.lower() == \"wav\":\r\n            return input_path\r\n\r\n        # Create output directory if it doesn't exist\r\n        output_path.parent.mkdir(parents=True, exist_ok=True)\r\n\r\n        # Convert format\r\n        audio = AudioSegment.from_wav(str(input_path))\r\n\r\n        # Select proper format and options\r\n        if format.lower() == 
\"mp3\":\r\n            audio.export(str(output_path), format=\"mp3\", bitrate=\"192k\")\r\n        elif format.lower() == \"aac\":\r\n            audio.export(str(output_path), format=\"aac\", bitrate=\"192k\")\r\n        else:\r\n            raise ValueError(f\"Unsupported format: {format}\")\r\n\r\n        # Verify file was created\r\n        if not output_path.exists() or output_path.stat().st_size == 0:\r\n            raise IOError(f\"Failed to create {format} file\")\r\n\r\n        return output_path\r\n\r\n    except (IOError, FileNotFoundError, ValueError) as e:\r\n        print(f\"Error converting audio: {type(e).__name__}: {e}\")\r\n        return None\r\n    except Exception as e:\r\n        print(f\"Unexpected error converting audio: {type(e).__name__}: {e}\")\r\n        import traceback\r\n        traceback.print_exc()\r\n        return None\r\n\r\ndef generate_tts_with_logs(voice_name: str, text: str, format: str, speed: float = 1.0) -> Optional[PathLike]:\r\n    \"\"\"Generate TTS audio with progress logging and memory management.\r\n\r\n    Args:\r\n        voice_name: Name of the voice to use\r\n        text: Text to convert to speech\r\n        format: Output format ('wav', 'mp3', 'aac')\r\n\r\n    Returns:\r\n        Path to generated audio file or None on error\r\n    \"\"\"\r\n    global model\r\n    import psutil\r\n    import gc\r\n\r\n    try:\r\n        # Check available memory before processing\r\n        memory = psutil.virtual_memory()\r\n        available_gb = memory.available / (1024**3)\r\n        \r\n        if available_gb < 1.0:  # Less than 1GB available\r\n            print(f\"Warning: Low memory available ({available_gb:.1f}GB). 
Consider closing other applications.\")\r\n            # Force garbage collection\r\n            gc.collect()\r\n            if torch.cuda.is_available():\r\n                torch.cuda.empty_cache()\r\n\r\n        # Initialize model if needed\r\n        if model is None:\r\n            print(\"Initializing model...\")\r\n            model = build_model(None, device)\r\n\r\n        # Create output directory\r\n        DEFAULT_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)\r\n\r\n        # Validate input text\r\n        if not text or not text.strip():\r\n            raise ValueError(\"Text input cannot be empty\")\r\n\r\n        # Dynamic text length limit based on available memory\r\n        MAX_CHARS = MAX_TEXT_LENGTH\r\n        if available_gb < 2.0:  # Less than 2GB available\r\n            MAX_CHARS = min(MAX_CHARS, 2000)  # Reduce limit for low memory\r\n            print(f\"Reduced text limit to {MAX_CHARS} characters due to low memory\")\r\n        \r\n        if len(text) > MAX_CHARS:\r\n            print(f\"Warning: Text exceeds {MAX_CHARS} characters. 
Truncating to prevent memory issues.\")\r\n            text = text[:MAX_CHARS] + \"...\"\r\n\r\n        # Generate base filename from the current timestamp\r\n        timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\r\n        base_name = f\"tts_{timestamp}\"\r\n        wav_path = DEFAULT_OUTPUT_DIR / f\"{base_name}.wav\"\r\n\r\n        # Generate speech\r\n        print(f\"\\nGenerating speech for: '{text}'\")\r\n        print(f\"Using voice: {voice_name}\")\r\n\r\n        # Validate voice path using Path for consistent handling\r\n        voice_path = Path(\"voices\").resolve() / f\"{voice_name}.pt\"\r\n        if not voice_path.exists():\r\n            raise FileNotFoundError(f\"Voice file not found: {voice_path}\")\r\n\r\n        try:\r\n            if voice_name.startswith(tuple(LANG_MAP.keys())):\n                pipeline = get_pipeline_for_voice(voice_name)\n                generator = pipeline(text, voice=str(voice_path), speed=speed, split_pattern=r'\\n+')\n            else:\n                generator = model(text, voice=str(voice_path), speed=speed, split_pattern=r'\\n+')\n\r\n            all_audio = []\r\n            max_segments = 100  # Safety limit for very long texts\r\n            segment_count = 0\r\n\r\n            for gs, ps, audio in generator:\r\n                segment_count += 1\r\n                if segment_count > max_segments:\r\n                    print(f\"Warning: Reached maximum segment limit ({max_segments})\")\r\n                    break\r\n\r\n                if audio is not None:\r\n                    if isinstance(audio, np.ndarray):\r\n                        audio = torch.from_numpy(audio).float()\r\n                    all_audio.append(audio)\r\n                    print(f\"Generated segment: {gs}\")\r\n                    if ps:  # Only print phonemes if available\r\n                        print(f\"Phonemes: {ps}\")\r\n\r\n            if not all_audio:\r\n                raise Exception(\"No audio generated\")\r\n        except Exception as e:\r\n            raise Exception(f\"Error in speech generation: {e}\")\r\n\r\n        # Combine audio segments and save\r\n        if not all_audio:\r\n            raise Exception(\"No audio segments were generated\")\r\n\r\n        # Handle single segment case without concatenation\r\n        if len(all_audio) == 1:\r\n            final_audio = all_audio[0]\r\n        else:\r\n            try:\r\n                final_audio = torch.cat(all_audio, dim=0)\r\n            except RuntimeError as e:\r\n                raise Exception(f\"Failed to concatenate audio segments: {e}\")\r\n\r\n        # Save audio file\r\n        try:\n            if isinstance(final_audio, torch.Tensor):\n                final_audio = final_audio.detach().cpu().numpy()\n            sf.write(wav_path, final_audio, SAMPLE_RATE)\n        except Exception as e:\n            raise Exception(f\"Failed to save audio file: {e}\")\n\r\n        # Convert to requested format if needed\r\n        if format.lower() != \"wav\":\r\n            output_path = DEFAULT_OUTPUT_DIR / f\"{base_name}.{format.lower()}\"\r\n            return convert_audio(wav_path, output_path, format.lower())\r\n\r\n        return wav_path\r\n\r\n    except Exception as e:\r\n        print(f\"Error generating speech: {e}\")\r\n        import traceback\r\n        traceback.print_exc()\r\n        return None\r\n\r\ndef create_interface(server_name=\"127.0.0.1\", server_port=7860):\r\n    \"\"\"Create and launch the Gradio interface.\"\"\"\r\n\r\n    # Get available voices\r\n    voices = get_available_voices()\r\n    if not voices:\r\n        print(\"No voices found! 
Please check the voices directory.\")\r\n        return\r\n\r\n    # Get speed dial presets\r\n    preset_names = speed_dial.get_preset_names()\r\n\r\n    # Create interface\r\n    with gr.Blocks(title=\"Kokoro TTS Generator\", fill_height=True) as interface:\r\n        gr.Markdown(\"# Kokoro TTS Generator\")\r\n\r\n        with gr.Row():\r\n            with gr.Column(scale=2):\r\n                gr.Markdown(\"## TTS Controls\")\r\n            \r\n            with gr.Column(scale=1):\r\n                gr.Markdown(\"## Speed Dial\")\r\n                \r\n        with gr.Row(equal_height=True):\r\n            with gr.Column(scale=2):\r\n                # Main TTS controls\r\n                \r\n                voice = gr.Dropdown(\r\n                    choices=voices,\r\n                    value=voices[0] if voices else None,\r\n                    label=\"Voice\"\r\n                )\r\n                text = gr.Textbox(\r\n                    lines=3,\r\n                    placeholder=\"Enter text to convert to speech...\",\r\n                    label=\"Text\"\r\n                )\r\n\r\n            with gr.Column(scale=1):\r\n                # Speed dial section\r\n                preset_dropdown = gr.Dropdown(\r\n                    choices=preset_names,\r\n                    value=preset_names[0] if preset_names else None,\r\n                    label=\"Saved Presets\",\r\n                    interactive=True\r\n                )\r\n                preset_name = gr.Textbox(\r\n                    placeholder=\"Enter preset name...\",\r\n                    label=\"New Preset Name\"\r\n                )\r\n\r\n        with gr.Row(equal_height=True):\r\n            with gr.Column(scale=2):\r\n                with gr.Row():\r\n                    format = gr.Radio(\r\n                        choices=[\"wav\", \"mp3\", \"aac\"],\r\n                        value=\"wav\",\r\n                        label=\"Output Format\"\r\n                    )\r\n            
        speed = gr.Slider(\r\n                        minimum=0.5,\r\n                        maximum=2.0,\r\n                        value=1.0,\r\n                        step=0.1,\r\n                        label=\"Speed\"\r\n                    )\r\n\r\n            with gr.Column(scale=1):\r\n                load_preset = gr.Button(\"Load\")\r\n                save_preset = gr.Button(\"Save Current\")\r\n\r\n        with gr.Row():\r\n            with gr.Column(scale=2):\r\n                generate = gr.Button(\"Generate Speech\")\r\n\r\n            with gr.Column(scale=1):\r\n                delete_preset = gr.Button(\"Delete\")\r\n\r\n        with gr.Row():\r\n            # Output section\r\n            output = gr.Audio(label=\"Generated Audio\")\r\n\r\n        # Function to load a preset (leaves the controls unchanged if nothing is found)\r\n        def load_preset_fn(preset_name):\r\n            if not preset_name:\r\n                return gr.update(), gr.update(), gr.update(), gr.update()\r\n\r\n            preset = speed_dial.get_preset(preset_name)\r\n            if not preset:\r\n                return gr.update(), gr.update(), gr.update(), gr.update()\r\n\r\n            return preset[\"voice\"], preset[\"text\"], preset[\"format\"], preset[\"speed\"]\r\n\r\n        # Function to save a preset\r\n        def save_preset_fn(name, voice, text, format, speed):\r\n            if not name or not voice or not text:\r\n                # Show the message as a toast; writing it into the dropdown value would not match any preset\r\n                gr.Warning(\"Please provide a name, voice, and text\")\r\n                return gr.update()\r\n\r\n            success = speed_dial.save_preset(name, voice, text, format, speed)\r\n\r\n            # Update the dropdown with the new preset list\r\n            preset_names = speed_dial.get_preset_names()\r\n\r\n            if success:\r\n                return gr.update(choices=preset_names, value=name)\r\n            else:\r\n                return gr.update(choices=preset_names)\r\n\r\n        # Function to delete a preset\r\n        def delete_preset_fn(name):\r\n            if not name:\r\n                gr.Warning(\"Please select a preset to delete\")\r\n                return gr.update()\r\n\r\n            success = speed_dial.delete_preset(name)\r\n\r\n            # Update the dropdown with the new preset list\r\n            preset_names = speed_dial.get_preset_names()\r\n\r\n            if success:\r\n                return gr.update(choices=preset_names, value=None)\r\n            else:\r\n                return gr.update(choices=preset_names)\r\n\r\n        # Connect the buttons to their functions\r\n        load_preset.click(\r\n            fn=load_preset_fn,\r\n            inputs=preset_dropdown,\r\n            outputs=[voice, text, format, speed]\r\n        )\r\n\r\n        save_preset.click(\r\n            fn=save_preset_fn,\r\n            inputs=[preset_name, voice, text, format, speed],\r\n            outputs=preset_dropdown\r\n        )\r\n\r\n        delete_preset.click(\r\n            fn=delete_preset_fn,\r\n            inputs=preset_dropdown,\r\n            outputs=preset_dropdown\r\n        )\r\n\r\n        # Connect the generate button\r\n        generate.click(\r\n            fn=generate_tts_with_logs,\r\n            inputs=[voice, text, format, speed],\r\n            outputs=output\r\n        )\r\n\r\n    # Launch interface\r\n    interface.launch(\r\n        server_name=server_name,\r\n        server_port=server_port,\r\n        share=False\r\n    )\r\n\r\ndef cleanup_resources():\r\n    \"\"\"Properly clean up resources when the application exits\"\"\"\r\n    global model\r\n\r\n    try:\r\n        print(\"Cleaning up resources...\")\r\n\r\n        # Clean up model resources\r\n        if model is not None:\r\n            print(\"Releasing model resources...\")\r\n\r\n            # Clear voice dictionary to release memory\r\n            if hasattr(model, 'voices') and model.voices is not None:\r\n                try:\r\n                    voice_count = len(model.voices)\r\n                    for voice_name in list(model.voices.keys()):\r\n                        try:\r\n                            # Release each 
voice explicitly\r\n                            model.voices[voice_name] = None\r\n                        except:\r\n                            pass\r\n                    model.voices.clear()\r\n                    print(f\"Cleared {voice_count} voice references\")\r\n                except Exception as ve:\r\n                    print(f\"Error clearing voices: {type(ve).__name__}: {ve}\")\r\n\r\n            # Clear model attributes that might hold tensors\r\n            for attr_name in dir(model):\r\n                if not attr_name.startswith('__') and hasattr(model, attr_name):\r\n                    try:\r\n                        attr = getattr(model, attr_name)\r\n                        # Handle specific tensor attributes\r\n                        if isinstance(attr, torch.Tensor):\r\n                            if attr.is_cuda:\r\n                                print(f\"Releasing CUDA tensor: {attr_name}\")\r\n                                setattr(model, attr_name, None)\r\n                        elif hasattr(attr, 'to'):  # Module or Tensor-like object\r\n                            setattr(model, attr_name, None)\r\n                    except:\r\n                        pass\r\n\r\n            # Delete model reference\r\n            try:\r\n                del model\r\n                model = None\r\n                print(\"Model reference deleted\")\r\n            except Exception as me:\r\n                print(f\"Error deleting model: {type(me).__name__}: {me}\")\r\n\r\n        # Clear CUDA memory explicitly\r\n        if torch.cuda.is_available():\r\n            try:\r\n                # Get initial memory usage\r\n                try:\r\n                    initial = torch.cuda.memory_allocated()\r\n                    initial_mb = initial / (1024 * 1024)\r\n                    print(f\"CUDA memory before cleanup: {initial_mb:.2f} MB\")\r\n                except:\r\n                    pass\r\n\r\n                # Free memory\r\n            
    print(\"Clearing CUDA cache...\")\r\n                torch.cuda.empty_cache()\r\n\r\n                # Force synchronization\r\n                try:\r\n                    torch.cuda.synchronize()\r\n                except:\r\n                    pass\r\n\r\n                # Get final memory usage\r\n                try:\r\n                    final = torch.cuda.memory_allocated()\r\n                    final_mb = final / (1024 * 1024)\r\n                    freed_mb = (initial - final) / (1024 * 1024)\r\n                    print(f\"CUDA memory after cleanup: {final_mb:.2f} MB (freed {freed_mb:.2f} MB)\")\r\n                except:\r\n                    pass\r\n            except Exception as ce:\r\n                print(f\"Error clearing CUDA memory: {type(ce).__name__}: {ce}\")\r\n\r\n        # Final garbage collection\r\n        try:\r\n            import gc\r\n            collected = gc.collect()\r\n            print(f\"Garbage collection completed: {collected} objects collected\")\r\n        except Exception as gce:\r\n            print(f\"Error during garbage collection: {type(gce).__name__}: {gce}\")\r\n\r\n        print(\"Cleanup completed\")\r\n\r\n    except Exception as e:\r\n        print(f\"Error during cleanup: {type(e).__name__}: {e}\")\r\n        import traceback\r\n        traceback.print_exc()\r\n\r\n# Register cleanup for normal exit\r\nimport atexit\r\natexit.register(cleanup_resources)\r\n\r\n# Register cleanup for signals\r\nimport signal\r\nimport sys\r\n\r\ndef signal_handler(signum, frame):\r\n    print(f\"\\nReceived signal {signum}, shutting down...\")\r\n    cleanup_resources()\r\n    sys.exit(0)\r\n\r\n# Register for common signals\r\nfor sig in [signal.SIGINT, signal.SIGTERM]:\r\n    try:\r\n        signal.signal(sig, signal_handler)\r\n    except (ValueError, AttributeError):\r\n        # Some signals might not be available on all platforms\r\n        pass\r\n\r\ndef parse_arguments():\r\n    \"\"\"Parse command line arguments 
for host and port configuration.\"\"\"\r\n    parser = argparse.ArgumentParser(\r\n        description=\"Kokoro TTS Local Generator - Gradio Web Interface\",\r\n        formatter_class=argparse.ArgumentDefaultsHelpFormatter\r\n    )\r\n    parser.add_argument(\r\n        \"--host\",\r\n        type=str,\r\n        default=\"127.0.0.1\",\r\n        help=\"Host address to bind the server to\"\r\n    )\r\n    parser.add_argument(\r\n        \"--port\",\r\n        type=int,\r\n        default=7860,\r\n        help=\"Port number to run the server on\"\r\n    )\r\n    return parser.parse_args()\r\n\r\nif __name__ == \"__main__\":\r\n    try:\r\n        args = parse_arguments()\r\n        create_interface(server_name=args.host, server_port=args.port)\r\n    finally:\r\n        # Ensure cleanup even if Gradio encounters an error\r\n        cleanup_resources()\r\n"
  },
  {
    "path": "models.py",
    "content": "\"\"\"Models module for Kokoro TTS Local\"\"\"\r\nfrom typing import Optional, Tuple, List\r\nimport torch\r\nfrom kokoro import KPipeline\r\nimport os\r\nimport json\r\nimport codecs\r\nfrom pathlib import Path\r\nimport numpy as np\r\nimport shutil\r\nimport threading\r\nimport warnings\r\nimport logging\r\n\r\n# Configure logging\r\nlogging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')\r\nlogger = logging.getLogger(__name__)\r\n\r\n# Suppress warnings from pre-trained model\r\nwarnings.filterwarnings(\"ignore\", message=\"dropout option adds dropout after all but last recurrent layer\")\r\nwarnings.filterwarnings(\"ignore\", message=\"`torch.nn.utils.weight_norm` is deprecated\")\r\n\r\n# Set environment variables for proper encoding\r\nos.environ[\"PYTHONIOENCODING\"] = \"utf-8\"\r\n# Disable symlinks warning\r\nos.environ[\"HF_HUB_DISABLE_SYMLINKS_WARNING\"] = \"1\"\r\n\r\n# Check if offline mode is enabled via environment variable\r\nOFFLINE_MODE = os.environ.get(\"HF_HUB_OFFLINE\", \"0\") == \"1\" or os.environ.get(\"TRANSFORMERS_OFFLINE\", \"0\") == \"1\"\r\nif OFFLINE_MODE:\r\n    logger.info(\"Running in OFFLINE mode - will only use locally cached files\")\r\n    # Ensure the environment variable is set for the kokoro library as well\r\n    os.environ[\"HF_HUB_OFFLINE\"] = \"1\"\r\n    os.environ[\"TRANSFORMERS_OFFLINE\"] = \"1\"\r\n\r\n# Setup for safer cleanup\r\nimport atexit\r\nimport signal\r\nimport sys\r\n\r\n# Track whether patches have been applied\r\n_patches_applied = {\r\n    'json_load': False\r\n}\r\n\r\nclass EnhancedKPipeline(KPipeline):\r\n    \"\"\"Enhanced KPipeline with improved voice loading and error handling\"\"\"\r\n    \r\n    def __init__(self, lang_code: str = 'a', model: bool = True):\r\n        super().__init__(lang_code=lang_code, model=model)\r\n        self.device = 'cpu'  # Default device\r\n        if not hasattr(self, 'voices'):\r\n            self.voices = {}\r\n    
\r\n    def load_voice(self, voice_path: str) -> torch.Tensor:\r\n        \"\"\"Load voice model with improved error handling and path validation\"\"\"\r\n        voice_path = Path(voice_path).resolve()\r\n        \r\n        if not voice_path.exists():\r\n            raise FileNotFoundError(f\"Voice file not found: {voice_path}\")\r\n        \r\n        voice_name = voice_path.stem\r\n        \r\n        try:\r\n            logger.info(f\"Loading voice: {voice_name} from {voice_path}\")\r\n            voice_model = torch.load(str(voice_path), weights_only=True, map_location='cpu')\r\n            \r\n            if voice_model is None:\r\n                raise ValueError(f\"Failed to load voice model from {voice_path}\")\r\n            \r\n            # Move model to device and store in voices dictionary\r\n            self.voices[voice_name] = voice_model.to(self.device)\r\n            logger.info(f\"Successfully loaded voice: {voice_name}\")\r\n            return self.voices[voice_name]\r\n            \r\n        except Exception as e:\r\n            logger.error(f\"Error loading voice {voice_name}: {e}\")\r\n            raise\r\n\r\ndef _cleanup_patches() -> None:\r\n    \"\"\"Restore original functions that were patched\"\"\"\r\n    try:\r\n        if _patches_applied['json_load'] and _original_json_load is not None:\r\n            restore_json_load()\r\n            _patches_applied['json_load'] = False\r\n            logger.info(\"Restored original json.load function\")\r\n    except Exception as e:\r\n        logger.warning(f\"Error restoring json.load: {e}\")\r\n\r\n# Register cleanup for normal exit\r\natexit.register(_cleanup_patches)\r\n\r\ndef register_cleanup_signal_handlers() -> None:\r\n    \"\"\"Install process signal handlers for patch cleanup.\r\n\r\n    This is opt-in to avoid import-time global signal side effects when models.py\r\n    is imported by other applications.\r\n    \"\"\"\r\n    for sig in [signal.SIGINT, signal.SIGTERM]:\r\n        
try:\r\n            signal.signal(sig, lambda signum, frame: (\r\n                logger.info(f\"Received signal {signum}, cleaning up...\"),\r\n                _cleanup_patches(),\r\n                sys.exit(1)\r\n            ))\r\n        except (ValueError, AttributeError) as e:\r\n            # Some signals might not be available on all platforms\r\n            logger.warning(f\"Could not register signal handler: {e}\")\r\n\r\n# List of available voice files (54 voices across 8 languages)\r\nVOICE_FILES = [\r\n    # American English Female voices (11 voices)\r\n    \"af_heart.pt\", \"af_alloy.pt\", \"af_aoede.pt\", \"af_bella.pt\", \"af_jessica.pt\",\r\n    \"af_kore.pt\", \"af_nicole.pt\", \"af_nova.pt\", \"af_river.pt\", \"af_sarah.pt\", \"af_sky.pt\",\r\n\r\n    # American English Male voices (9 voices)\r\n    \"am_adam.pt\", \"am_echo.pt\", \"am_eric.pt\", \"am_fenrir.pt\", \"am_liam.pt\",\r\n    \"am_michael.pt\", \"am_onyx.pt\", \"am_puck.pt\", \"am_santa.pt\",\r\n\r\n    # British English Female voices (4 voices)\r\n    \"bf_alice.pt\", \"bf_emma.pt\", \"bf_isabella.pt\", \"bf_lily.pt\",\r\n\r\n    # British English Male voices (4 voices)\r\n    \"bm_daniel.pt\", \"bm_fable.pt\", \"bm_george.pt\", \"bm_lewis.pt\",\r\n\r\n    # Japanese voices (5 voices)\r\n    \"jf_alpha.pt\", \"jf_gongitsune.pt\", \"jf_nezumi.pt\", \"jf_tebukuro.pt\", \"jm_kumo.pt\",\r\n\r\n    # Mandarin Chinese voices (8 voices)\r\n    \"zf_xiaobei.pt\", \"zf_xiaoni.pt\", \"zf_xiaoxiao.pt\", \"zf_xiaoyi.pt\",\r\n    \"zm_yunjian.pt\", \"zm_yunxi.pt\", \"zm_yunxia.pt\", \"zm_yunyang.pt\",\r\n\r\n    # Spanish voices (3 voices)\r\n    \"ef_dora.pt\", \"em_alex.pt\", \"em_santa.pt\",\r\n\r\n    # French voices (1 voice)\r\n    \"ff_siwis.pt\",\r\n\r\n    # Hindi voices (4 voices)\r\n    \"hf_alpha.pt\", \"hf_beta.pt\", \"hm_omega.pt\", \"hm_psi.pt\",\r\n\r\n    # Italian voices (2 voices)\r\n    \"if_sara.pt\", \"im_nicola.pt\",\r\n\r\n    # Brazilian Portuguese voices (3 voices)\r\n    
\"pf_dora.pt\", \"pm_alex.pt\", \"pm_santa.pt\"\r\n]\r\n\r\n# Language code mapping for different languages\r\nLANGUAGE_CODES = {\r\n    'a': 'American English',\r\n    'b': 'British English',\r\n    'j': 'Japanese',\r\n    'z': 'Mandarin Chinese',\r\n    'e': 'Spanish',\r\n    'f': 'French',\r\n    'h': 'Hindi',\r\n    'i': 'Italian',\r\n    'p': 'Brazilian Portuguese'\r\n}\r\n\r\nVOICE_PREFIX_TO_LANGUAGE_CODE = {\r\n    'af': 'a', 'am': 'a',\r\n    'bf': 'b', 'bm': 'b',\r\n    'jf': 'j', 'jm': 'j',\r\n    'zf': 'z', 'zm': 'z',\r\n    'ef': 'e', 'em': 'e',\r\n    'ff': 'f', 'fm': 'f',\r\n    'hf': 'h', 'hm': 'h',\r\n    'if': 'i', 'im': 'i',\r\n    'pf': 'p', 'pm': 'p',\r\n}\r\n\r\n\r\ndef patch_json_load() -> None:\r\n    \"\"\"Patch json.load to handle UTF-8 encoded files with special characters\"\"\"\r\n    global _patches_applied, _original_json_load\r\n    if _patches_applied['json_load']:\r\n        return\r\n\r\n    _original_json_load = json.load  # Store for restoration\r\n\r\n    def read_json_content(fp, encoding: str) -> str:\r\n        if hasattr(fp, 'seek'):\r\n            fp.seek(0)\r\n\r\n        if hasattr(fp, 'buffer'):\r\n            raw_content = fp.buffer.read()\r\n            return raw_content.decode(\r\n                encoding,\r\n                errors='replace' if encoding == 'utf-8-sig' else 'strict'\r\n            ).lstrip('\\ufeff')\r\n\r\n        content = fp.read()\r\n        if isinstance(content, bytes):\r\n            return content.decode(\r\n                encoding,\r\n                errors='replace' if encoding == 'utf-8-sig' else 'strict'\r\n            ).lstrip('\\ufeff')\r\n        return content.lstrip('\\ufeff')\r\n\r\n    def custom_load(fp, *args, **kwargs):\r\n        try:\r\n            content = read_json_content(fp, 'utf-8')\r\n        except UnicodeDecodeError:\r\n            content = read_json_content(fp, 'utf-8-sig')\r\n\r\n        try:\r\n            return json.loads(content, *args, **kwargs)\r\n        
except json.JSONDecodeError as e:\r\n            logger.error(f\"JSON parsing error: {e}\")\r\n            raise\r\n\r\n    json.load = custom_load\r\n    _patches_applied['json_load'] = True\r\n\r\n# Store the original load function for potential restoration\r\n_original_json_load = None\r\n\r\ndef restore_json_load() -> None:\r\n    \"\"\"Restore the original json.load function\"\"\"\r\n    global _original_json_load, _patches_applied\r\n    if _original_json_load is not None and _patches_applied['json_load']:\r\n        json.load = _original_json_load\r\n        _original_json_load = None\r\n        _patches_applied['json_load'] = False\r\n\r\ndef load_config(config_path: str) -> dict:\r\n    \"\"\"Load configuration file with proper encoding handling\"\"\"\r\n    config_path = Path(config_path).resolve()\r\n    \r\n    try:\r\n        with codecs.open(str(config_path), 'r', encoding='utf-8') as f:\r\n            return json.load(f)\r\n    except UnicodeDecodeError:\r\n        # Fallback to utf-8-sig if regular utf-8 fails\r\n        with codecs.open(str(config_path), 'r', encoding='utf-8-sig') as f:\r\n            return json.load(f)\r\n\r\n# Initialize espeak-ng\r\nphonemizer_available = False  # Global flag to track if phonemizer is working\r\ncurrent_phonemizer_lang = None  # Track current phonemizer language\r\n\r\ndef initialize_phonemizer(language: str = 'en-us') -> bool:\r\n    \"\"\"Initialize phonemizer for a specific language\r\n    \r\n    Args:\r\n        language: Language code for phonemizer (e.g., 'en-us', 'zh')\r\n        \r\n    Returns:\r\n        True if initialization successful, False otherwise\r\n    \"\"\"\r\n    global phonemizer_available, current_phonemizer_lang\r\n    \r\n    try:\r\n        from phonemizer.backend.espeak.wrapper import EspeakWrapper\r\n        from phonemizer import phonemize\r\n        import espeakng_loader\r\n\r\n        # Make library available first\r\n        library_path = 
espeakng_loader.get_library_path()\r\n        data_path = espeakng_loader.get_data_path()\r\n        espeakng_loader.make_library_available()\r\n\r\n        # Set up espeak-ng paths\r\n        EspeakWrapper.library_path = library_path\r\n        EspeakWrapper.data_path = data_path\r\n\r\n        # Verify espeak-ng is working with specified language\r\n        try:\r\n            test_text = 'test' if language in ['en-us', 'en-gb'] else '测试'\r\n            test_phonemes = phonemize(test_text, language=language)\r\n            if test_phonemes:\r\n                phonemizer_available = True\r\n                current_phonemizer_lang = language\r\n                logger.info(f\"Phonemizer successfully initialized for language: {language}\")\r\n                return True\r\n            else:\r\n                logger.warning(\"Phonemization returned empty result\")\r\n                return False\r\n        except Exception as e:\r\n            # Continue without espeak functionality - be more specific about error types\r\n            if \"espeak\" in str(e).lower():\r\n                logger.warning(f\"eSpeak not found: {e}\")\r\n            else:\r\n                logger.warning(f\"Phonemizer initialization error: {e}\")\r\n            return False\r\n\r\n    except ImportError as e:\r\n        logger.warning(f\"Phonemizer packages not installed: {e}\")\r\n        logger.info(\"If you want phoneme visualization, manually install required packages:\")\r\n        logger.info(\"pip install espeakng-loader phonemizer-fork\")\r\n        return False\r\n\r\n# Initialize default English phonemizer\r\ntry:\r\n    initialize_phonemizer('en-us')\r\nexcept Exception as e:\r\n    logger.warning(f\"Could not initialize default phonemizer: {e}\")\r\n\r\n# Initialize pipeline globally with thread safety\r\n_pipeline = None\r\n_pipeline_lock = threading.RLock()  # Reentrant lock for thread safety\r\n_voice_cache_lock = threading.RLock()  # Separate lock for voice cache 
operations\r\n_download_lock = threading.Lock()  # Lock for download operations\r\n\r\ndef download_voice_files(voice_files: Optional[List[str]] = None, repo_version: str = \"main\", required_count: int = 1) -> List[str]:\r\n    \"\"\"Download voice files from Hugging Face with enhanced progress tracking.\r\n\r\n    Args:\r\n        voice_files: Optional list of voice files to download. If None, download all VOICE_FILES.\r\n        repo_version: Version/tag of the repository to use (default: \"main\")\r\n        required_count: Minimum number of voices required (default: 1)\r\n\r\n    Returns:\r\n        List of successfully downloaded voice files\r\n\r\n    Raises:\r\n        ValueError: If fewer than required_count voices could be downloaded\r\n    \"\"\"\r\n    from concurrent.futures import ThreadPoolExecutor, as_completed\r\n    from tqdm import tqdm\r\n    import hashlib\r\n    import time\r\n    \r\n    # Use absolute path for voices directory\r\n    voices_dir = Path(\"voices\").resolve()\r\n    voices_dir.mkdir(exist_ok=True)\r\n\r\n    # Import here to avoid startup dependency\r\n    from huggingface_hub import hf_hub_download\r\n    downloaded_voices = []\r\n    failed_voices = []\r\n\r\n    # If specific voice files are requested, use those. Otherwise use all.\r\n    files_to_download = voice_files if voice_files is not None else VOICE_FILES\r\n    total_files = len(files_to_download)\r\n\r\n    logger.info(f\"Downloading voice files... 
({total_files} total files)\")\r\n\r\n    # Check for existing voice files first\r\n    existing_files = []\r\n    for voice_file in files_to_download:\r\n        voice_path = voices_dir / voice_file\r\n        if voice_path.exists() and voice_path.stat().st_size > 0:\r\n            logger.info(f\"Voice file {voice_file} already exists\")\r\n            downloaded_voices.append(voice_file)\r\n            existing_files.append(voice_file)\r\n\r\n    # Remove existing files from the download list\r\n    files_to_download = [f for f in files_to_download if f not in existing_files]\r\n    if not files_to_download and downloaded_voices:\r\n        logger.info(f\"All required voice files already exist ({len(downloaded_voices)} files)\")\r\n        return downloaded_voices\r\n    \r\n    # In offline mode, only use existing files\r\n    if OFFLINE_MODE:\r\n        if not downloaded_voices:\r\n            error_msg = \"No voice files found locally and running in OFFLINE mode. Please download voice files first with network connection.\"\r\n            logger.error(error_msg)\r\n            raise ValueError(error_msg)\r\n        elif len(downloaded_voices) < required_count:\r\n            error_msg = f\"Only {len(downloaded_voices)} voice files found locally, but {required_count} were required. 
Running in OFFLINE mode.\"\r\n            logger.error(error_msg)\r\n            raise ValueError(error_msg)\r\n        else:\r\n            logger.info(f\"Using {len(downloaded_voices)} locally cached voice files (OFFLINE mode)\")\r\n            return downloaded_voices\r\n\r\n    def download_single_voice(voice_file: str) -> Tuple[str, bool, str]:\r\n        \"\"\"Download a single voice file with retry logic\"\"\"\r\n        retry_count = 3\r\n        retry_delay = 2\r\n        \r\n        for attempt in range(retry_count):\r\n            try:\r\n                # Wait with exponential backoff before each retry (2s, then 4s)\r\n                if attempt > 0:\r\n                    delay = retry_delay * (2 ** (attempt - 1))\r\n                    time.sleep(delay)\r\n                \r\n                # Download into a temporary directory, then move the file into voices_dir\r\n                import tempfile\r\n                temp_dir = tempfile.mkdtemp()\r\n                try:\r\n                    downloaded_path = hf_hub_download(\r\n                        repo_id=\"hexgrad/Kokoro-82M\",\r\n                        filename=f\"voices/{voice_file}\",\r\n                        local_dir=temp_dir,\r\n                        force_download=False,\r\n                        revision=repo_version,\r\n                        local_files_only=OFFLINE_MODE\r\n                    )\r\n                    \r\n                    # Verify file integrity with basic size check\r\n                    if Path(downloaded_path).stat().st_size == 0:\r\n                        raise ValueError(f\"Downloaded file {voice_file} has zero size\")\r\n                    \r\n                    # Move to final location\r\n                    voice_path = voices_dir / voice_file\r\n                    shutil.move(downloaded_path, str(voice_path))\r\n                    \r\n                    return voice_file, True, f\"Successfully downloaded {voice_file}\"\r\n                finally:\r\n                    # Clean up temporary directory (errors during cleanup are ignored)\r\n                    shutil.rmtree(temp_dir, ignore_errors=True)\r\n                    \r\n            except Exception as e:\r\n                error_msg = f\"Failed to download {voice_file} (attempt {attempt+1}/{retry_count}): {e}\"\r\n                if attempt == retry_count - 1:\r\n                    return voice_file, False, error_msg\r\n                logger.warning(error_msg)\r\n        \r\n        return voice_file, False, f\"Failed all {retry_count} attempts to download {voice_file}\"\r\n\r\n    # Download files with progress bar and parallel processing\r\n    if files_to_download:\r\n        logger.info(f\"Downloading {len(files_to_download)} missing voice files...\")\r\n        \r\n        with ThreadPoolExecutor(max_workers=3) as executor:  # Limit concurrent downloads\r\n            # Submit all download tasks\r\n            future_to_voice = {\r\n                executor.submit(download_single_voice, voice_file): voice_file\r\n                for voice_file in files_to_download\r\n            }\r\n            \r\n            # Process completed downloads with progress bar\r\n            with tqdm(total=len(files_to_download), desc=\"Downloading voices\") as pbar:\r\n                for future in as_completed(future_to_voice):\r\n                    voice_file, success, message = future.result()\r\n                    \r\n                    if success:\r\n                        downloaded_voices.append(voice_file)\r\n                        logger.info(message)\r\n                    else:\r\n                        failed_voices.append(voice_file)\r\n                        logger.error(message)\r\n                    \r\n                    pbar.update(1)\r\n\r\n    # Report results\r\n    if failed_voices:\r\n        logger.warning(f\"Failed to download {len(failed_voices)} voice files: {', '.join(failed_voices)}\")\r\n\r\n    if not downloaded_voices:\r\n        
error_msg = \"No voice files could be downloaded. Please check your internet connection.\"\r\n        logger.error(error_msg)\r\n        raise ValueError(error_msg)\r\n    elif len(downloaded_voices) < required_count:\r\n        error_msg = f\"Only {len(downloaded_voices)} voice files could be downloaded, but {required_count} were required.\"\r\n        logger.error(error_msg)\r\n        raise ValueError(error_msg)\r\n    else:\r\n        logger.info(f\"Successfully processed {len(downloaded_voices)} voice files\")\r\n\r\n    return downloaded_voices\r\n\r\ndef build_model(\r\n    model_path: Optional[str],\r\n    device: str,\r\n    repo_version: str = \"main\",\r\n    lang_code: str = 'a'\r\n) -> EnhancedKPipeline:\r\n    \"\"\"Build and return the Enhanced Kokoro pipeline with proper encoding configuration\r\n\r\n    Args:\r\n        model_path: Path to the model file or None to use default\r\n        device: Device to use ('cuda' or 'cpu')\r\n        repo_version: Version/tag of the repository to use (default: \"main\")\r\n        lang_code: Language code for the model (default: 'a' for American English, 'z' for Chinese)\r\n\r\n    Returns:\r\n        Initialized EnhancedKPipeline instance\r\n    \"\"\"\r\n    global _pipeline, _pipeline_lock\r\n\r\n    # Use a lock for thread safety\r\n    with _pipeline_lock:\r\n        # Don't reuse pipeline if language code is different\r\n        # (each language may need different configuration)\r\n        if _pipeline is not None and hasattr(_pipeline, 'lang_code') and _pipeline.lang_code == lang_code:\r\n            _pipeline.device = device\r\n            return _pipeline\r\n\r\n        try:\r\n            # Determine if this is a Chinese model\r\n            is_chinese_model = lang_code == 'z' or (model_path and 'zh' in str(model_path).lower())\r\n\r\n            # Download model if it doesn't exist\r\n            if model_path is None:\r\n                model_path = 'kokoro-v1_1-zh.pth' if is_chinese_model else 
'kokoro-v1_0.pth'\r\n\r\n            model_path = os.path.abspath(model_path)\r\n            if not os.path.exists(model_path):\r\n                if OFFLINE_MODE:\r\n                    error_msg = f\"Model file {model_path} not found and running in OFFLINE mode. Please download the model first with network connection.\"\r\n                    logger.error(error_msg)\r\n                    raise ValueError(error_msg)\r\n                \r\n                logger.info(f\"Downloading model file {model_path}...\")\r\n                try:\r\n                    from huggingface_hub import hf_hub_download\r\n                    \r\n                    # Determine filename and repo for download\r\n                    filename = 'kokoro-v1_1-zh.pth' if is_chinese_model else 'kokoro-v1_0.pth'\r\n                    model_repo_id = \"hexgrad/Kokoro-82M-v1.1-zh\" if is_chinese_model else \"hexgrad/Kokoro-82M\"\r\n\r\n                    model_path = hf_hub_download(\r\n                        repo_id=model_repo_id,\r\n                        filename=filename,\r\n                        local_dir=\".\",\r\n                        force_download=False,\r\n                        revision=repo_version,\r\n                        local_files_only=OFFLINE_MODE\r\n                    )\r\n                    logger.info(f\"Model downloaded to {model_path}\")\r\n                except Exception as e:\r\n                    logger.error(f\"Error downloading model: {e}\")\r\n                    raise ValueError(f\"Could not download model: {e}\") from e\r\n\r\n            # Download config if it doesn't exist\r\n            config_path = os.path.abspath(\"config.json\")\r\n            if not os.path.exists(config_path):\r\n                if OFFLINE_MODE:\r\n                    error_msg = f\"Config file {config_path} not found and running in OFFLINE mode. 
Please download the config first with network connection.\"\r\n                    logger.error(error_msg)\r\n                    raise ValueError(error_msg)\r\n                \r\n                logger.info(\"Downloading config file...\")\r\n                try:\r\n                    from huggingface_hub import hf_hub_download\r\n                    config_path = hf_hub_download(\r\n                        repo_id=\"hexgrad/Kokoro-82M\",\r\n                        filename=\"config.json\",\r\n                        local_dir=\".\",\r\n                        force_download=False,\r\n                        revision=repo_version,\r\n                        local_files_only=OFFLINE_MODE\r\n                    )\r\n                    logger.info(f\"Config downloaded to {config_path}\")\r\n                except Exception as e:\r\n                    logger.error(f\"Error downloading config: {e}\")\r\n                    raise ValueError(f\"Could not download config: {e}\") from e\r\n\r\n            # Initialize phonemizer for the appropriate language\r\n            if is_chinese_model:\r\n                logger.info(\"Initializing phonemizer for Chinese...\")\r\n                try:\r\n                    initialize_phonemizer('zh')\r\n                except Exception as e:\r\n                    logger.warning(f\"Could not initialize Chinese phonemizer: {e}\")\r\n            else:\r\n                logger.info(\"Initializing phonemizer for English...\")\r\n                try:\r\n                    initialize_phonemizer('en-us')\r\n                except Exception as e:\r\n                    logger.warning(f\"Could not initialize English phonemizer: {e}\")\r\n\r\n            # Download voice files - require at least one voice\r\n            try:\r\n                downloaded_voices = download_voice_files(repo_version=repo_version, required_count=1)\r\n            except ValueError as e:\r\n                logger.error(f\"Error: Voice files download failed: 
{e}\")\r\n                raise ValueError(\"Voice files download failed\") from e\r\n\r\n            # Validate language code\r\n            supported_codes = list(LANGUAGE_CODES.keys())\r\n            if lang_code not in supported_codes:\r\n                logger.warning(f\"Unsupported language code '{lang_code}'. Using 'a' (American English).\")\r\n                logger.info(f\"Supported language codes: {', '.join(supported_codes)}\")\r\n                lang_code = 'a'\r\n\r\n            # Initialize pipeline with validated language code\r\n            patch_applied_here = not _patches_applied['json_load']\r\n            if patch_applied_here:\r\n                patch_json_load()\r\n            try:\r\n                pipeline_instance = EnhancedKPipeline(lang_code=lang_code)\r\n            finally:\r\n                if patch_applied_here:\r\n                    restore_json_load()\r\n\r\n            if pipeline_instance is None:\r\n                raise ValueError(\"Failed to initialize EnhancedKPipeline - pipeline is None\")\r\n\r\n            # Store language code and device\r\n            pipeline_instance.lang_code = lang_code\r\n            pipeline_instance.device = device\r\n\r\n            # Try to load the first available voice with improved error handling\r\n            voice_loaded = False\r\n            matching_voice_files = [\r\n                voice_file\r\n                for voice_file in downloaded_voices\r\n                if get_language_code_from_voice(Path(voice_file).stem) == lang_code\r\n            ]\r\n\r\n            if not matching_voice_files:\r\n                logger.warning(\r\n                    \"No voice files matched language code '%s'; falling back to any downloaded voice\",\r\n                    lang_code\r\n                )\r\n\r\n            for voice_file in matching_voice_files or downloaded_voices:\r\n                voice_path = os.path.abspath(os.path.join(\"voices\", voice_file))\r\n                if 
os.path.exists(voice_path):\r\n                    try:\r\n                        pipeline_instance.load_voice(voice_path)\r\n                        logger.info(f\"Successfully loaded voice: {voice_file}\")\r\n                        voice_loaded = True\r\n                        break  # Successfully loaded a voice\r\n                    except Exception as e:\r\n                        logger.warning(f\"Warning: Failed to load voice {voice_file}: {e}\")\r\n                        continue\r\n\r\n            if not voice_loaded:\r\n                logger.warning(\"Warning: Could not load any voice models\")\r\n\r\n            # Set the global _pipeline only after successful initialization\r\n            _pipeline = pipeline_instance\r\n\r\n        except Exception as e:\r\n            logger.error(f\"Error initializing pipeline: {e}\")\r\n            raise\r\n\r\n        return _pipeline\r\n\r\ndef list_available_voices() -> List[str]:\r\n    \"\"\"List all available voice models\"\"\"\r\n    # Always use absolute path for consistency\r\n    voices_dir = Path(os.path.abspath(\"voices\"))\r\n\r\n    # Create voices directory if it doesn't exist\r\n    if not voices_dir.exists():\r\n        print(f\"Creating voices directory at {voices_dir}\")\r\n        voices_dir.mkdir(exist_ok=True)\r\n        return []\r\n\r\n    # Get all .pt files in the voices directory\r\n    voice_files = list(voices_dir.glob(\"*.pt\"))\r\n\r\n    # If we found voice files, return them\r\n    if voice_files:\r\n        return [f.stem for f in sorted(voice_files, key=lambda f: f.stem.lower())]\r\n\r\n    # If no voice files in standard location, check if we need to do a one-time migration\r\n    # This is legacy support for older installations\r\n    alt_voices_path = Path(\".\") / \"voices\"\r\n    if alt_voices_path.exists() and alt_voices_path.is_dir() and alt_voices_path != voices_dir:\r\n        print(f\"Checking alternative voice location: {alt_voices_path.absolute()}\")\r\n        
alt_voice_files = list(alt_voices_path.glob(\"*.pt\"))\r\n\r\n        if alt_voice_files:\r\n            print(f\"Found {len(alt_voice_files)} voice files in alternate location\")\r\n            print(\"Copying files to the standard voices directory...\")\r\n\r\n            # Copy each file that is not already present in the standard directory\r\n            files_copied = 0\r\n            for voice_file in alt_voice_files:\r\n                target_path = voices_dir / voice_file.name\r\n                if not target_path.exists():\r\n                    try:\r\n                        # copy2 preserves file metadata; the original file is left in place\r\n                        shutil.copy2(str(voice_file), str(target_path))\r\n                        files_copied += 1\r\n                    except OSError as e:\r\n                        print(f\"Error copying {voice_file.name}: {e}\")\r\n\r\n            if files_copied > 0:\r\n                print(f\"Successfully copied {files_copied} voice files\")\r\n                return [f.stem for f in sorted(voices_dir.glob(\"*.pt\"), key=lambda f: f.stem.lower())]\r\n\r\n    print(\"No voice files found. 
Please run the application again to download voices.\")\r\n    return []\r\n\r\ndef get_language_code_from_voice(voice_name: str) -> str:\r\n    \"\"\"Get the appropriate language code from a voice name\r\n\r\n    Args:\r\n        voice_name: Name of the voice (e.g., 'af_bella', 'jf_alpha')\r\n\r\n    Returns:\r\n        Language code for the voice\r\n    \"\"\"\r\n    prefix = voice_name[:2].lower() if len(voice_name) >= 2 else 'af'\r\n    return VOICE_PREFIX_TO_LANGUAGE_CODE.get(prefix, 'a')  # Default to American English\r\n\r\ndef load_voice(voice_name: str, device: str) -> torch.Tensor:\r\n    \"\"\"Load a voice model in a thread-safe manner\r\n\r\n    Args:\r\n        voice_name: Name of the voice to load (with or without .pt extension)\r\n        device: Device to use ('cuda' or 'cpu')\r\n\r\n    Returns:\r\n        Loaded voice model tensor\r\n\r\n    Raises:\r\n        ValueError: If voice file not found or loading fails\r\n    \"\"\"\r\n    # Format voice path correctly - strip .pt if it was included\r\n    voice_name = voice_name.replace('.pt', '')\r\n    pipeline = build_model(None, device, lang_code=get_language_code_from_voice(voice_name))\r\n    voice_path = Path(\"voices\").resolve() / f\"{voice_name}.pt\"\r\n\r\n    if not voice_path.exists():\r\n        raise ValueError(f\"Voice file not found: {voice_path}\")\r\n\r\n    # Use a lock to ensure thread safety when loading voices\r\n    with _pipeline_lock:\r\n        # Check if voice is already loaded\r\n        if voice_name in pipeline.voices:\r\n            return pipeline.voices[voice_name]\r\n\r\n        # Load voice if not already loaded\r\n        return pipeline.load_voice(str(voice_path))\r\n\r\ndef generate_speech(\r\n    model: EnhancedKPipeline,\r\n    text: str,\r\n    voice: str,\r\n    lang: str = 'a',\r\n    device: str = 'cpu',\r\n    speed: float = 1.0\r\n) -> Tuple[Optional[torch.Tensor], Optional[str]]:\r\n    \"\"\"Generate speech using the Kokoro pipeline in a thread-safe 
manner\r\n\r\n    Args:\r\n        model: EnhancedKPipeline instance\r\n        text: Text to synthesize\r\n        voice: Voice name (e.g. 'af_bella')\r\n        lang: Language code ('a' for American English, 'b' for British English); currently unused by this function\r\n        device: Device to use ('cuda' or 'cpu')\r\n        speed: Speech speed multiplier (default: 1.0)\r\n\r\n    Returns:\r\n        Tuple of (audio tensor, phonemes string) for the first generated segment, or (None, None) on error\r\n    \"\"\"\r\n    try:\r\n        if model is None:\r\n            raise ValueError(\"Model is None - pipeline not properly initialized\")\r\n\r\n        # Format voice name and path\r\n        voice_name = voice.replace('.pt', '')\r\n        voice_path = Path(\"voices\").resolve() / f\"{voice_name}.pt\"\r\n\r\n        # Check if voice file exists\r\n        if not voice_path.exists():\r\n            raise ValueError(f\"Voice file not found: {voice_path}\")\r\n\r\n        # Thread-safe initialization of model properties and voice loading\r\n        with _pipeline_lock:\r\n            # Ensure device is set\r\n            model.device = device\r\n\r\n            # Ensure voice is loaded before generating\r\n            if voice_name not in model.voices:\r\n                logger.info(f\"Loading voice {voice_name}...\")\r\n                try:\r\n                    model.load_voice(str(voice_path))\r\n                    if voice_name not in model.voices:\r\n                        raise ValueError(\"Voice load succeeded but voice not in model.voices dictionary\")\r\n                except Exception as e:\r\n                    raise ValueError(f\"Failed to load voice {voice_name}: {e}\") from e\r\n\r\n        # Generate speech (outside the lock for better concurrency)\r\n        logger.info(f\"Generating speech with device: {model.device}\")\r\n        generator = model(\r\n            text,\r\n            voice=str(voice_path),\r\n            speed=speed,\r\n            split_pattern=r'\\n+'\r\n        )\r\n\r\n        # Return only the first segment that produced audio (any later segments are never generated),\r\n        # converting a numpy array to a float tensor if needed\r\n        for gs, ps, audio in generator:\r\n            if audio is not None:\r\n                if isinstance(audio, np.ndarray):\r\n                    audio = torch.from_numpy(audio).float()\r\n                return audio, ps\r\n\r\n        return None, None\r\n    except (ValueError, FileNotFoundError, RuntimeError, KeyError, AttributeError, TypeError) as e:\r\n        logger.error(f\"Error generating speech: {e}\")\r\n        return None, None\r\n    except Exception as e:\r\n        # logger.exception logs the message together with the full traceback\r\n        logger.exception(f\"Unexpected error during speech generation: {e}\")\r\n        return None, None\r\n"
  },
  {
    "path": "requirements.txt",
    "content": "kokoro  # Official Kokoro TTS library (v1.0 model support)\nmisaki  # G2P library for Kokoro (multi-language support)\ntorch  # PyTorch for model inference (for GPU support, see README.md for CUDA-specific installation)\ntorchaudio  # PyTorch audio processing library\nsoundfile  # Audio file handling\nhuggingface-hub  # Model downloads from Hugging Face\ngradio  # Web interface\npydub  # For audio format conversion\nespeakng-loader  # For loading espeak-ng library\nphonemizer-fork  # For phoneme generation\nwheel  # For building packages\nsetuptools  # For installing packages\nmaturin  # Build dependency for underthesea-core\nnum2words  # For number to word conversion\nspacy  # For text processing\ntqdm  # Progress bars\npsutil  # System and process monitoring\npackaging  # Version parsing for dependency checking\nnumpy<2.0  # Numerical computing\nunderthesea  # Vietnamese NLP toolkit\n\n# Japanese Language Libraries\nfugashi[unidic]\njaconv\nmojimoji\npyopenjtalk\n\n# Korean Language Libraries\njamo\nnltk\n\n# Mandarin Language Libraries\ncn2an\njieba\nordered-set\npypinyin\npypinyin-dict\n\n# Hebrew Language Libraries\nhttps://files.pythonhosted.org/packages/44/17/9efdef222f2fc8e1ca721d919738d69d8b2358554a99f27b0764905f60fd/mishkal_hebrew-0.3.2-py3-none-any.whl\n"
  },
  {
    "path": "setup_chinese_tts.py",
    "content": "\"\"\"\r\nSetup Script for Kokoro Chinese TTS\r\n===================================\r\n\r\nThis script downloads and sets up the Kokoro-v1.1-zh Chinese TTS model\r\nand all required voice files.\r\n\r\nUsage:\r\n    python setup_chinese_tts.py\r\n\"\"\"\r\n\r\nimport os\r\nimport sys\r\nfrom pathlib import Path\r\nimport logging\r\nfrom typing import List, Tuple\r\n\r\n# Configure logging\r\nlogging.basicConfig(\r\n    level=logging.INFO,\r\n    format='%(asctime)s - %(levelname)s - %(message)s'\r\n)\r\nlogger = logging.getLogger(__name__)\r\n\r\n# Configuration\r\nCHINESE_MODEL_FILE = \"kokoro-v1_1-zh.pth\"\r\nCONFIG_FILE = \"config.json\"\r\nVOICES_DIR = Path(\"voices\").resolve()\r\n\r\nCHINESE_VOICES = [\r\n    # Female voices\r\n    \"zf_xiaobei.pt\",\r\n    \"zf_xiaoni.pt\",\r\n    \"zf_xiaoxiao.pt\",\r\n    \"zf_xiaoyi.pt\",\r\n    # Male voices\r\n    \"zm_yunjian.pt\",\r\n    \"zm_yunxi.pt\",\r\n    \"zm_yunxia.pt\",\r\n    \"zm_yunyang.pt\"\r\n]\r\n\r\n\r\ndef print_header():\r\n    \"\"\"Print setup header\"\"\"\r\n    print(\"\\n\" + \"=\"*60)\r\n    print(\"  Kokoro-82M-v1.1 Chinese TTS Setup\")\r\n    print(\"  科克罗中文TTS设置\")\r\n    print(\"=\"*60 + \"\\n\")\r\n\r\n\r\ndef check_dependencies() -> bool:\r\n    \"\"\"Check if required packages are installed\"\"\"\r\n    print(\"检查依赖 (Checking dependencies)...\")\r\n    \r\n    required_packages = {\r\n        'torch': 'PyTorch',\r\n        'huggingface_hub': 'Hugging Face Hub',\r\n        'kokoro': 'Kokoro',\r\n        'soundfile': 'SoundFile'\r\n    }\r\n    \r\n    missing = []\r\n    for package, name in required_packages.items():\r\n        try:\r\n            __import__(package)\r\n            print(f\"  ✓ {name}\")\r\n        except ImportError:\r\n            print(f\"  ✗ {name}\")\r\n            missing.append(package)\r\n    \r\n    if missing:\r\n        print(f\"\\n缺少必需的包 (Missing packages): {', '.join(missing)}\")\r\n        print(\"请运行: pip install -r requirements.txt\")\r\n 
       return False\r\n    \r\n    print(\"✓ 所有依赖已安装 (All dependencies installed)\\n\")\r\n    return True\r\n\r\n\r\ndef download_file(repo_id: str, filename: str, local_dir: str = \".\") -> bool:\r\n    \"\"\"Download a file from Hugging Face Hub\r\n    \r\n    Args:\r\n        repo_id: Repository ID (e.g., \"hexgrad/Kokoro-82M\")\r\n        filename: File to download\r\n        local_dir: Local directory to save to\r\n        \r\n    Returns:\r\n        True if successful, False otherwise\r\n    \"\"\"\r\n    try:\r\n        from huggingface_hub import hf_hub_download\r\n        \r\n        print(f\"下载 (Downloading): {filename}...\")\r\n        \r\n        # Download the file\r\n        downloaded_path = hf_hub_download(\r\n            repo_id=repo_id,\r\n            filename=filename,\r\n            local_dir=local_dir,\r\n            force_download=False\r\n        )\r\n        \r\n        print(f\"  ✓ 完成 (Done): {filename}\")\r\n        return True\r\n        \r\n    except Exception as e:\r\n        print(f\"  ✗ 错误 (Error): {e}\")\r\n        return False\r\n\r\n\r\ndef download_model() -> bool:\r\n    \"\"\"Download the Chinese TTS model\"\"\"\r\n    print(\"\\n下载中文TTS模型 (Downloading Chinese TTS Model)...\")\r\n    print(\"-\" * 60)\r\n    \r\n    model_path = Path(CHINESE_MODEL_FILE).resolve()\r\n    \r\n    # Check if already exists\r\n    if model_path.exists():\r\n        size_mb = model_path.stat().st_size / (1024 * 1024)\r\n        print(f\"✓ 模型文件已存在 (Model already exists): {model_path}\")\r\n        print(f\"  大小 (Size): {size_mb:.1f} MB\")\r\n        return True\r\n    \r\n    # Download\r\n    success = download_file(\r\n        \"hexgrad/Kokoro-82M-v1.1-zh\",\r\n        CHINESE_MODEL_FILE,\r\n        local_dir=\".\"\r\n    )\r\n    \r\n    if success and model_path.exists():\r\n        size_mb = model_path.stat().st_size / (1024 * 1024)\r\n        print(f\"✓ 模型已下载 (Model downloaded): {size_mb:.1f} MB\\n\")\r\n        return True\r\n    else:\r\n    
    print(f\"✗ 模型下载失败 (Model download failed)\\n\")\r\n        return False\r\n\r\n\r\ndef download_config() -> bool:\r\n    \"\"\"Download the model configuration file\"\"\"\r\n    print(\"下载配置文件 (Downloading Config File)...\")\r\n    print(\"-\" * 60)\r\n    \r\n    config_path = Path(CONFIG_FILE).resolve()\r\n    \r\n    # Check if already exists\r\n    if config_path.exists():\r\n        print(f\"✓ 配置文件已存在 (Config already exists): {config_path}\")\r\n        return True\r\n    \r\n    # Download\r\n    success = download_file(\r\n        \"hexgrad/Kokoro-82M\",\r\n        CONFIG_FILE,\r\n        local_dir=\".\"\r\n    )\r\n    \r\n    if success and config_path.exists():\r\n        print(f\"✓ 配置文件已下载 (Config downloaded)\\n\")\r\n        return True\r\n    else:\r\n        print(f\"✗ 配置文件下载失败 (Config download failed)\\n\")\r\n        return False\r\n\r\n\r\ndef download_voices() -> Tuple[int, int]:\r\n    \"\"\"Download all Chinese voice files\r\n    \r\n    Returns:\r\n        Tuple of (successful_downloads, failed_downloads)\r\n    \"\"\"\r\n    print(\"下载中文声音文件 (Downloading Chinese Voice Files)...\")\r\n    print(\"-\" * 60)\r\n    \r\n    # Create voices directory\r\n    VOICES_DIR.mkdir(parents=True, exist_ok=True)\r\n    \r\n    successful = 0\r\n    failed = 0\r\n    \r\n    for voice_file in CHINESE_VOICES:\r\n        voice_path = VOICES_DIR / voice_file\r\n        \r\n        # Check if already exists\r\n        if voice_path.exists():\r\n            size_mb = voice_path.stat().st_size / (1024 * 1024)\r\n            print(f\"✓ {voice_file} ({size_mb:.1f} MB)\")\r\n            successful += 1\r\n            continue\r\n        \r\n        # Download\r\n        try:\r\n            from huggingface_hub import hf_hub_download\r\n            \r\n            print(f\"下载 (Downloading): {voice_file}...\")\r\n            \r\n            downloaded_path = hf_hub_download(\r\n                repo_id=\"hexgrad/Kokoro-82M\",\r\n                
filename=f\"voices/{voice_file}\",\r\n                local_dir=str(VOICES_DIR.parent),\r\n                force_download=False\r\n            )\r\n            \r\n            size_mb = Path(downloaded_path).stat().st_size / (1024 * 1024)\r\n            print(f\"  ✓ 完成 (Done): {voice_file} ({size_mb:.1f} MB)\")\r\n            successful += 1\r\n            \r\n        except Exception as e:\r\n            print(f\"  ✗ 错误 (Error): {voice_file} - {e}\")\r\n            failed += 1\r\n    \r\n    print(f\"\\n✓ 成功: {successful}/{len(CHINESE_VOICES)} (Successful: {successful}/{len(CHINESE_VOICES)})\")\r\n    if failed > 0:\r\n        print(f\"✗ 失败: {failed}/{len(CHINESE_VOICES)} (Failed: {failed}/{len(CHINESE_VOICES)})\")\r\n    \r\n    print()\r\n    return successful, failed\r\n\r\n\r\ndef verify_setup() -> bool:\r\n    \"\"\"Verify that all required files are in place\"\"\"\r\n    print(\"验证设置 (Verifying Setup)...\")\r\n    print(\"-\" * 60)\r\n    \r\n    all_good = True\r\n    \r\n    # Check model\r\n    model_path = Path(CHINESE_MODEL_FILE).resolve()\r\n    if model_path.exists():\r\n        print(f\"✓ 中文模型 (Chinese Model): {CHINESE_MODEL_FILE}\")\r\n    else:\r\n        print(f\"✗ 缺少模型 (Missing Model): {CHINESE_MODEL_FILE}\")\r\n        all_good = False\r\n    \r\n    # Check config\r\n    config_path = Path(CONFIG_FILE).resolve()\r\n    if config_path.exists():\r\n        print(f\"✓ 配置文件 (Config File): {CONFIG_FILE}\")\r\n    else:\r\n        print(f\"✗ 缺少配置 (Missing Config): {CONFIG_FILE}\")\r\n        all_good = False\r\n    \r\n    # Check voices\r\n    print(f\"\\n中文声音文件 (Chinese Voice Files):\")\r\n    voice_count = 0\r\n    for voice_file in CHINESE_VOICES:\r\n        voice_path = VOICES_DIR / voice_file\r\n        if voice_path.exists():\r\n            print(f\"  ✓ {voice_file}\")\r\n            voice_count += 1\r\n        else:\r\n            print(f\"  ✗ {voice_file}\")\r\n            all_good = False\r\n    \r\n    print(f\"\\n✓ 已找到 
{voice_count}/{len(CHINESE_VOICES)} 个声音文件\")\r\n    print(f\"(Found {voice_count}/{len(CHINESE_VOICES)} voice files)\\n\")\r\n    \r\n    return all_good\r\n\r\n\r\ndef print_summary(success: bool, model_ok: bool, config_ok: bool, voices_count: int):\r\n    \"\"\"Print setup summary\"\"\"\r\n    print(\"=\"*60)\r\n    print(\"  设置摘要 (Setup Summary)\")\r\n    print(\"=\"*60)\r\n    \r\n    if success:\r\n        print(\"\\n✓ 设置完成！(Setup Complete!)\")\r\n        print(\"\\n下一步 (Next Steps):\")\r\n        print(\"1. 运行演示: python chinese_tts_demo.py\")\r\n        print(\"   (Run demo: python chinese_tts_demo.py)\")\r\n    else:\r\n        print(\"\\n⚠ 设置未完成 (Setup Incomplete)\")\r\n        print(\"\\n缺少的文件 (Missing Files):\")\r\n        if not model_ok:\r\n            print(f\"  - {CHINESE_MODEL_FILE}\")\r\n        if not config_ok:\r\n            print(f\"  - {CONFIG_FILE}\")\r\n        if voices_count < len(CHINESE_VOICES):\r\n            print(f\"  - 声音文件 ({voices_count}/{len(CHINESE_VOICES)}) (Voice files)\")\r\n    \r\n    print(\"\\n\"+\"=\"*60 + \"\\n\")\r\n\r\n\r\ndef main():\r\n    \"\"\"Main setup function\"\"\"\r\n    print_header()\r\n    \r\n    # Check dependencies\r\n    if not check_dependencies():\r\n        print(\"请先安装依赖 (Please install dependencies first)\")\r\n        return False\r\n    \r\n    # Download files\r\n    model_ok = download_model()\r\n    config_ok = download_config()\r\n    voice_success, voice_failed = download_voices()\r\n    \r\n    # Verify setup\r\n    print()\r\n    setup_ok = verify_setup()\r\n    \r\n    # Summary\r\n    print_summary(\r\n        setup_ok,\r\n        model_ok,\r\n        config_ok,\r\n        voice_success\r\n    )\r\n    \r\n    return setup_ok\r\n\r\n\r\nif __name__ == \"__main__\":\r\n    try:\r\n        success = main()\r\n        sys.exit(0 if success else 1)\r\n    except KeyboardInterrupt:\r\n        print(\"\\n\\n设置被用户中止 (Setup interrupted by user)\")\r\n        sys.exit(1)\r\n    except Exception as 
e:\r\n        logger.error(f\"设置错误 (Setup error): {e}\")\r\n        import traceback\r\n        traceback.print_exc()\r\n        sys.exit(1)\r\n\r\n"
  },
  {
    "path": "speed_dial.py",
    "content": "\"\"\"\nSpeed Dial Module for Kokoro-TTS-Local\n--------------------------------------\nManages speed dial presets for quick access to frequently used voice and text combinations.\n\nThis module provides functions to:\n- Load speed dial presets from a JSON file\n- Save new presets to the JSON file\n- Delete presets from the JSON file\n- Validate preset data\n\"\"\"\n\nimport json\nimport os\nfrom pathlib import Path\nfrom typing import Dict, List, Optional, Any\n\n# Define the path for the speed dial presets file\nSPEED_DIAL_FILE = Path(\"speed_dial.json\")\n\ndef load_presets() -> Dict[str, Dict[str, Any]]:\n    \"\"\"\n    Load speed dial presets from the JSON file.\n    \n    Returns:\n        Dictionary of presets where keys are preset names and values are preset data\n    \"\"\"\n    if not SPEED_DIAL_FILE.exists():\n        # If file doesn't exist, return an empty dictionary\n        return {}\n    \n    try:\n        with open(SPEED_DIAL_FILE, 'r', encoding='utf-8') as f:\n            presets = json.load(f)\n\n        if not isinstance(presets, dict):\n            print(\n                \"Error loading speed dial presets: \"\n                f\"expected a JSON object, got {type(presets).__name__}\"\n            )\n            return {}\n        \n        # Validate the loaded presets\n        validated_presets = {}\n        for name, preset in presets.items():\n            if not isinstance(name, str) or not isinstance(preset, dict):\n                print(f\"Skipping invalid preset entry: {name!r}\")\n                continue\n            if validate_preset(preset):\n                validated_presets[name] = preset\n        \n        return validated_presets\n    except (json.JSONDecodeError, IOError) as e:\n        print(f\"Error loading speed dial presets: {e}\")\n        return {}\n\ndef save_preset(name: str, voice: str, text: str, format: str = \"wav\", speed: float = 1.0) -> bool:\n    \"\"\"\n    Save a new speed dial preset.\n    \n 
   Args:\n        name: Name of the preset\n        voice: Voice to use\n        text: Text to convert to speech\n        format: Output format (default: \"wav\")\n        speed: Speech speed (default: 1.0)\n        \n    Returns:\n        True if successful, False otherwise\n    \"\"\"\n    import re\n    \n    # Validate preset name\n    if not isinstance(name, str) or len(name.strip()) == 0:\n        print(\"Preset name must be a non-empty string\")\n        return False\n    \n    if len(name) > 50:\n        print(\"Preset name is too long (max 50 characters)\")\n        return False\n    \n    # Only allow safe characters in preset names\n    if not re.match(r'^[a-zA-Z0-9_\\- ]+$', name):\n        print(\"Preset name contains invalid characters\")\n        return False\n    \n    # Create preset data\n    preset = {\n        \"voice\": voice,\n        \"text\": text,\n        \"format\": format,\n        \"speed\": speed\n    }\n    \n    # Validate preset data\n    if not validate_preset(preset):\n        return False\n    \n    # Load existing presets\n    presets = load_presets()\n    \n    # Add or update the preset\n    presets[name] = preset\n    \n    # Save presets to file\n    try:\n        with open(SPEED_DIAL_FILE, 'w', encoding='utf-8') as f:\n            json.dump(presets, f, indent=2, ensure_ascii=False)\n        return True\n    except IOError as e:\n        print(f\"Error saving speed dial preset: {e}\")\n        return False\n\ndef delete_preset(name: str) -> bool:\n    \"\"\"\n    Delete a speed dial preset.\n    \n    Args:\n        name: Name of the preset to delete\n        \n    Returns:\n        True if successful, False otherwise\n    \"\"\"\n    # Load existing presets\n    presets = load_presets()\n    \n    # Check if preset exists\n    if name not in presets:\n        return False\n    \n    # Remove the preset\n    del presets[name]\n    \n    # Save presets to file\n    try:\n        with open(SPEED_DIAL_FILE, 'w', 
encoding='utf-8') as f:\n            json.dump(presets, f, indent=2, ensure_ascii=False)\n        return True\n    except IOError as e:\n        print(f\"Error deleting speed dial preset: {e}\")\n        return False\n\ndef validate_preset(preset: Dict[str, Any]) -> bool:\n    \"\"\"\n    Validate a preset's data structure with security checks.\n    \n    Args:\n        preset: Preset data to validate\n        \n    Returns:\n        True if valid, False otherwise\n    \"\"\"\n    import re\n    \n    # Check required fields\n    required_fields = [\"voice\", \"text\"]\n    for field in required_fields:\n        if field not in preset:\n            print(f\"Preset missing required field: {field}\")\n            return False\n    \n    # Check field types and validate content\n    voice = preset.get(\"voice\")\n    if not isinstance(voice, str):\n        print(\"Preset voice must be a string\")\n        return False\n    \n    # Validate voice name (alphanumeric, underscore, dash only)\n    if not re.match(r'^[a-zA-Z0-9_-]+$', voice):\n        print(\"Preset voice contains invalid characters\")\n        return False\n    \n    text = preset.get(\"text\")\n    if not isinstance(text, str):\n        print(\"Preset text must be a string\")\n        return False\n    \n    # Validate text length and content\n    if len(text) > 10000:\n        print(\"Preset text is too long (max 10,000 characters)\")\n        return False\n    \n    if len(text.strip()) == 0:\n        print(\"Preset text cannot be empty\")\n        return False\n    \n    # Optional fields with validation\n    if \"format\" not in preset:\n        preset[\"format\"] = \"wav\"\n    else:\n        format_val = preset[\"format\"]\n        if not isinstance(format_val, str):\n            print(\"Preset format must be a string\")\n            return False\n        # Only allow safe audio formats\n        if format_val not in [\"wav\", \"mp3\", \"aac\"]:\n            print(\"Preset format must be wav, mp3, or 
aac\")\n            return False\n    \n    if \"speed\" not in preset:\n        preset[\"speed\"] = 1.0\n    else:\n        speed = preset[\"speed\"]\n        # bool is a subclass of int, so reject it explicitly\n        if isinstance(speed, bool) or not isinstance(speed, (int, float)):\n            print(\"Preset speed must be a number\")\n            return False\n        # Validate speed range (the chained comparison also rejects NaN, which json.load accepts)\n        if not (0.1 <= speed <= 3.0):\n            print(\"Preset speed must be between 0.1 and 3.0\")\n            return False\n    \n    return True\n\ndef get_preset_names() -> List[str]:\n    \"\"\"\n    Get a list of all preset names.\n    \n    Returns:\n        List of preset names\n    \"\"\"\n    presets = load_presets()\n    return list(presets.keys())\n\ndef get_preset(name: str) -> Optional[Dict[str, Any]]:\n    \"\"\"\n    Get a specific preset by name.\n    \n    Args:\n        name: Name of the preset to get\n        \n    Returns:\n        Preset data or None if not found\n    \"\"\"\n    presets = load_presets()\n    return presets.get(name)\n"
  },
  {
    "path": "test_offline.py",
    "content": "#!/usr/bin/env python3\n\"\"\"\nTest script for verifying offline mode functionality of Kokoro-TTS-Local\n\"\"\"\nimport os\nimport sys\nfrom pathlib import Path\nimport torch\n\n# Constants\nREQUIRED_FILES = {\n    'model': 'kokoro-v1_0.pth',\n    'config': 'config.json',\n    'voices_dir': 'voices'\n}\n\nDEFAULT_TEST_TEXT = \"Hello, this is a test of offline mode.\"\nDEFAULT_VOICE = \"af_bella\"\nTEST_OUTPUT = \"test_offline_output.wav\"\n\ndef print_header(text: str):\n    \"\"\"Print a formatted header\"\"\"\n    print(\"\\n\" + \"=\" * 60)\n    print(f\"  {text}\")\n    print(\"=\" * 60)\n\ndef print_status(item: str, status: bool, details: str = \"\"):\n    \"\"\"Print a status line with check or cross mark\"\"\"\n    mark = \"[PASS]\" if status else \"[FAIL]\"\n    print(f\"  {mark} {item}\")\n    if details:\n        print(f\"      {details}\")\n\ndef check_offline_mode() -> bool:\n    \"\"\"Check if offline mode is enabled\"\"\"\n    print_header(\"Checking Offline Mode Configuration\")\n    \n    hf_offline = os.environ.get(\"HF_HUB_OFFLINE\", \"0\")\n    transformers_offline = os.environ.get(\"TRANSFORMERS_OFFLINE\", \"0\")\n    \n    offline_enabled = hf_offline == \"1\" or transformers_offline == \"1\"\n    \n    print_status(\"HF_HUB_OFFLINE\", hf_offline == \"1\", f\"Value: {hf_offline}\")\n    print_status(\"TRANSFORMERS_OFFLINE\", transformers_offline == \"1\", f\"Value: {transformers_offline}\")\n    print_status(\"Offline Mode Status\", offline_enabled, \n                \"Enabled\" if offline_enabled else \"Not enabled - will attempt network access\")\n    \n    return offline_enabled\n\ndef check_required_files() -> dict:\n    \"\"\"Check if all required files exist\"\"\"\n    print_header(\"Checking Required Files\")\n    \n    results = {}\n    \n    # Check model file\n    model_path = Path(REQUIRED_FILES['model']).resolve()\n    model_exists = model_path.exists()\n    results['model'] = model_exists\n    print_status(\"Model 
file\", model_exists, str(model_path))\n    \n    # Check config file\n    config_path = Path(REQUIRED_FILES['config']).resolve()\n    config_exists = config_path.exists()\n    results['config'] = config_exists\n    print_status(\"Config file\", config_exists, str(config_path))\n    \n    # Check voices directory\n    voices_dir = Path(REQUIRED_FILES['voices_dir']).resolve()\n    voices_exists = voices_dir.exists() and voices_dir.is_dir()\n    results['voices_dir'] = voices_exists\n    print_status(\"Voices directory\", voices_exists, str(voices_dir))\n    \n    # Check for voice files\n    if voices_exists:\n        voice_files = list(voices_dir.glob(\"*.pt\"))\n        results['voice_count'] = len(voice_files)\n        has_voices = len(voice_files) > 0\n        print_status(\"Voice files\", has_voices, \n                    f\"Found {len(voice_files)} voice file(s)\")\n        \n        # Check for default voice\n        default_voice_path = voices_dir / f\"{DEFAULT_VOICE}.pt\"\n        default_voice_exists = default_voice_path.exists()\n        results['default_voice'] = default_voice_exists\n        print_status(f\"Default voice ({DEFAULT_VOICE})\", default_voice_exists,\n                    str(default_voice_path) if default_voice_exists else \"Not found\")\n    else:\n        results['voice_count'] = 0\n        results['default_voice'] = False\n        print_status(\"Voice files\", False, \"Voices directory not found\")\n    \n    return results\n\ndef check_dependencies() -> dict:\n    \"\"\"Check if required Python packages are installed\"\"\"\n    print_header(\"Checking Dependencies\")\n    \n    results = {}\n    required_packages = {\n        'torch': 'PyTorch',\n        'kokoro': 'Kokoro TTS',\n        'soundfile': 'SoundFile',\n        'numpy': 'NumPy',\n        'tqdm': 'tqdm'\n    }\n    \n    for package, name in required_packages.items():\n        try:\n            __import__(package)\n            results[package] = True\n            
print_status(name, True, f\"Package '{package}' installed\")\n        except ImportError:\n            results[package] = False\n            print_status(name, False, f\"Package '{package}' not found\")\n    \n    # Check CUDA availability\n    cuda_available = torch.cuda.is_available()\n    results['cuda'] = cuda_available\n    print_status(\"CUDA Support\", cuda_available, \n                \"GPU acceleration available\" if cuda_available else \"Using CPU\")\n    \n    return results\n\ndef test_model_initialization() -> bool:\n    \"\"\"Test if model can be initialized\"\"\"\n    print_header(\"Testing Model Initialization\")\n    \n    try:\n        from models import build_model\n        \n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        print(f\"  Using device: {device}\")\n        \n        model_path = Path(REQUIRED_FILES['model']).resolve()\n        print(f\"  Model path: {model_path}\")\n        \n        # Build model\n        print(\"  Initializing model...\")\n        model = build_model(str(model_path), device)\n        \n        if model is None:\n            print_status(\"Model initialization\", False, \"Model returned None\")\n            return False\n        \n        print_status(\"Model initialization\", True, \"Model loaded successfully\")\n        return True\n        \n    except Exception as e:\n        print_status(\"Model initialization\", False, f\"Error: {type(e).__name__}: {str(e)}\")\n        return False\n\ndef test_voice_listing() -> bool:\n    \"\"\"Test if voices can be listed\"\"\"\n    print_header(\"Testing Voice Listing\")\n    \n    try:\n        from models import list_available_voices\n        \n        voices = list_available_voices()\n        \n        if not voices:\n            print_status(\"Voice listing\", False, \"No voices found\")\n            return False\n        \n        print_status(\"Voice listing\", True, f\"Found {len(voices)} voice(s)\")\n        print(\"\\n  Available 
voices:\")\n        for i, voice in enumerate(voices[:10], 1):  # Show first 10\n            print(f\"    {i}. {voice}\")\n        if len(voices) > 10:\n            print(f\"    ... and {len(voices) - 10} more\")\n        \n        return True\n        \n    except Exception as e:\n        print_status(\"Voice listing\", False, f\"Error: {type(e).__name__}: {str(e)}\")\n        return False\n\ndef test_speech_generation() -> bool:\n    \"\"\"Test if speech can be generated\"\"\"\n    print_header(\"Testing Speech Generation\")\n    \n    try:\n        from models import build_model, list_available_voices\n        import soundfile as sf\n        import numpy as np\n        \n        # Get available voices\n        voices = list_available_voices()\n        if not voices:\n            print_status(\"Speech generation\", False, \"No voices available\")\n            return False\n        \n        # Use default voice if available, otherwise use first voice\n        voice = DEFAULT_VOICE if DEFAULT_VOICE in voices else voices[0]\n        print(f\"  Using voice: {voice}\")\n        print(f\"  Test text: '{DEFAULT_TEST_TEXT}'\")\n        \n        # Initialize model\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        model = build_model(str(Path(REQUIRED_FILES['model']).resolve()), device)\n        \n        if model is None:\n            print_status(\"Speech generation\", False, \"Failed to load model\")\n            return False\n        \n        # Generate speech\n        print(\"  Generating speech...\")\n        voice_path = Path(\"voices\").resolve() / f\"{voice}.pt\"\n        \n        all_audio = []\n        generator = model(DEFAULT_TEST_TEXT, voice=str(voice_path), speed=1.0, split_pattern=r'\\n+')\n        \n        for gs, ps, audio in generator:\n            if audio is not None:\n                audio_tensor = audio if isinstance(audio, torch.Tensor) else torch.from_numpy(audio).float()\n                all_audio.append(audio_tensor)\n 
       \n        if not all_audio:\n            print_status(\"Speech generation\", False, \"No audio generated\")\n            return False\n        \n        # Concatenate audio segments\n        if len(all_audio) == 1:\n            final_audio = all_audio[0]\n        else:\n            final_audio = torch.cat(all_audio, dim=0)\n        \n        # Save test output\n        output_path = Path(TEST_OUTPUT).resolve()\n        sf.write(str(output_path), final_audio.numpy(), 24000)\n        \n        if not output_path.exists():\n            print_status(\"Speech generation\", False, \"Failed to save output file\")\n            return False\n        \n        file_size = output_path.stat().st_size\n        print_status(\"Speech generation\", True, \n                    f\"Generated {file_size:,} bytes to {output_path.name}\")\n        \n        return True\n        \n    except Exception as e:\n        print_status(\"Speech generation\", False, f\"Error: {type(e).__name__}: {str(e)}\")\n        import traceback\n        traceback.print_exc()\n        return False\n\ndef cleanup():\n    \"\"\"Clean up test files\"\"\"\n    try:\n        output_path = Path(TEST_OUTPUT)\n        if output_path.exists():\n            output_path.unlink()\n            print(f\"\\n  Cleaned up test file: {TEST_OUTPUT}\")\n    except Exception as e:\n        print(f\"\\n  Warning: Could not clean up test file: {e}\")\n\ndef main():\n    \"\"\"Run all offline mode tests\"\"\"\n    print(\"\\n\" + \"=\" * 60)\n    print(\"  KOKORO TTS OFFLINE MODE TEST\")\n    print(\"=\" * 60)\n    \n    # Track test results\n    tests_passed = 0\n    tests_failed = 0\n    \n    # Check offline mode\n    offline_enabled = check_offline_mode()\n    if not offline_enabled:\n        print(\"\\n[WARNING] Offline mode is not enabled!\")\n        print(\"  To enable offline mode, set the environment variable:\")\n        print(\"    Linux/macOS:  export HF_HUB_OFFLINE=1\")\n        print(\"    Windows PS:   
$env:HF_HUB_OFFLINE=\\\"1\\\"\")\n        print(\"    Windows CMD:  set HF_HUB_OFFLINE=1\")\n        print(\"\\n  Continuing tests (may require network access)...\\n\")\n    \n    # Check required files\n    file_results = check_required_files()\n    all_files_present = all([\n        file_results.get('model', False),\n        file_results.get('config', False),\n        file_results.get('voices_dir', False),\n        file_results.get('voice_count', 0) > 0\n    ])\n    \n    if not all_files_present:\n        print(\"\\n[PREREQUISITE FAILED] Required files are missing\")\n        print(\"  Please run the application with network access first to download:\")\n        print(\"    - Model file (kokoro-v1_0.pth)\")\n        print(\"    - Config file (config.json)\")\n        print(\"    - At least one voice file in voices/ directory\")\n        print(\"\\n  Run: python tts_demo.py\")\n        print(\"       or: python gradio_interface.py\")\n        return 1\n    \n    # Check dependencies\n    dep_results = check_dependencies()\n    all_deps_present = all([\n        dep_results.get('torch', False),\n        dep_results.get('kokoro', False),\n        dep_results.get('soundfile', False),\n        dep_results.get('numpy', False),\n        dep_results.get('tqdm', False)\n    ])\n    \n    if not all_deps_present:\n        print(\"\\n[PREREQUISITE FAILED] Required dependencies are missing\")\n        print(\"  Please install required packages:\")\n        print(\"    pip install -r requirements.txt\")\n        return 1\n    \n    # Test model initialization\n    if test_model_initialization():\n        tests_passed += 1\n    else:\n        tests_failed += 1\n    \n    # Test voice listing\n    if test_voice_listing():\n        tests_passed += 1\n    else:\n        tests_failed += 1\n    \n    # Test speech generation\n    if test_speech_generation():\n        tests_passed += 1\n    else:\n        tests_failed += 1\n    \n    # Print summary\n    print_header(\"Test 
Summary\")\n    total_tests = tests_passed + tests_failed\n    print(f\"  Total tests: {total_tests}\")\n    print(f\"  Passed: {tests_passed}\")\n    print(f\"  Failed: {tests_failed}\")\n    \n    if tests_failed == 0:\n        print(\"\\n[SUCCESS] All tests passed!\")\n        print(\"  Your offline setup is working correctly.\")\n        cleanup()\n        return 0\n    else:\n        print(f\"\\n[FAILURE] {tests_failed} test(s) failed\")\n        print(\"  Please review the errors above and fix any issues.\")\n        return 1\n\nif __name__ == \"__main__\":\n    try:\n        exit_code = main()\n        sys.exit(exit_code)\n    except KeyboardInterrupt:\n        print(\"\\n\\nTest interrupted by user\")\n        cleanup()\n        sys.exit(130)\n    except Exception as e:\n        print(f\"\\n\\n[UNEXPECTED ERROR] {type(e).__name__}: {str(e)}\")\n        import traceback\n        traceback.print_exc()\n        cleanup()\n        sys.exit(1)\n\n"
  },
  {
    "path": "tts_demo.py",
    "content": "import torch\r\nfrom typing import Optional, Tuple, List, Union\r\nfrom models import build_model, generate_speech, list_available_voices\r\nfrom tqdm.auto import tqdm\r\nimport soundfile as sf\r\nfrom pathlib import Path\r\nimport numpy as np\r\nimport time\r\nimport os\r\nimport sys\r\n\r\n# Define path type for consistent handling\r\nPathLike = Union[str, Path]\r\n\r\n# Constants\r\nMAX_TEXT_LENGTH = 10000\r\nMAX_GENERATION_TIME = 300  # seconds\r\nMIN_GENERATION_TIME = 60   # seconds\r\nDEFAULT_SAMPLE_RATE = 24000\r\nMIN_SPEED = 0.1\r\nMAX_SPEED = 3.0\r\nDEFAULT_SPEED = 1.0\r\nMAX_RETRIES = 3\r\nRETRY_DELAY = 2  # seconds\r\n\r\n# Constants with validation\r\ndef validate_sample_rate(rate: int) -> int:\r\n    \"\"\"Validate sample rate is within acceptable range\"\"\"\r\n    valid_rates = [16000, 22050, 24000, 44100, 48000]\r\n    if rate not in valid_rates:\r\n        print(f\"Warning: Unusual sample rate {rate}. Valid rates are {valid_rates}\")\r\n        return 24000  # Default to safe value\r\n    return rate\r\n\r\ndef validate_language(lang: str) -> str:\r\n    \"\"\"Validate language code\"\"\"\r\n    # Import here to avoid circular imports\r\n    from models import LANGUAGE_CODES\r\n    valid_langs = list(LANGUAGE_CODES.keys())\r\n    if lang not in valid_langs:\r\n        print(f\"Warning: Invalid language code '{lang}'. 
Using 'a' (American English).\")\r\n        print(f\"Supported language codes: {', '.join(valid_langs)}\")\r\n        return 'a'  # Default to American English\r\n    return lang\r\n\r\n# Define and validate constants\r\nSAMPLE_RATE = validate_sample_rate(24000)\r\nDEFAULT_MODEL_PATH = Path('kokoro-v1_0.pth').resolve()\r\nDEFAULT_OUTPUT_FILE = Path('output.wav').resolve()\r\nDEFAULT_LANGUAGE = validate_language('a')  # 'a' for American English, 'b' for British English\r\nDEFAULT_TEXT = \"Hello, welcome to this text-to-speech test.\"\r\n\r\n# Ensure output directory exists\r\nDEFAULT_OUTPUT_FILE.parent.mkdir(parents=True, exist_ok=True)\r\n\r\n# Configure tqdm for better Windows console support\r\ntqdm.monitor_interval = 0\r\n\r\ndef print_menu():\r\n    \"\"\"Print the main menu options.\"\"\"\r\n    print(\"\\n=== Kokoro TTS Menu ===\")\r\n    print(\"1. List available voices\")\r\n    print(\"2. Generate speech\")\r\n    print(\"3. Exit\")\r\n    return input(\"Select an option (1-3): \").strip()\r\n\r\ndef select_voice(voices: List[str]) -> str:\r\n    \"\"\"Interactive voice selection.\"\"\"\r\n    print(\"\\nAvailable voices:\")\r\n    for i, voice in enumerate(voices, 1):\r\n        print(f\"{i}. {voice}\")\r\n\r\n    while True:\r\n        try:\r\n            choice = input(\"\\nSelect a voice number (or press Enter for default 'af_bella'): \").strip()\r\n            if not choice:\r\n                return \"af_bella\"\r\n            choice = int(choice)\r\n            if 1 <= choice <= len(voices):\r\n                return voices[choice - 1]\r\n            print(\"Invalid choice. 
Please try again.\")\r\n        except ValueError:\r\n            print(\"Please enter a valid number.\")\r\n\r\ndef get_text_input() -> str:\r\n    \"\"\"Get text input from user.\"\"\"\r\n    print(\"\\nEnter the text you want to convert to speech\")\r\n    print(\"(or press Enter for default text)\")\r\n    text = input(\"> \").strip()\r\n    return text if text else DEFAULT_TEXT\r\n\r\ndef get_speed() -> float:\r\n    \"\"\"Get speech speed from user.\"\"\"\r\n    while True:\r\n        try:\r\n            speed = input(f\"\\nEnter speech speed ({MIN_SPEED}-{MAX_SPEED}, default {DEFAULT_SPEED}): \").strip()\r\n            if not speed:\r\n                return DEFAULT_SPEED\r\n            speed = float(speed)\r\n            if MIN_SPEED <= speed <= MAX_SPEED:\r\n                return speed\r\n            print(f\"Speed must be between {MIN_SPEED} and {MAX_SPEED}\")\r\n        except ValueError:\r\n            print(\"Please enter a valid number.\")\r\n\r\ndef save_audio_with_retry(audio_data: np.ndarray, sample_rate: int, output_path: PathLike, max_retries: int = MAX_RETRIES, retry_delay: float = RETRY_DELAY) -> bool:\r\n    \"\"\"\r\n    Attempt to save audio data to file with retry logic.\r\n\r\n    Args:\r\n        audio_data: Audio data as numpy array\r\n        sample_rate: Sample rate in Hz\r\n        output_path: Path to save the audio file\r\n        max_retries: Maximum number of retry attempts\r\n        retry_delay: Delay between retries in seconds\r\n\r\n    Returns:\r\n        True if successful, False otherwise\r\n    \"\"\"\r\n    # Convert and normalize path to Path object\r\n    output_path = Path(output_path).resolve()\r\n\r\n    # Create parent directory if it doesn't exist\r\n    output_path.parent.mkdir(parents=True, exist_ok=True)\r\n\r\n    # Try to remove the file if it exists to avoid \"file in use\" issues\r\n    try:\r\n        if output_path.exists():\r\n            print(f\"Removing existing file: {output_path}\")\r\n            
output_path.unlink()\r\n    except Exception as e:\r\n        print(f\"Warning: Could not remove existing file: {e}\")\r\n        print(\"This might indicate the file is in use by another program.\")\r\n\r\n    for attempt in range(max_retries):\r\n        try:\r\n            # Validate audio data before saving\r\n            if audio_data is None or len(audio_data) == 0:\r\n                raise ValueError(\"Empty audio data\")\r\n\r\n            # Check write permissions for the directory\r\n            if not os.access(str(output_path.parent), os.W_OK):\r\n                raise PermissionError(f\"No write permission for directory: {output_path.parent}\")\r\n\r\n            # Try to use a temporary file first, then rename it\r\n            temp_path = output_path.with_name(f\"temp_{output_path.name}\")\r\n\r\n            # Save audio file to temporary location\r\n            print(f\"Saving audio to temporary file: {temp_path}\")\r\n            sf.write(str(temp_path), audio_data, sample_rate)\r\n\r\n            # If successful, rename to final location\r\n            if temp_path.exists():\r\n                # Remove target file if it exists\r\n                if output_path.exists():\r\n                    output_path.unlink()\r\n                # Rename temp file to target file\r\n                temp_path.rename(output_path)\r\n                print(f\"Successfully renamed temporary file to: {output_path}\")\r\n\r\n            return True\r\n\r\n        except (IOError, PermissionError) as e:\r\n            if attempt < max_retries - 1:\r\n                print(f\"\\nFailed to save audio (attempt {attempt + 1}/{max_retries}): {e}\")\r\n                print(\"The output file might be in use by another program (e.g., media player).\")\r\n                print(f\"Please close any programs that might be using '{output_path}'\")\r\n                print(f\"Retrying in {retry_delay} seconds...\")\r\n                time.sleep(retry_delay)\r\n            else:\r\n  
              print(f\"\\nError: Could not save audio after {max_retries} attempts: {e}\")\r\n                print(f\"Please ensure '{output_path}' is not open in any other program and try again.\")\r\n                print(f\"You might need to restart your computer if the file remains locked.\")\r\n                return False\r\n        except Exception as e:\r\n            print(f\"\\nUnexpected error saving audio: {type(e).__name__}: {e}\")\r\n            if attempt < max_retries - 1:\r\n                print(f\"Retrying in {retry_delay} seconds...\")\r\n                time.sleep(retry_delay)\r\n            else:\r\n                return False\r\n        finally:\r\n            # Clean up temp file if it exists and we failed\r\n            try:\r\n                temp_path = output_path.with_name(f\"temp_{output_path.name}\")\r\n                if temp_path.exists():\r\n                    temp_path.unlink()\r\n            except Exception as e:\r\n                print(f\"Warning: Could not clean up temporary file {temp_path}: {e}\")\r\n\r\n    return False\r\n\r\ndef main() -> None:\r\n    import psutil\r\n    import gc\r\n    \r\n    try:\r\n        # Check system memory at startup\r\n        memory = psutil.virtual_memory()\r\n        available_gb = memory.available / (1024**3)\r\n        total_gb = memory.total / (1024**3)\r\n        \r\n        print(f\"System memory: {available_gb:.1f}GB available / {total_gb:.1f}GB total\")\r\n        \r\n        if available_gb < 2.0:\r\n            print(\"Warning: Low system memory detected. Consider closing other applications.\")\r\n            # Force garbage collection\r\n            gc.collect()\r\n\r\n        # Set up device safely\r\n        try:\r\n            device = 'cuda' if torch.cuda.is_available() else 'cpu'\r\n        except (RuntimeError, AttributeError, ImportError) as e:\r\n            print(f\"CUDA initialization error: {e}. 
Using CPU instead.\")\r\n            device = 'cpu'  # Fallback if CUDA check fails\r\n        print(f\"Using device: {device}\")\r\n\r\n        # Build model\r\n        print(\"\\nInitializing model...\")\r\n        with tqdm(total=1, desc=\"Building model\") as pbar:\r\n            model = build_model(DEFAULT_MODEL_PATH, device)\r\n            pbar.update(1)\r\n\r\n        # Cache for voices to avoid redundant calls\r\n        voices_cache = None\r\n\r\n        while True:\r\n            choice = print_menu()\r\n\r\n            if choice == \"1\":\r\n                # List voices\r\n                voices_cache = list_available_voices()\r\n                print(\"\\nAvailable voices:\")\r\n                for voice in voices_cache:\r\n                    print(f\"- {voice}\")\r\n\r\n            elif choice == \"2\":\r\n                # Generate speech\r\n                # Use cached voices if available\r\n                if voices_cache is None:\r\n                    voices_cache = list_available_voices()\r\n\r\n                if not voices_cache:\r\n                    print(\"No voices found! 
Please check the voices directory.\")\r\n                    continue\r\n\r\n                # Get user inputs\r\n                voice = select_voice(voices_cache)\r\n                text = get_text_input()\r\n\r\n                # Dynamic text length validation based on available memory\r\n                memory = psutil.virtual_memory()\r\n                available_gb = memory.available / (1024**3)\r\n                \r\n                # Adjust max length based on available memory\r\n                dynamic_max_length = MAX_TEXT_LENGTH\r\n                if available_gb < 2.0:\r\n                    dynamic_max_length = min(MAX_TEXT_LENGTH, 3000)\r\n                    print(f\"Reduced text limit to {dynamic_max_length} characters due to low memory\")\r\n                \r\n                if len(text) > dynamic_max_length:\r\n                    print(f\"Text is too long ({len(text)} chars). Maximum allowed: {dynamic_max_length} characters.\")\r\n                    print(\"Please enter a shorter text.\")\r\n                    continue\r\n\r\n                speed = get_speed()\r\n\r\n                print(f\"\\nGenerating speech for: '{text}'\")\r\n                print(f\"Using voice: {voice}\")\r\n                print(f\"Speed: {speed}x\")\r\n\r\n                # Generate speech\r\n                all_audio = []\r\n                # Use Path object for consistent path handling\r\n                voice_path = Path(\"voices\").resolve() / f\"{voice}.pt\"\r\n\r\n                # Verify voice file exists\r\n                if not voice_path.exists():\r\n                    print(f\"Error: Voice file not found: {voice_path}\")\r\n                    continue\r\n\r\n                # Set a timeout for generation with per-segment timeout\r\n                max_gen_time = MAX_GENERATION_TIME\r\n                max_segment_time = MIN_GENERATION_TIME\r\n                start_time = time.time()\r\n                segment_start_time = start_time\r\n\r\n             
   try:\r\n                    # Setup watchdog timer for overall process\r\n                    import threading\r\n                    generation_complete = False\r\n\r\n                    def watchdog_timer():\r\n                        if not generation_complete:\r\n                            print(\"\\nWatchdog: Generation taking too long, process will be cancelled\")\r\n                            # Can't directly interrupt generator, but this will inform user\r\n\r\n                    # Start watchdog timer\r\n                    watchdog = threading.Timer(max_gen_time, watchdog_timer)\r\n                    watchdog.daemon = True  # Don't prevent program exit\r\n                    watchdog.start()\r\n\r\n                    # Initialize generator\r\n                    try:\r\n                        generator = model(text, voice=str(voice_path), speed=speed, split_pattern=r'\\n+')\r\n                    except (ValueError, TypeError, RuntimeError) as e:\r\n                        print(f\"Error initializing speech generator: {e}\")\r\n                        watchdog.cancel()\r\n                        continue\r\n                    except Exception as e:\r\n                        print(f\"Unexpected error initializing generator: {type(e).__name__}: {e}\")\r\n                        watchdog.cancel()\r\n                        continue\r\n\r\n                    # Process segments\r\n                    with tqdm(desc=\"Generating speech\") as pbar:\r\n                        for gs, ps, audio in generator:\r\n                            # Check overall timeout\r\n                            current_time = time.time()\r\n                            if current_time - start_time > max_gen_time:\r\n                                print(\"\\nWarning: Total generation time exceeded limit, stopping\")\r\n                                break\r\n\r\n                            # Check per-segment timeout\r\n                            segment_elapsed = 
current_time - segment_start_time\r\n                            if segment_elapsed > max_segment_time:\r\n                                print(f\"\\nWarning: Segment took too long ({segment_elapsed:.1f}s), stopping\")\r\n                                break\r\n\r\n                            # Reset segment timer\r\n                            segment_start_time = current_time\r\n\r\n                            # Process audio if available\r\n                            if audio is not None:\r\n                                # Only convert if it's a numpy array, not if already tensor\r\n                                audio_tensor = audio if isinstance(audio, torch.Tensor) else torch.from_numpy(audio).float()\r\n\r\n                                all_audio.append(audio_tensor)\r\n                                print(f\"\\nGenerated segment: {gs}\")\r\n                                if ps:  # Only print phonemes if available\r\n                                    print(f\"Phonemes: {ps}\")\r\n                                pbar.update(1)\r\n\r\n                    # Mark generation as complete (for watchdog)\r\n                    generation_complete = True\r\n                    watchdog.cancel()\r\n\r\n                except ValueError as e:\r\n                    print(f\"Value error during speech generation: {e}\")\r\n                except RuntimeError as e:\r\n                    print(f\"Runtime error during speech generation: {e}\")\r\n                    # If CUDA out of memory, provide more helpful message\r\n                    if \"CUDA out of memory\" in str(e):\r\n                        print(\"CUDA out of memory error - try using a shorter text or switching to CPU\")\r\n                except KeyError as e:\r\n                    print(f\"Key error during speech generation: {e}\")\r\n                    print(\"This might be caused by a missing voice configuration\")\r\n                except FileNotFoundError as e:\r\n                    
print(f\"File not found: {e}\")\r\n                except Exception as e:\r\n                    print(f\"Unexpected error during speech generation: {type(e).__name__}: {e}\")\r\n                    import traceback\r\n                    traceback.print_exc()\r\n                finally:\r\n                    # Always stop the watchdog, even when generation fails, so it cannot fire later\r\n                    watchdog.cancel()\r\n\r\n                # Save audio\r\n                if all_audio:\r\n                    try:\r\n                        # Handle single segment case without concatenation\r\n                        if len(all_audio) == 1:\r\n                            final_audio = all_audio[0]\r\n                        else:\r\n                            try:\r\n                                final_audio = torch.cat(all_audio, dim=0)\r\n                            except RuntimeError as e:\r\n                                print(f\"Error concatenating audio segments: {e}\")\r\n                                continue\r\n\r\n                        # Use consistent Path object\r\n                        output_path = DEFAULT_OUTPUT_FILE\r\n                        # soundfile expects a NumPy array, so move tensors to the CPU first\r\n                        if isinstance(final_audio, torch.Tensor):\r\n                            final_audio = final_audio.detach().cpu().numpy()\r\n                        if save_audio_with_retry(final_audio, SAMPLE_RATE, output_path):\r\n                            print(f\"\\nAudio saved to {output_path}\")\r\n                            # Play a system beep to indicate completion\r\n                            try:\r\n                                print('\\a')  # ASCII bell - should make a sound on most systems\r\n                            except Exception:\r\n                                pass\r\n                        else:\r\n                            print(\"Failed to save audio file\")\r\n                    except Exception as e:\r\n                        print(f\"Error processing audio: {type(e).__name__}: {e}\")\r\n                else:\r\n                    print(\"Error: Failed to generate audio\")\r\n\r\n            elif choice == \"3\":\r\n                print(\"\\nGoodbye!\")\r\n                break\r\n\r\n            else:\r\n                print(\"\\nInvalid choice. Please try again.\")\r\n\r\n    except Exception as e:\r\n        print(f\"Error in main: {type(e).__name__}: {e}\")\r\n        import traceback\r\n        traceback.print_exc()\r\n    finally:\r\n        # Comprehensive cleanup with error handling\r\n        try:\r\n            print(\"\\nPerforming cleanup...\")\r\n\r\n            # Ensure model is properly released\r\n            if 'model' in locals() and model is not None:\r\n                print(\"Cleaning up model resources...\")\r\n                # First clear any references to voice models\r\n                if hasattr(model, 'voices'):\r\n                    try:\r\n                        voices_count = len(model.voices)\r\n                        model.voices.clear()\r\n                        print(f\"Cleared {voices_count} voice references\")\r\n                    except Exception as voice_error:\r\n                        print(f\"Error clearing voice references: {voice_error}\")\r\n\r\n                # Clear any other model attributes that might hold references\r\n                try:\r\n                    for attr in list(model.__dict__.keys()):\r\n                        if hasattr(model, attr) and not attr.startswith('__'):\r\n                            try:\r\n                                delattr(model, attr)\r\n                            except Exception:\r\n                                # Some attributes cannot be deleted; skip them\r\n                                pass\r\n                except Exception as attr_error:\r\n                    print(f\"Error clearing model attributes: {attr_error}\")\r\n\r\n                # Then delete the model\r\n                try:\r\n                    del model\r\n                    model = None\r\n                    print(\"Model reference deleted\")\r\n                except Exception as del_error:\r\n                    print(f\"Error deleting model: {del_error}\")\r\n\r\n            # Clean up voice cache\r\n            if 'voices_cache' in locals() and voices_cache is not None:\r\n                try:\r\n                    voices_cache.clear()\r\n                    voices_cache = None\r\n                    print(\"Voice cache cleared\")\r\n                except Exception as cache_error:\r\n                    print(f\"Error clearing voice cache: {cache_error}\")\r\n\r\n            # Clean up any CUDA resources\r\n            if torch.cuda.is_available():\r\n                try:\r\n                    print(\"Cleaning up CUDA resources...\")\r\n                    torch.cuda.empty_cache()\r\n                    print(\"CUDA cache emptied\")\r\n                except Exception as cuda_error:\r\n                    print(f\"Error clearing CUDA cache: {cuda_error}\")\r\n\r\n            # Final garbage collection (gc was imported at the top of main)\r\n            try:\r\n                gc.collect()\r\n                print(\"Garbage collection completed\")\r\n            except Exception as gc_error:\r\n                print(f\"Error during garbage collection: {gc_error}\")\r\n\r\n            print(\"Cleanup completed\")\r\n\r\n        except Exception as e:\r\n            print(f\"Error during cleanup: {type(e).__name__}: {e}\")\r\n            import traceback\r\n            traceback.print_exc()\r\n\r\nif __name__ == \"__main__\":\r\n    main()\r\n"
  }
]