Repository: 0x0funky/audioghost-ai
Branch: main
Commit: 2f309f19f9d2
Files: 37
Total size: 209.8 KB
Directory structure:
gitextract_y5kuu04t/
├── .gitignore
├── LICENSE
├── QUICKSTART.md
├── README.md
├── backend/
│   ├── api/
│   │   ├── __init__.py
│   │   ├── auth.py
│   │   ├── separate.py
│   │   └── tasks.py
│   ├── main.py
│   ├── requirements.txt
│   └── workers/
│       ├── __init__.py
│       ├── celery_app.py
│       └── tasks.py
├── docker-compose.yml
├── frontend/
│   ├── .gitignore
│   ├── README.md
│   ├── eslint.config.mjs
│   ├── next.config.ts
│   ├── package.json
│   ├── postcss.config.mjs
│   ├── src/
│   │   ├── app/
│   │   │   ├── globals.css
│   │   │   ├── layout.tsx
│   │   │   └── page.tsx
│   │   └── components/
│   │       ├── AudioUploader.tsx
│   │       ├── AuthModal.tsx
│   │       ├── Header.tsx
│   │       ├── ProgressTracker.tsx
│   │       ├── SeparationPanel.tsx
│   │       ├── StemMixer.tsx
│   │       ├── VideoStemMixer.tsx
│   │       └── WaveformEditor.tsx
│   └── tsconfig.json
├── install.bat
├── sam_audio_lite.py
├── start.bat
├── stop.bat
└── test_video_only.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
*.egg
*.egg-info/
dist/
build/
eggs/
*.manifest
*.spec
pip-log.txt
pip-delete-this-directory.txt
# Virtual environments
venv/
env/
ENV/
.venv/
# IDEs
.idea/
.vscode/
*.swp
*.swo
*~
.project
.pydevproject
.settings/
# SAM Audio original repo (install from pip instead)
sam_audio/
eval/
examples/
assets/
.github/
.checkpoints/
checkpoints/
sam_audio.egg-info/
# Test files and outputs
*.mp3
*.wav
*.mp4
output_*.wav
output_*.mp4
test.mp3
test_audio.wav
office.mp4
# Test scripts (keep sam_audio_lite.py for reference)
test_small.py
test_video.py
# Secrets
.hf_token
.env
*.env
# Jupyter
.ipynb_checkpoints/
*.ipynb
# OS
.DS_Store
Thumbs.db
# Node.js (frontend)
frontend/node_modules/
frontend/.next/
frontend/out/
# Redis/Celery
redis/
*.rdb
celerybeat-schedule
celerybeat.pid
# Logs
*.log
logs/
# Uploads/Outputs (runtime generated)
backend/uploads/
backend/outputs/
# Original repo files (not needed for our fork)
CODE_OF_CONDUCT.md
CONTRIBUTING.md
.pre-commit-config.yaml
pyproject.toml
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2024 AudioGhost AI Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
---
## Third-Party Licenses
This project uses the following third-party components:
### SAM-Audio
SAM-Audio is developed by Meta AI Research and is subject to Meta's research
license. See https://github.com/facebookresearch/sam-audio for more information.
================================================
FILE: QUICKSTART.md
================================================
# AudioGhost AI Startup Guide
## Quick Start
### 1. Start Redis (using Docker)
```powershell
cd d:\sam_audio
docker-compose up -d
```
### 2. Create the Anaconda Environment
```powershell
# Create a new environment (Python 3.11+ required)
conda create -n audioghost python=3.11 -y
# Activate the environment
conda activate audioghost
```
### 3. Install PyTorch (CUDA 12.6)
```powershell
pip install torch==2.9.0+cu126 torchvision==0.24.0+cu126 torchaudio==2.9.0+cu126 --index-url https://download.pytorch.org/whl/cu126 --extra-index-url https://pypi.org/simple
```
### 4. Install FFmpeg (required by TorchCodec)
```powershell
conda install -c conda-forge ffmpeg -y
```
### 5. Install SAM Audio
```powershell
cd d:\sam_audio
pip install .
```
### 6. Install Backend Dependencies
```powershell
cd d:\sam_audio\backend
pip install -r requirements.txt
```
### 7. Start the Backend API
```powershell
cd d:\sam_audio\backend
uvicorn main:app --reload --port 8000
```
### 8. Start the Celery Worker (new terminal)
```powershell
conda activate audioghost
cd d:\sam_audio\backend
celery -A workers.celery_app worker --loglevel=info --pool=solo
```
### 9. Start the Frontend (new terminal)
```powershell
cd d:\sam_audio\frontend
npm run dev
```
### 10. Open the Browser
Visit http://localhost:3000
## First-Time Use
1. Click the "Connect HuggingFace" button in the top-right corner
2. Request access at https://huggingface.co/facebook/sam-audio-large
3. Create an Access Token: https://huggingface.co/settings/tokens
4. Paste the token and connect
## Using the Features
- **Upload audio**: drag and drop, or click the upload area
- **Semantic separation**: pick a quick-select button or enter a custom description
- **Temporal lock**: select a region on the waveform
- **Three-track output**: Original / Ghost / Clean
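
The temporal lock corresponds to the `start_time` / `end_time` form fields, which `backend/api/separate.py` turns into a nested anchor list. A minimal sketch of that mapping follows. The helper name `build_anchors` is hypothetical, and the range check is an added assumption that the backend itself does not enforce.

```python
from typing import List, Optional, Union

Anchor = List[Union[str, float]]


def build_anchors(start_time: Optional[float],
                  end_time: Optional[float]) -> Optional[List[List[Anchor]]]:
    """Return the nested anchor structure, or None when no region is selected."""
    if start_time is None or end_time is None:
        return None
    # Assumption for this sketch: reject empty or negative regions up front.
    if start_time < 0 or end_time <= start_time:
        raise ValueError("expected 0 <= start_time < end_time")
    # One batch item holding one positive ("+") span, as built in separate.py.
    return [[["+", start_time, end_time]]]


print(build_anchors(3.0, 7.5))
```

When either bound is missing the helper returns `None`, which matches the endpoint's behavior of sending no anchors at all.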
================================================
FILE: README.md
================================================
# AudioGhost AI 🎵👻

**AI-Powered Object-Oriented Audio Separation**
Describe the sound you want to extract or remove using natural language. Powered by Meta's [SAM-Audio](https://github.com/facebookresearch/sam-audio) model.
  
## 🎬 Demo
### Audio Separation
https://github.com/user-attachments/assets/49248e25-0c56-46ab-a821-2de7f7016bb6
### Video Upload
https://github.com/user-attachments/assets/6b8c08a8-c84f-4fc3-83ad-5703f474fc1b
## Features
- 🎯 **Text-Guided Separation** - Describe what you want to extract: "vocals", "drums", "a dog barking"
- 🎬 **Video Upload Support** - Upload videos and extract/remove audio sources (audio extraction only, not vision-based)
- 🚀 **Memory Optimized** - Lite mode reduces VRAM from ~11GB to ~4GB
- 🎨 **Modern UI** - Glassmorphism design with waveform visualization
- ⚡ **Real-time Progress** - Track separation progress in real-time
- 🎛️ **Stem Mixer** - Preview and compare original, extracted, and residual audio
## 🗺️ Roadmap
- 🖱️ **Visual Prompting** - Click on video to select sound sources visually (Integration with [SAM 2](https://github.com/facebookresearch/sam2))
## Architecture
```
┌─────────────────────────────────────────────────┐
│                    Frontend                     │
│             (Next.js + Tailwind v4)             │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│                   Backend API                   │
│               (FastAPI + Python)                │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│                   Task Queue                    │
│                (Celery + Redis)                 │
└──────────────────────┬──────────────────────────┘
                       │
┌──────────────────────▼──────────────────────────┐
│                 SAM Audio Lite                  │
│        (Memory-optimized Meta SAM-Audio)        │
└─────────────────────────────────────────────────┘
```
## Requirements
- **Python 3.11+**
- **CUDA-compatible GPU** (4GB+ VRAM for lite mode, 12GB+ for full mode)
- **CUDA 12.6** (recommended)
- **Node.js 18+** (for frontend)
> 💡 FFmpeg and Redis are automatically installed by the installer.
## 🚀 One-Click Installation (Recommended)
### First Time Setup
```bash
# Run installer (creates Conda env, downloads Redis, installs all dependencies)
install.bat
```
### Daily Usage
```bash
# Start all services with one click
start.bat
# Stop all services
stop.bat
```
---
## Manual Setup (Advanced)
### 1. Start Redis
Redis is automatically downloaded to the `redis/` folder by `install.bat`. If you prefer Docker:
```bash
docker-compose up -d
```
### 2. Create Anaconda Environment
```bash
# Create new environment (Python 3.11+ required)
conda create -n audioghost python=3.11 -y
# Activate environment
conda activate audioghost
```
### 3. Install PyTorch (CUDA 12.6)
```bash
pip install torch==2.9.0+cu126 torchvision==0.24.0+cu126 torchaudio==2.9.0+cu126 --index-url https://download.pytorch.org/whl/cu126 --extra-index-url https://pypi.org/simple
```
### 4. Install FFmpeg (required by TorchCodec)
```bash
conda install -c conda-forge ffmpeg -y
```
### 5. Install SAM Audio
```bash
pip install git+https://github.com/facebookresearch/sam-audio.git
```
### 6. Install Backend Dependencies
```bash
cd backend
pip install -r requirements.txt
```
### 7. Install Frontend Dependencies
```bash
cd frontend
npm install
```
### 8. Start Services
**Terminal 1 - Backend API:**
```bash
cd backend
uvicorn main:app --reload --port 8000
```
**Terminal 2 - Celery Worker:**
```bash
conda activate audioghost
cd backend
celery -A workers.celery_app worker --loglevel=info --pool=solo
```
**Terminal 3 - Frontend:**
```bash
cd frontend
npm run dev
```
### 9. Open the App
Navigate to `http://localhost:3000`
### 10. Connect HuggingFace
1. Click "Connect HuggingFace" button
2. Request access at https://huggingface.co/facebook/sam-audio-large
3. Create Access Token: https://huggingface.co/settings/tokens
4. Paste the token and connect
## Usage
1. **Upload** an audio file (MP3, WAV, FLAC)
2. **Describe** what you want to extract or remove:
- "vocals" / "singing voice"
- "drums" / "percussion"
- "background music"
- "a dog barking"
- "crowd noise"
3. Click **Extract** or **Remove**
4. Wait for processing
5. **Preview** and **download** the results
## Performance Benchmarks
> Tested on RTX 4090 with 4:26 audio (11 chunks @ 25s each)
### VRAM Usage (Lite Mode)
| Model | bfloat16 (Default) | float32 (High Quality) | Recommended GPU |
|-------|-------------------|------------------------|-----------------|
| Small | **~6 GB** | **~10 GB** | RTX 3060 6GB / RTX 3070 8GB |
| Base | **~7 GB** | **~13 GB** | RTX 3070/4060 8GB / RTX 4070 12GB |
| Large | **~10 GB** | **~20 GB** | RTX 3080/4070 12GB / RTX 4080 16GB |
> 💡 **High Quality Mode (float32)**: Better separation quality at the cost of more VRAM (per the table above, roughly +4 GB for Small up to +10 GB for Large). Enable via the "High Quality Mode" toggle in the UI.
### Processing Time
| Model | First Run (incl. model load) | Subsequent Runs | Speed |
|-------|------------------------------|-----------------|-------|
| Small | ~78s | **~25s** | ~10x realtime |
| Base | ~100s | **~29s** | ~9x realtime |
| Large | ~130s | **~41s** | ~6.5x realtime |
> 💡 First run includes model download and loading. Subsequent runs use cached models.
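
The Speed column is the audio duration divided by the warm (subsequent-run) processing time. A quick check of that arithmetic for the 4:26 test clip, using the timings from the table (the function name `realtime_factor` is illustrative):

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are processed per second of wall-clock time."""
    return audio_seconds / processing_seconds


AUDIO_SECONDS = 4 * 60 + 26  # the 4:26 benchmark clip, i.e. 266 s

# Warm-run timings copied from the "Subsequent Runs" column above.
for model, warm_seconds in {"small": 25, "base": 29, "large": 41}.items():
    print(f"{model}: {realtime_factor(AUDIO_SECONDS, warm_seconds):.1f}x realtime")
```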
### Memory Optimization Details
AudioGhost uses a "Lite Mode" that removes unused model components:
| Component Removed | VRAM Saved |
|-------------------|------------|
| Vision Encoder | ~2GB |
| Visual Ranker | ~2GB |
| Text Ranker | ~2GB |
| Span Predictor | ~1-2GB |
**Total Reduction**: Up to **40% less VRAM** compared to original SAM-Audio
This is achieved by:
- Disabling video-related features (not needed for audio-only)
- Using `predict_spans=False` and `reranking_candidates=1`
- Using `bfloat16` precision by default (optional float32 for quality)
- 25-second chunking for long audio files
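
The chunking step can be sketched as below. This only illustrates fixed-size chunk boundaries; `chunk_spans` is a hypothetical helper, and any overlap, padding, or crossfading the worker may apply is not covered here.

```python
import math
from typing import List, Tuple


def chunk_spans(total_seconds: float,
                chunk_seconds: float = 25.0) -> List[Tuple[float, float]]:
    """Split [0, total_seconds] into consecutive (start, end) spans."""
    if total_seconds <= 0 or chunk_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / chunk_seconds)
    # The last span is clipped so it never runs past the end of the audio.
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, total_seconds))
        for i in range(count)
    ]


spans = chunk_spans(266)  # the 4:26 benchmark clip
print(len(spans), spans[-1])
```

For the 266-second benchmark clip this yields 11 spans, matching the "11 chunks @ 25s each" figure quoted with the benchmarks.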
## Project Structure
```
audioghost-ai/
├── backend/
│   ├── main.py              # FastAPI app
│   ├── api/                 # API routes
│   │   ├── auth.py          # HuggingFace auth
│   │   └── separate.py      # Separation endpoints
│   └── workers/
│       ├── celery_app.py    # Celery config
│       └── tasks.py         # SAM Audio Lite worker
├── frontend/
│   ├── src/
│   │   ├── app/             # Next.js app
│   │   └── components/      # React components
│   └── package.json
├── sam_audio_lite.py        # Standalone lite version
├── QUICKSTART.md            # Quick setup guide
└── README.md
```
## API Reference
### POST /api/separate/
Create a separation task.
**Form Data:**
- `file` - Audio file
- `description` - Text prompt (e.g., "vocals")
- `mode` - "extract" or "remove"
- `model_size` - "small", "base", or "large" (default: "base")
**Response:**
```json
{
  "task_id": "uuid",
  "status": "pending",
  "message": "Task submitted successfully"
}
```
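
The endpoint in `backend/api/separate.py` also accepts `chunk_duration` and `use_float32` form fields. The sketch below mirrors how those two values are normalized before the task is queued; `normalize_options` is a hypothetical helper name used only for illustration.

```python
from typing import Tuple


def normalize_options(chunk_duration: float = 25.0,
                      use_float32: str = "false") -> Tuple[float, bool]:
    """Clamp the chunk length to 5-60 s and parse the float32 flag."""
    clamped = max(5.0, min(60.0, chunk_duration))
    # The flag arrives as a form string, so only "true" (any case) enables it.
    return clamped, use_float32.lower() == "true"


print(normalize_options(120, "True"))
print(normalize_options(1, "no"))
```

Out-of-range chunk lengths are silently clamped rather than rejected, so a client does not get an error for sending, say, 120 seconds.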
### GET /api/tasks/{task_id}
Get task status and progress.
### GET /api/tasks/{task_id}/download/{file_type}
Download a result file (`original`, `ghost`, `clean`, or `video`).
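
A client typically submits a task and then polls until it finishes. The following is a minimal polling sketch using only the standard library. It assumes the backend runs on `http://localhost:8000` and uses the status route as registered in `backend/main.py` and `backend/api/tasks.py` (`/api/tasks/{task_id}`); the names `fetch_status` and `wait_for_task` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict

BASE_URL = "http://localhost:8000"  # assumption: uvicorn started with --port 8000


def fetch_status(task_id: str) -> Dict:
    """GET the task status JSON from the tasks router."""
    with urllib.request.urlopen(f"{BASE_URL}/api/tasks/{task_id}") as resp:
        return json.load(resp)


def wait_for_task(task_id: str,
                  fetch: Callable[[str], Dict] = fetch_status,
                  interval: float = 2.0,
                  timeout: float = 3600.0) -> Dict:
    """Poll until the task is completed, failed, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(task_id)
        if status["status"] == "completed":
            return status
        if status["status"] == "failed":
            raise RuntimeError(status.get("message") or "task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout}s")
        time.sleep(interval)
```

Passing `fetch` as a parameter keeps the loop testable without a running server; in normal use, `wait_for_task(task_id)` followed by a download request is enough.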
## Troubleshooting
### CUDA Out of Memory
- Use `model_size: "small"` instead of "base" or "large"
- Ensure lite mode is enabled (check for "Optimizing model for low VRAM" in logs)
- Close other GPU applications
### TorchCodec DLL Error
- Downgrade to FFmpeg 7.x
- Ensure FFmpeg `bin` directory is in PATH
### HuggingFace 401 Error
- Re-authenticate via the UI
- Check that `.hf_token` exists in `backend/`
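
When debugging a 401, it can help to confirm which token the backend would pick up. The sketch below mirrors the lookup order in `backend/api/auth.py` (token file first, then the `HF_TOKEN` environment variable); `find_token` is a hypothetical helper, and the `backend` path assumes the script runs from the repository root.

```python
import os
from pathlib import Path
from typing import Optional


def find_token(backend_dir: Path) -> Optional[str]:
    """Return the saved token, falling back to the HF_TOKEN environment variable."""
    token_file = backend_dir / ".hf_token"
    if token_file.exists():
        return token_file.read_text().strip()
    return os.environ.get("HF_TOKEN")


# Report only whether a token was found; never print the token itself.
print("token found" if find_token(Path("backend")) else "no token found")
```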
## License
This project is licensed under the MIT License. SAM-Audio is licensed by Meta under a research license.
## Credits
- [SAM-Audio](https://github.com/facebookresearch/sam-audio) by Meta AI Research
- **Core Optimization Logic**: Special thanks to [NilanEkanayake](https://github.com/NilanEkanayake) for providing the initial code modifications in [Issue #24](https://github.com/facebookresearch/sam-audio/issues/24) that made VRAM inference reduction possible.
- Built with ❤️ using Next.js, FastAPI, and Celery
================================================
FILE: backend/api/__init__.py
================================================
"""API Package"""
================================================
FILE: backend/api/auth.py
================================================
"""
Authentication API - HuggingFace Token Management
"""
import os
from pathlib import Path
from typing import Optional
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel
from huggingface_hub import HfApi, hf_hub_download
from huggingface_hub.utils import HfHubHTTPError
router = APIRouter()
# Token storage path (use absolute path based on this file's location)
BACKEND_DIR = Path(__file__).parent.parent
TOKEN_FILE = BACKEND_DIR / ".hf_token"
CHECKPOINTS_DIR = BACKEND_DIR / "checkpoints"


class TokenRequest(BaseModel):
    token: str


class AuthStatus(BaseModel):
    authenticated: bool
    model_downloaded: bool
    model_name: Optional[str] = None


def get_saved_token() -> Optional[str]:
    """Get saved HuggingFace token"""
    if TOKEN_FILE.exists():
        return TOKEN_FILE.read_text().strip()
    return os.environ.get("HF_TOKEN")


def save_token(token: str):
    """Save HuggingFace token"""
    TOKEN_FILE.write_text(token)


def check_model_downloaded() -> bool:
    """Check if SAM Audio model is downloaded"""
    # Check for common model files
    model_files = list(CHECKPOINTS_DIR.glob("*.safetensors")) + \
        list(CHECKPOINTS_DIR.glob("*.bin"))
    return len(model_files) > 0


@router.get("/status", response_model=AuthStatus)
async def get_auth_status():
    """Check authentication and model status"""
    token = get_saved_token()
    authenticated = False
    if token:
        try:
            api = HfApi(token=token)
            api.whoami()
            authenticated = True
        except Exception:
            authenticated = False
    # Scan the checkpoints directory once and reuse the result
    model_downloaded = check_model_downloaded()
    return AuthStatus(
        authenticated=authenticated,
        model_downloaded=model_downloaded,
        model_name="facebook/sam-audio-large" if model_downloaded else None
    )


@router.post("/login")
async def login(request: TokenRequest):
    """Validate and save HuggingFace token"""
    try:
        # Validate token
        api = HfApi(token=request.token)
        user_info = api.whoami()
        # Check if user has access to SAM Audio
        try:
            api.model_info("facebook/sam-audio-large", token=request.token)
        except HfHubHTTPError as e:
            if "403" in str(e) or "401" in str(e):
                raise HTTPException(
                    status_code=403,
                    detail="You need to request access to facebook/sam-audio-large on HuggingFace first"
                )
            raise
        # Save token
        save_token(request.token)
        return {
            "success": True,
            "username": user_info.get("name", "Unknown"),
            "message": "Successfully authenticated"
        }
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=401, detail=f"Invalid token: {str(e)}")


@router.post("/download-model")
async def download_model():
    """Download SAM Audio model"""
    token = get_saved_token()
    if not token:
        raise HTTPException(status_code=401, detail="Not authenticated")
    try:
        # Note: In production, this should be a background task
        # For MVP, we'll use the HuggingFace auto-download feature
        # which downloads on first use
        return {
            "success": True,
            "message": "Model will be downloaded automatically on first use"
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Download failed: {str(e)}")


@router.post("/logout")
async def logout():
    """Clear saved token"""
    if TOKEN_FILE.exists():
        TOKEN_FILE.unlink()
    return {"success": True, "message": "Logged out"}
================================================
FILE: backend/api/separate.py
================================================
"""
Separation API - Audio/Video Separation Endpoints
"""
import uuid
from pathlib import Path
from typing import Optional, List
from fastapi import APIRouter, UploadFile, File, Form, HTTPException
from pydantic import BaseModel
from workers.celery_app import celery_app
from workers.tasks import separate_audio_task
router = APIRouter()
UPLOAD_DIR = Path("uploads")
# Supported MIME types
AUDIO_TYPES = ["audio/mpeg", "audio/wav", "audio/mp3", "audio/x-wav", "audio/flac", "audio/m4a", "audio/aac"]
VIDEO_TYPES = ["video/mp4", "video/webm", "video/quicktime", "video/x-msvideo", "video/mpeg", "video/x-matroska"]
VIDEO_EXTENSIONS = [".mp4", ".webm", ".mov", ".avi", ".mkv", ".mpeg"]


class SeparationRequest(BaseModel):
    description: str
    mode: str = "extract"  # "extract" or "remove"
    start_time: Optional[float] = None
    end_time: Optional[float] = None
    model_size: str = "base"  # "small", "base", "large"


class SeparationResponse(BaseModel):
    task_id: str
    status: str
    message: str


@router.post("/", response_model=SeparationResponse)
async def create_separation_task(
    file: UploadFile = File(...),
    description: str = Form(...),
    mode: str = Form("extract"),
    start_time: Optional[float] = Form(None),
    end_time: Optional[float] = Form(None),
    model_size: str = Form("base"),
    chunk_duration: float = Form(25.0),
    use_float32: str = Form("false")
):
    """
    Create a new audio/video separation task
    - **file**: Audio or video file to process (video audio will be extracted)
    - **description**: Text prompt describing the sound to separate
    - **mode**: "extract" to isolate the sound, "remove" to remove it
    - **start_time**: Optional start time for temporal prompting
    - **end_time**: Optional end time for temporal prompting
    - **model_size**: SAM Audio model size (small/base/large)
    - **chunk_duration**: Audio chunk duration in seconds (5-60, default 25)
    - **use_float32**: Use float32 precision for better quality (default: false)
    """
    # Validate chunk_duration
    chunk_duration = max(5.0, min(60.0, chunk_duration))
    # Parse use_float32 from string to bool
    use_float32_bool = use_float32.lower() == "true"
    # Detect if file is video (the filename can be missing, so guard before parsing it)
    original_suffix = Path(file.filename).suffix if file.filename else ""
    is_video = (
        (file.content_type and file.content_type in VIDEO_TYPES) or
        original_suffix.lower() in VIDEO_EXTENSIONS
    )
    # Generate task ID
    task_id = str(uuid.uuid4())
    # Save uploaded file (fall back to .mp3 when the upload has no extension)
    file_extension = original_suffix or ".mp3"
    upload_path = UPLOAD_DIR / f"{task_id}{file_extension}"
    with open(upload_path, "wb") as f:
        content = await file.read()
        f.write(content)
    # Build anchors for temporal prompting
    anchors = None
    if start_time is not None and end_time is not None:
        anchors = [[["+", start_time, end_time]]]
    # Submit Celery task
    celery_task = separate_audio_task.apply_async(
        args=[
            str(upload_path),
            description,
            mode,
            anchors,
            model_size,
            chunk_duration,
            use_float32_bool,
            is_video  # New: flag for video processing
        ],
        task_id=task_id
    )
    return SeparationResponse(
        task_id=task_id,
        status="pending",
        message="Task submitted successfully"
    )


@router.post("/batch", response_model=List[SeparationResponse])
async def create_batch_separation(
    file: UploadFile = File(...),
    descriptions: str = Form(...),  # JSON array of descriptions
    mode: str = Form("extract")
):
    """
    Create multiple separation tasks for the same audio file
    Useful for separating multiple stems at once
    """
    import json
    try:
        desc_list = json.loads(descriptions)
    except json.JSONDecodeError:
        raise HTTPException(status_code=400, detail="Invalid descriptions format")
    # json.loads also accepts scalars and objects; only a list of strings is valid here
    if not isinstance(desc_list, list) or not all(isinstance(d, str) for d in desc_list):
        raise HTTPException(status_code=400, detail="Invalid descriptions format")
    # Save file once (the filename can be missing, so guard before parsing it)
    base_task_id = str(uuid.uuid4())
    file_extension = (Path(file.filename).suffix if file.filename else "") or ".mp3"
    upload_path = UPLOAD_DIR / f"{base_task_id}{file_extension}"
    with open(upload_path, "wb") as f:
        content = await file.read()
        f.write(content)
    responses = []
    for i, desc in enumerate(desc_list):
        task_id = f"{base_task_id}-{i}"
        separate_audio_task.apply_async(
            args=[str(upload_path), desc, mode, None, "small"],
            task_id=task_id
        )
        responses.append(SeparationResponse(
            task_id=task_id,
            status="pending",
            message=f"Task for '{desc}' submitted"
        ))
    return responses
================================================
FILE: backend/api/tasks.py
================================================
"""
Tasks API - Task Status and Results
"""
from pathlib import Path
from typing import Optional, List
from fastapi import APIRouter, HTTPException
from fastapi.responses import FileResponse
from pydantic import BaseModel
from celery.result import AsyncResult
from workers.celery_app import celery_app
router = APIRouter()
OUTPUT_DIR = Path("outputs")


class TaskStatus(BaseModel):
    task_id: str
    status: str  # pending, processing, completed, failed
    progress: int  # 0-100
    message: Optional[str] = None
    result: Optional[dict] = None


class TaskResult(BaseModel):
    original_url: str
    ghost_url: str  # Separated target
    clean_url: str  # Residual


@router.get("/{task_id}", response_model=TaskStatus)
async def get_task_status(task_id: str):
    """Get the status of a separation task"""
    result = AsyncResult(task_id, app=celery_app)
    if result.state == "PENDING":
        return TaskStatus(
            task_id=task_id,
            status="pending",
            progress=0,
            message="Task is waiting to be processed"
        )
    elif result.state == "PROGRESS":
        info = result.info or {}
        return TaskStatus(
            task_id=task_id,
            status="processing",
            progress=info.get("progress", 0),
            message=info.get("message", "Processing...")
        )
    elif result.state == "SUCCESS":
        return TaskStatus(
            task_id=task_id,
            status="completed",
            progress=100,
            message="Task completed successfully",
            result=result.result
        )
    elif result.state == "FAILURE":
        return TaskStatus(
            task_id=task_id,
            status="failed",
            progress=0,
            message=str(result.info)
        )
    else:
        return TaskStatus(
            task_id=task_id,
            status=result.state.lower(),
            progress=0,
            message=f"Task state: {result.state}"
        )


@router.get("/{task_id}/download/{file_type}")
async def download_result(task_id: str, file_type: str):
    """
    Download processed audio or video file
    - **file_type**: "original", "ghost", "clean", or "video"
    """
    if file_type not in ["original", "ghost", "clean", "video"]:
        raise HTTPException(status_code=400, detail="Invalid file type")
    result = AsyncResult(task_id, app=celery_app)
    if result.state != "SUCCESS":
        raise HTTPException(status_code=404, detail="Task not completed")
    # Handle video file separately
    if file_type == "video":
        video_path = result.result.get("video_path")
        if not video_path:
            raise HTTPException(status_code=404, detail="No video file for this task")
        file_path = Path(video_path)
        if not file_path.exists():
            raise HTTPException(status_code=404, detail="Video file not found")
        # Determine media type based on extension
        extension = file_path.suffix.lower()
        media_types = {
            ".mp4": "video/mp4",
            ".webm": "video/webm",
            ".mov": "video/quicktime",
            ".avi": "video/x-msvideo",
            ".mkv": "video/x-matroska"
        }
        media_type = media_types.get(extension, "video/mp4")
        return FileResponse(
            path=file_path,
            filename=f"{task_id}_video{extension}",
            media_type=media_type
        )
    # Handle audio files
    # Path("") resolves to the current directory (which exists), so treat a
    # missing key as "not found" explicitly and require a regular file.
    audio_path = result.result.get(f"{file_type}_path")
    if not audio_path or not Path(audio_path).is_file():
        raise HTTPException(status_code=404, detail="File not found")
    file_path = Path(audio_path)
    return FileResponse(
        path=file_path,
        filename=f"{task_id}_{file_type}.wav",
        media_type="audio/wav"
    )


@router.get("/{task_id}/download-video-with-audio/{audio_type}")
async def download_video_with_audio(task_id: str, audio_type: str):
    """
    Download video with merged audio track
    - **audio_type**: "original", "ghost", or "clean"
    """
    import subprocess
    if audio_type not in ["original", "ghost", "clean"]:
        raise HTTPException(status_code=400, detail="Invalid audio type. Use 'original', 'ghost', or 'clean'")
    result = AsyncResult(task_id, app=celery_app)
    if result.state != "SUCCESS":
        raise HTTPException(status_code=404, detail="Task not completed")
    # Get video path
    video_path = result.result.get("video_path")
    if not video_path:
        raise HTTPException(status_code=404, detail="No video file for this task")
    video_file = Path(video_path)
    if not video_file.exists():
        raise HTTPException(status_code=404, detail="Video file not found")
    # Get audio path
    # Path("") resolves to the current directory (which exists), so treat a
    # missing key as "not found" explicitly and require a regular file.
    audio_value = result.result.get(f"{audio_type}_path")
    if not audio_value or not Path(audio_value).is_file():
        raise HTTPException(status_code=404, detail=f"Audio file '{audio_type}' not found")
    audio_path = Path(audio_value)
    # Create output file in the same directory as video
    output_dir = video_file.parent
    extension = video_file.suffix.lower()
    output_filename = f"{task_id}_{audio_type}_merged{extension}"
    output_path = output_dir / output_filename
    # Use FFmpeg to merge video and audio
    try:
        cmd = [
            "ffmpeg", "-y",
            "-i", str(video_file),
            "-i", str(audio_path),
            "-c:v", "copy",  # Copy video stream without re-encoding
            "-c:a", "aac",  # Encode audio to AAC
            "-b:a", "192k",  # Audio bitrate
            "-map", "0:v:0",  # Use video from first input
            "-map", "1:a:0",  # Use audio from second input
            "-shortest",  # Match shortest stream
            str(output_path)
        ]
        subprocess.run(cmd, check=True, capture_output=True)
    except subprocess.CalledProcessError as e:
        raise HTTPException(status_code=500, detail=f"FFmpeg error: {e.stderr.decode()}")
    except FileNotFoundError:
        raise HTTPException(status_code=500, detail="FFmpeg not found. Please install FFmpeg.")
    if not output_path.exists():
        raise HTTPException(status_code=500, detail="Failed to create merged video")
    # Determine media type
    media_types = {
        ".mp4": "video/mp4",
        ".webm": "video/webm",
        ".mov": "video/quicktime",
        ".avi": "video/x-msvideo",
        ".mkv": "video/x-matroska"
    }
    media_type = media_types.get(extension, "video/mp4")
    # Map audio type to display name
    audio_labels = {
        "original": "original",
        "ghost": "isolated",
        "clean": "without_isolated"
    }
    return FileResponse(
        path=output_path,
        filename=f"{task_id}_{audio_labels[audio_type]}_video{extension}",
        media_type=media_type
    )


@router.delete("/{task_id}")
async def cancel_task(task_id: str):
    """Cancel a pending or running task"""
    result = AsyncResult(task_id, app=celery_app)
    result.revoke(terminate=True)
    return {"success": True, "message": "Task cancelled"}


@router.get("/", response_model=List[TaskStatus])
async def list_recent_tasks(limit: int = 10):
    """List recent tasks (simplified - in production would use database)"""
    # Note: This is a simplified implementation
    # In production, you would store task metadata in a database
    return []
================================================
FILE: backend/main.py
================================================
"""
AudioGhost AI - FastAPI Backend
"""
import os
from pathlib import Path
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.staticfiles import StaticFiles
from api import auth, separate, tasks
# Create necessary directories
UPLOAD_DIR = Path("uploads")
OUTPUT_DIR = Path("outputs")
CHECKPOINTS_DIR = Path("../checkpoints")
UPLOAD_DIR.mkdir(exist_ok=True)
OUTPUT_DIR.mkdir(exist_ok=True)
CHECKPOINTS_DIR.mkdir(exist_ok=True)
app = FastAPI(
    title="AudioGhost AI",
    description="AI-Powered Audio Separation Tool",
    version="1.0.0"
)
# CORS Configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
# Mount static files for downloads
app.mount("/outputs", StaticFiles(directory="outputs"), name="outputs")
# Include routers
app.include_router(auth.router, prefix="/api/auth", tags=["Authentication"])
app.include_router(separate.router, prefix="/api/separate", tags=["Separation"])
app.include_router(tasks.router, prefix="/api/tasks", tags=["Tasks"])


@app.get("/")
async def root():
    return {
        "name": "AudioGhost AI",
        "version": "1.0.0",
        "status": "running"
    }


@app.get("/health")
async def health():
    return {"status": "healthy"}
================================================
FILE: backend/requirements.txt
================================================
# AudioGhost AI - Backend Dependencies
# FastAPI Framework
fastapi==0.115.6
uvicorn[standard]==0.34.0
python-multipart==0.0.19
# Task Queue
celery==5.4.0
redis==5.2.1
# Media Processing
pydub==0.25.1
# AI Dependencies (SAM Audio already installed from parent)
# huggingface_hub is included in SAM Audio deps
# Utilities
python-dotenv==1.0.1
aiofiles==24.1.0
================================================
FILE: backend/workers/__init__.py
================================================
"""Workers Package"""
================================================
FILE: backend/workers/celery_app.py
================================================
"""
Celery Application Configuration
"""
from celery import Celery
celery_app = Celery(
    "audioghost",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
    include=["workers.tasks"]
)
# Celery Configuration
celery_app.conf.update(
    task_serializer="json",
    accept_content=["json"],
    result_serializer="json",
    timezone="UTC",
    enable_utc=True,
    task_track_started=True,
    task_time_limit=3600,  # 1 hour max per task
    worker_prefetch_multiplier=1,  # Process one task at a time (GPU memory)
    result_expires=86400,  # Results expire after 24 hours
)
================================================
FILE: backend/workers/tasks.py
================================================
"""
Celery Tasks - Audio Separation Workers
With SAM Audio Lite optimization for low VRAM usage
"""
import os
import sys
import gc
from pathlib import Path
from typing import Optional, List
from celery import current_task
from workers.celery_app import celery_app
# Add parent directory to path for SAM Audio imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))
OUTPUT_DIR = Path("outputs")
OUTPUT_DIR.mkdir(exist_ok=True)
# Global model cache to avoid reloading
_model_cache = {}
_processor_cache = {}


def update_progress(progress: int, message: str):
    """Update task progress"""
    current_task.update_state(
        state="PROGRESS",
        meta={"progress": progress, "message": message}
    )


def create_lite_model(model_name: str, hf_token: str = None):
    """
    Create a memory-optimized SAM Audio model by removing unused components.
    Reduces VRAM usage from ~11GB to ~4-5GB by:
    - Replacing vision_encoder with a dummy
    - Disabling visual_ranker
    - Disabling text_ranker
    - Disabling span_predictor
    """
    import torch
    from sam_audio import SAMAudio, SAMAudioProcessor
    print(f"Loading {model_name} (lite mode)...")
    # Load model
    if hf_token:
        model = SAMAudio.from_pretrained(model_name, token=hf_token)
    else:
        model = SAMAudio.from_pretrained(model_name)
    processor = SAMAudioProcessor.from_pretrained(model_name)
    print("Optimizing model for low VRAM...")
    # Get vision encoder dim before deleting
    vision_dim = model.vision_encoder.dim if hasattr(model.vision_encoder, 'dim') else 1024
    # Delete heavy components
    del model.vision_encoder
    gc.collect()
    # Store the dim for _get_video_features
    model._vision_encoder_dim = vision_dim

    # Replace _get_video_features to not use vision_encoder
    def _get_video_features_lite(self, video, audio_features):
        B, T, _ = audio_features.shape
        return audio_features.new_zeros(B, self._vision_encoder_dim, T)

    import types
    model._get_video_features = types.MethodType(_get_video_features_lite, model)
    # Delete rankers
    if hasattr(model, 'visual_ranker') and model.visual_ranker is not None:
        del model.visual_ranker
        model.visual_ranker = None
        gc.collect()
    if hasattr(model, 'text_ranker') and model.text_ranker is not None:
        del model.text_ranker
        model.text_ranker = None
        gc.collect()
    # Delete span predictor
    if hasattr(model, 'span_predictor') and model.span_predictor is not None:
        del model.span_predictor
        model.span_predictor = None
        gc.collect()
    if hasattr(model, 'span_predictor_transform') and model.span_predictor_transform is not None:
        del model.span_predictor_transform
        model.span_predictor_transform = None
        gc.collect()
    # Force garbage collection
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    print("Model optimization complete!")
    return model, processor
def get_or_load_lite_model(model_name: str, hf_token: str, device: str, dtype):
"""Get cached lite model or create it - only keeps ONE model in memory"""
import torch
# Include dtype in cache key to ensure correct model is loaded
dtype_str = "bf16" if dtype == torch.bfloat16 else "fp32"
cache_key = f"{model_name}_lite_{device}_{dtype_str}"
print(f"[DEBUG] Looking for cached model with key: {cache_key}")
print(f"[DEBUG] Current cache keys: {list(_model_cache.keys())}")
if cache_key not in _model_cache:
print(f"[DEBUG] Cache miss - creating new lite model")
# IMPORTANT: Clear any existing models first to free memory
if len(_model_cache) > 0:
print(f"[DEBUG] Clearing {len(_model_cache)} existing model(s) from cache...")
for old_key in list(_model_cache.keys()):
del _model_cache[old_key]
for old_key in list(_processor_cache.keys()):
del _processor_cache[old_key]
gc.collect()
if torch.cuda.is_available():
torch.cuda.empty_cache()
print(f"[DEBUG] GPU Memory after clearing old models: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
model, processor = create_lite_model(model_name, hf_token)
print(f"[DEBUG] Converting model to {device} with dtype {dtype}")
model = model.eval().to(device, dtype)
_model_cache[cache_key] = model
_processor_cache[model_name] = processor
if torch.cuda.is_available():
print(f"[DEBUG] GPU Memory after loading: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
else:
print(f"[DEBUG] Cache hit - using existing model")
return _model_cache[cache_key], _processor_cache[model_name]
def cleanup_gpu_memory():
"""Clean up GPU memory after task"""
import torch
    # Collect unreachable tensors first so empty_cache() can release their blocks
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
@celery_app.task(bind=True)
def separate_audio_task(
self,
audio_path: str,
description: str,
mode: str = "extract",
anchors: Optional[List] = None,
model_size: str = "base",
chunk_duration: float = 25.0,
use_float32: bool = False,
is_video: bool = False
):
"""
Separate audio using SAM Audio Lite (memory optimized)
Args:
audio_path: Path to input audio or video file
description: Text prompt for separation
mode: "extract" or "remove"
        anchors: Optional temporal anchors [["+", start, end], ...] (accepted but not used yet)
model_size: Model size (small/base/large)
chunk_duration: Audio chunk duration in seconds (5-60)
use_float32: Use float32 precision for better quality
is_video: If True, extract audio from video file first
Returns:
Dictionary with paths to output files
"""
import torch
import torchaudio
import time
import subprocess
import shutil
from huggingface_hub import login
task_id = self.request.id
device = "cuda" if torch.cuda.is_available() else "cpu"
video_path = None # Will be set if input is video
# Debug: Show received parameter
print(f"[DEBUG] use_float32 parameter received: {use_float32} (type: {type(use_float32).__name__})")
print(f"[DEBUG] is_video parameter received: {is_video}")
# Handle video files - extract audio using FFmpeg
if is_video:
update_progress(2, "Extracting audio from video...")
video_path = Path(audio_path)
# Copy video to output directory for later playback
output_video_path = OUTPUT_DIR / f"{task_id}.video{video_path.suffix}"
shutil.copy2(video_path, output_video_path)
print(f"[DEBUG] Copied video to: {output_video_path}")
# Extract audio from video using FFmpeg
extracted_audio_path = OUTPUT_DIR / f"{task_id}.extracted.wav"
ffmpeg_cmd = [
"ffmpeg", "-y",
"-i", str(video_path),
"-vn", # No video
"-acodec", "pcm_s16le", # PCM 16-bit
"-ar", "44100", # 44.1kHz sample rate
"-ac", "1", # Mono
str(extracted_audio_path)
]
try:
result = subprocess.run(
ffmpeg_cmd,
capture_output=True,
text=True,
check=True
)
print(f"[DEBUG] FFmpeg audio extraction successful")
except subprocess.CalledProcessError as e:
            raise Exception(f"FFmpeg audio extraction failed: {e.stderr}") from e
# Use extracted audio for processing
audio_path = str(extracted_audio_path)
# Set precision based on use_float32 parameter
if use_float32 or device == "cpu":
dtype = torch.float32
print(f"[DEBUG] Using float32 precision (High Quality Mode)")
else:
dtype = torch.bfloat16
print(f"[DEBUG] Using bfloat16 precision (Memory Optimized)")
# Start timing
start_time = time.time()
try:
update_progress(5, "Initializing...")
# Load HuggingFace token
backend_dir = Path(__file__).parent.parent
token_file = backend_dir / ".hf_token"
if token_file.exists():
with open(token_file, "r") as f:
hf_token = f.read().strip()
login(token=hf_token)
else:
raise Exception("HuggingFace token not found. Please authenticate first.")
# Select model based on size
model_name = f"facebook/sam-audio-{model_size}"
update_progress(10, f"Loading {model_name} (lite mode)...")
# Clean up before loading
cleanup_gpu_memory()
# Load lite model (with caching)
model, processor = get_or_load_lite_model(model_name, hf_token, device, dtype)
update_progress(30, "Loading audio...")
# Get sample rate
sample_rate = processor.audio_sampling_rate
# Load and preprocess audio
audio, orig_sr = torchaudio.load(audio_path)
if orig_sr != sample_rate:
resampler = torchaudio.transforms.Resample(orig_sr, sample_rate)
audio = resampler(audio)
# Convert to mono if stereo
if audio.shape[0] > 1:
audio = audio.mean(dim=0, keepdim=True)
# Calculate audio duration
audio_duration = audio.shape[1] / sample_rate
print(f"[DEBUG] Audio duration: {audio_duration:.2f}s")
# Chunking settings (from parameter, clamped to 5-60)
CHUNK_DURATION = max(5.0, min(60.0, chunk_duration))
MAX_CHUNK_SAMPLES = int(sample_rate * CHUNK_DURATION)
# Check if chunking is needed
if audio.shape[1] > MAX_CHUNK_SAMPLES:
print(f"[DEBUG] Audio is {audio_duration:.1f}s, using chunking ({CHUNK_DURATION}s chunks)")
# Split audio into chunks
audio_tensor = audio.squeeze(0).to(device, dtype)
chunks = torch.split(audio_tensor, MAX_CHUNK_SAMPLES, dim=-1)
total_chunks = len(chunks)
out_target = []
out_residual = []
for i, chunk in enumerate(chunks):
# Update progress
chunk_progress = 30 + int((i / total_chunks) * 50)
update_progress(chunk_progress, f"Processing chunk {i+1}/{total_chunks}...")
                # Skip a trailing chunk shorter than 1 second. Its samples are dropped,
                # so the outputs can end up to 1 s shorter than the original file.
                if chunk.shape[-1] < sample_rate:
                    print(f"[DEBUG] Skipping chunk {i+1} (too short)")
                    continue
# Prepare batch for this chunk
batch = processor(
audios=[chunk.unsqueeze(0)],
descriptions=[description]
).to(device)
# Run separation
with torch.inference_mode():
                    with torch.autocast(device_type=device, enabled=(device == "cuda")):
result = model.separate(
batch,
predict_spans=False,
reranking_candidates=1
)
out_target.append(result.target[0].cpu())
out_residual.append(result.residual[0].cpu())
# Clean up chunk results
del batch, result
if torch.cuda.is_available():
torch.cuda.empty_cache()
# Concatenate all chunks
target_audio = torch.cat(out_target, dim=-1).clamp(-1, 1).float().unsqueeze(0)
residual_audio = torch.cat(out_residual, dim=-1).clamp(-1, 1).float().unsqueeze(0)
del out_target, out_residual, chunks, audio_tensor
else:
print(f"[DEBUG] Audio is {audio_duration:.1f}s, processing as single batch")
update_progress(50, "Running separation...")
# Process entire audio at once
batch = processor(
audios=[audio_path],
descriptions=[description]
).to(device)
# Run separation
with torch.inference_mode():
                with torch.autocast(device_type=device, enabled=(device == "cuda")):
result = model.separate(
batch,
predict_spans=False,
reranking_candidates=1
)
target_audio = result.target[0].float().unsqueeze(0).cpu()
residual_audio = result.residual[0].float().unsqueeze(0).cpu()
del batch, result
update_progress(80, "Saving results...")
# Output paths
output_base = OUTPUT_DIR / task_id
original_path = output_base.with_suffix(".original.wav")
ghost_path = output_base.with_suffix(".ghost.wav")
clean_path = output_base.with_suffix(".clean.wav")
# Save original audio
torchaudio.save(str(original_path), audio.cpu(), sample_rate)
        # Save separated audio. Both modes write the same files: "ghost" is the
        # sound matching the description, "clean" is everything else.
        torchaudio.save(str(ghost_path), target_audio, sample_rate)
        torchaudio.save(str(clean_path), residual_audio, sample_rate)
update_progress(100, "Complete!")
# Aggressive cleanup
print(f"[DEBUG] Cleaning up GPU memory...")
del target_audio, residual_audio, audio
gc.collect()
cleanup_gpu_memory()
if torch.cuda.is_available():
print(f"[DEBUG] GPU Memory after cleanup: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
# Calculate processing time
processing_time = time.time() - start_time
print(f"[DEBUG] Processing completed in {processing_time:.2f}s for {audio_duration:.2f}s audio")
result = {
"original_path": str(original_path),
"ghost_path": str(ghost_path),
"clean_path": str(clean_path),
"description": description,
"mode": mode,
"audio_duration": round(audio_duration, 2),
"processing_time": round(processing_time, 2),
"model_size": model_size
}
# Add video path if this was a video file
if video_path is not None:
output_video_path = OUTPUT_DIR / f"{task_id}.video{video_path.suffix}"
result["video_path"] = str(output_video_path)
result["is_video"] = True
return result
except Exception as e:
gc.collect()
cleanup_gpu_memory()
        raise Exception(f"Separation failed: {e}") from e
@celery_app.task(bind=True)
def match_pattern_task(
self,
audio_path: str,
sample_path: str,
threshold: float = 0.85,
model_size: str = "base"
):
"""
Find and remove sounds similar to a sample
Args:
audio_path: Path to input audio file
sample_path: Path to sample audio file
threshold: Similarity threshold (0-1)
model_size: Model size (small/base/large)
Returns:
Dictionary with paths to output files and matched segments
"""
# TODO: Implement pattern matching with CLAP embeddings
# This is a placeholder for MVP v1.0
update_progress(50, "Pattern matching not yet implemented in MVP")
return {
"status": "not_implemented",
"message": "Pattern matching will be available in v1.1"
}
================================================
FILE: docker-compose.yml
================================================
version: '3.8'
services:
redis:
image: redis:alpine
container_name: audioghost-redis
ports:
- "6379:6379"
volumes:
- redis_data:/data
restart: unless-stopped
volumes:
redis_data:
================================================
FILE: frontend/.gitignore
================================================
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# dependencies
/node_modules
/.pnp
.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/versions
# testing
/coverage
# next.js
/.next/
/out/
# production
/build
# misc
.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.pnpm-debug.log*
# env files (can opt-in for committing if needed)
.env*
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts
================================================
FILE: frontend/README.md
================================================
This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-reference/cli/create-next-app).
## Getting Started
First, run the development server:
```bash
npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev
```
Open [http://localhost:3000](http://localhost:3000) with your browser to see the result.
You can start editing the page by modifying `src/app/page.tsx`. The page auto-updates as you edit the file.
This project uses [`next/font`](https://nextjs.org/docs/app/building-your-application/optimizing/fonts) to automatically optimize and load [Inter](https://fonts.google.com/specimen/Inter) (see `src/app/layout.tsx`).
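When editing `src/app/page.tsx`, note that it currently hard-codes the backend address `http://localhost:8000` in each `fetch` call. One possible way to centralize that is sketched below. The `apiUrl` helper and the `NEXT_PUBLIC_API_URL` variable name are hypothetical and not part of this repository.

```typescript
// Hypothetical helper (not in the repository): builds backend URLs from one base.
const DEFAULT_API_BASE = "http://localhost:8000";

function apiUrl(path: string, base: string = DEFAULT_API_BASE): string {
  // Drop trailing slashes from the base so the join never produces "//".
  const cleanBase = base.replace(/\/+$/, "");
  // Make sure the path starts with exactly one leading slash.
  const cleanPath = path.startsWith("/") ? path : `/${path}`;
  return `${cleanBase}${cleanPath}`;
}
```

In a client component this could be called as `fetch(apiUrl("/api/auth/status", process.env.NEXT_PUBLIC_API_URL))`. Next.js inlines `NEXT_PUBLIC_`-prefixed variables into the browser bundle at build time, and an unset variable falls back to the default base.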
## Learn More
To learn more about Next.js, take a look at the following resources:
- [Next.js Documentation](https://nextjs.org/docs) - learn about Next.js features and API.
- [Learn Next.js](https://nextjs.org/learn) - an interactive Next.js tutorial.
You can check out [the Next.js GitHub repository](https://github.com/vercel/next.js) - your feedback and contributions are welcome!
## Deploy on Vercel
The easiest way to deploy your Next.js app is to use the [Vercel Platform](https://vercel.com/new?utm_medium=default-template&filter=next.js&utm_source=create-next-app&utm_campaign=create-next-app-readme) from the creators of Next.js.
Check out our [Next.js deployment documentation](https://nextjs.org/docs/app/building-your-application/deploying) for more details.
================================================
FILE: frontend/eslint.config.mjs
================================================
import { defineConfig, globalIgnores } from "eslint/config";
import nextVitals from "eslint-config-next/core-web-vitals";
import nextTs from "eslint-config-next/typescript";
const eslintConfig = defineConfig([
...nextVitals,
...nextTs,
// Override default ignores of eslint-config-next.
globalIgnores([
// Default ignores of eslint-config-next:
".next/**",
"out/**",
"build/**",
"next-env.d.ts",
]),
]);
export default eslintConfig;
================================================
FILE: frontend/next.config.ts
================================================
import type { NextConfig } from "next";
const nextConfig: NextConfig = {
/* config options here */
};
export default nextConfig;
================================================
FILE: frontend/package.json
================================================
{
"name": "frontend",
"version": "0.1.0",
"private": true,
"scripts": {
"dev": "next dev",
"build": "next build",
"start": "next start",
"lint": "eslint"
},
"dependencies": {
"axios": "^1.13.2",
"lucide-react": "^0.562.0",
"next": "16.1.0",
"react": "19.2.3",
"react-dom": "19.2.3",
"wavesurfer.js": "^7.12.1"
},
"devDependencies": {
"@tailwindcss/postcss": "^4",
"@types/node": "^20",
"@types/react": "^19",
"@types/react-dom": "^19",
"eslint": "^9",
"eslint-config-next": "16.1.0",
"tailwindcss": "^4",
"typescript": "^5"
}
}
================================================
FILE: frontend/postcss.config.mjs
================================================
const config = {
plugins: {
"@tailwindcss/postcss": {},
},
};
export default config;
================================================
FILE: frontend/src/app/globals.css
================================================
@import "tailwindcss";
:root {
/* AudioGhost Brand Colors */
--ghost-primary: #8B5CF6;
--ghost-secondary: #06B6D4;
--ghost-accent: #F472B6;
--ghost-success: #10B981;
--ghost-warning: #F59E0B;
--ghost-error: #EF4444;
/* Dark Theme */
--bg-primary: #0A0A0F;
--bg-secondary: #12121A;
--bg-tertiary: #1A1A25;
--bg-card: #16161F;
--bg-hover: #1E1E2A;
/* Text Colors */
--text-primary: #FFFFFF;
--text-secondary: #A1A1AA;
--text-muted: #71717A;
/* Glassmorphism */
--glass-bg: rgba(22, 22, 31, 0.8);
--glass-border: rgba(139, 92, 246, 0.2);
}
* {
box-sizing: border-box;
padding: 0;
margin: 0;
}
html {
scroll-behavior: smooth;
}
body {
min-height: 100vh;
background: var(--bg-primary);
color: var(--text-primary);
  font-family: var(--font-inter), 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
}
body.light-mode {
--bg-primary: #FAFAFA;
--bg-secondary: #F5F5F5;
--bg-tertiary: #EBEBEB;
--bg-card: #FFFFFF;
--bg-hover: #F0F0F0;
--text-primary: #18181B;
--text-secondary: #52525B;
--text-muted: #A1A1AA;
--glass-bg: rgba(255, 255, 255, 0.9);
--glass-border: rgba(139, 92, 246, 0.3);
}
/* Custom Scrollbar */
::-webkit-scrollbar {
width: 8px;
height: 8px;
}
::-webkit-scrollbar-track {
background: var(--bg-secondary);
}
::-webkit-scrollbar-thumb {
background: var(--ghost-primary);
border-radius: 4px;
}
::-webkit-scrollbar-thumb:hover {
background: #7C3AED;
}
/* Glassmorphism Card */
.glass-card {
background: var(--glass-bg);
backdrop-filter: blur(12px);
-webkit-backdrop-filter: blur(12px);
border: 1px solid var(--glass-border);
border-radius: 16px;
}
/* Gradient Text */
.gradient-text {
background: linear-gradient(135deg, var(--ghost-primary) 0%, var(--ghost-secondary) 50%, var(--ghost-accent) 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
}
/* Glow Effects */
.glow-primary {
box-shadow: 0 0 20px rgba(139, 92, 246, 0.3);
}
.glow-secondary {
box-shadow: 0 0 20px rgba(6, 182, 212, 0.3);
}
/* Button Styles */
.btn-primary {
background: linear-gradient(135deg, var(--ghost-primary), #7C3AED);
color: white;
padding: 12px 24px;
border-radius: 12px;
font-weight: 600;
transition: all 0.3s ease;
border: none;
cursor: pointer;
display: inline-flex;
align-items: center;
justify-content: center;
}
.btn-primary:hover {
transform: translateY(-2px);
box-shadow: 0 8px 25px rgba(139, 92, 246, 0.4);
}
.btn-primary:active {
transform: translateY(0);
}
.btn-secondary {
background: var(--bg-tertiary);
color: var(--text-primary);
padding: 12px 24px;
border-radius: 12px;
font-weight: 600;
transition: all 0.3s ease;
border: 1px solid var(--glass-border);
cursor: pointer;
display: inline-flex;
align-items: center;
justify-content: center;
}
.btn-secondary:hover {
background: var(--bg-hover);
border-color: var(--ghost-primary);
}
/* Input Styles */
.input-ghost {
background: var(--bg-tertiary);
border: 1px solid var(--glass-border);
border-radius: 12px;
padding: 12px 16px;
color: var(--text-primary);
transition: all 0.3s ease;
outline: none;
width: 100%;
}
.input-ghost:focus {
border-color: var(--ghost-primary);
box-shadow: 0 0 0 3px rgba(139, 92, 246, 0.2);
}
.input-ghost::placeholder {
color: var(--text-muted);
}
/* Waveform Container */
.waveform-container {
background: linear-gradient(180deg, var(--bg-secondary) 0%, var(--bg-tertiary) 100%);
border-radius: 16px;
padding: 24px;
border: 1px solid var(--glass-border);
}
/* Progress Bar */
.progress-bar {
height: 6px;
background: var(--bg-tertiary);
border-radius: 3px;
overflow: hidden;
}
.progress-bar-fill {
height: 100%;
background: linear-gradient(90deg, var(--ghost-primary), var(--ghost-secondary));
border-radius: 3px;
transition: width 0.3s ease;
}
/* Animations */
@keyframes pulse-glow {
0%,
100% {
box-shadow: 0 0 20px rgba(139, 92, 246, 0.3);
}
50% {
box-shadow: 0 0 40px rgba(139, 92, 246, 0.5);
}
}
@keyframes float {
0%,
100% {
transform: translateY(0);
}
50% {
transform: translateY(-10px);
}
}
@keyframes shimmer {
0% {
background-position: -200% 0;
}
100% {
background-position: 200% 0;
}
}
.animate-pulse-glow {
animation: pulse-glow 2s ease-in-out infinite;
}
.animate-float {
animation: float 3s ease-in-out infinite;
}
.shimmer {
background: linear-gradient(90deg,
var(--bg-tertiary) 25%,
var(--bg-hover) 50%,
var(--bg-tertiary) 75%);
background-size: 200% 100%;
animation: shimmer 1.5s infinite;
}
/* Stem Mixer Track */
.stem-track {
background: var(--bg-secondary);
border-radius: 12px;
padding: 16px;
border: 1px solid var(--glass-border);
transition: all 0.3s ease;
}
.stem-track:hover {
border-color: var(--ghost-primary);
}
.stem-track.original {
border-left: 3px solid var(--ghost-primary);
}
.stem-track.ghost {
border-left: 3px solid var(--ghost-accent);
}
.stem-track.clean {
border-left: 3px solid var(--ghost-success);
}
/* Quick Action Tags */
.quick-tag {
display: inline-flex;
align-items: center;
gap: 6px;
padding: 8px 16px;
background: var(--bg-tertiary);
border: 1px solid var(--glass-border);
border-radius: 20px;
font-size: 14px;
color: var(--text-secondary);
cursor: pointer;
transition: all 0.3s ease;
}
.quick-tag:hover {
background: var(--bg-hover);
color: var(--text-primary);
border-color: var(--ghost-primary);
}
.quick-tag.selected {
background: linear-gradient(135deg, var(--ghost-primary), #7C3AED);
color: white;
border-color: transparent;
}
/* Upload Zone */
.upload-zone {
border: 2px dashed var(--glass-border);
border-radius: 16px;
padding: 64px 48px;
text-align: center;
transition: all 0.3s ease;
cursor: pointer;
background: var(--bg-secondary);
}
.upload-zone:hover {
border-color: var(--ghost-primary);
background: rgba(139, 92, 246, 0.05);
}
.upload-zone.dragover {
border-color: var(--ghost-primary);
background: rgba(139, 92, 246, 0.1);
box-shadow: 0 0 30px rgba(139, 92, 246, 0.2);
}
/* Responsive */
@media (max-width: 768px) {
.upload-zone {
padding: 48px 24px;
}
}
================================================
FILE: frontend/src/app/layout.tsx
================================================
import type { Metadata } from "next";
import { Inter } from "next/font/google";
import "./globals.css";
const inter = Inter({
subsets: ["latin"],
variable: "--font-inter",
});
export const metadata: Metadata = {
title: "AudioGhost AI - AI-Powered Audio Separation",
description: "Separate any sound from audio using natural language. Remove vocals, extract instruments, eliminate background noise with state-of-the-art AI.",
keywords: ["audio separation", "AI", "vocal removal", "stem separation", "audio editing"],
};
export default function RootLayout({
children,
}: Readonly<{
children: React.ReactNode;
}>) {
return (
<html lang="en" suppressHydrationWarning>
<body className={`${inter.variable} antialiased`}>
{children}
</body>
</html>
);
}
================================================
FILE: frontend/src/app/page.tsx
================================================
"use client";
import { useState, useEffect } from "react";
import Header from "@/components/Header";
import AuthModal from "@/components/AuthModal";
import AudioUploader from "@/components/AudioUploader";
import WaveformEditor from "@/components/WaveformEditor";
import SeparationPanel from "@/components/SeparationPanel";
import ProgressTracker from "@/components/ProgressTracker";
import StemMixer from "@/components/StemMixer";
import VideoStemMixer from "@/components/VideoStemMixer";
interface TaskResult {
original_path: string;
ghost_path: string;
clean_path: string;
description: string;
mode: string;
audio_duration?: number;
processing_time?: number;
model_size?: string;
video_path?: string;
is_video?: boolean;
}
interface TaskState {
taskId: string | null;
status: "idle" | "pending" | "processing" | "completed" | "failed";
progress: number;
message: string;
result: TaskResult | null;
}
export default function Home() {
const [isAuthenticated, setIsAuthenticated] = useState(false);
const [showAuthModal, setShowAuthModal] = useState(false);
const [audioFile, setAudioFile] = useState<File | null>(null);
const [audioUrl, setAudioUrl] = useState<string | null>(null);
const [isVideo, setIsVideo] = useState(false);
const [isDarkMode, setIsDarkMode] = useState(true);
const [selectedRegion, setSelectedRegion] = useState<{ start: number; end: number } | null>(null);
// Persistent separation settings (won't reset on "New")
const [separationSettings, setSeparationSettings] = useState({
modelSize: "base" as "small" | "base" | "large",
chunkDuration: 25,
useFloat32: false,
});
const [task, setTask] = useState<TaskState>({
taskId: null,
status: "idle",
progress: 0,
message: "",
result: null,
});
// Check auth status on mount
useEffect(() => {
checkAuthStatus();
}, []);
// Toggle theme
useEffect(() => {
document.body.classList.toggle("light-mode", !isDarkMode);
}, [isDarkMode]);
const checkAuthStatus = async () => {
try {
      const res = await fetch("http://localhost:8000/api/auth/status");
      if (!res.ok) {
        throw new Error(`Auth status request failed: ${res.status}`);
      }
      const data = await res.json();
      setIsAuthenticated(Boolean(data.authenticated));
} catch (error) {
console.error("Failed to check auth status:", error);
}
};
const handleFileUpload = (file: File) => {
setAudioFile(file);
const url = URL.createObjectURL(file);
setAudioUrl(url);
// Detect if file is video
const isVideoFile = file.type.startsWith("video/") ||
/\.(mp4|webm|mov|avi|mkv)$/i.test(file.name);
setIsVideo(isVideoFile);
// Reset task state
setTask({
taskId: null,
status: "idle",
progress: 0,
message: "",
result: null,
});
};
const handleReset = () => {
// Clean up the object URL to free memory
if (audioUrl) {
URL.revokeObjectURL(audioUrl);
}
setAudioFile(null);
setAudioUrl(null);
setIsVideo(false);
setSelectedRegion(null);
setTask({
taskId: null,
status: "idle",
progress: 0,
message: "",
result: null,
});
};
const handleSeparation = async (
description: string,
mode: "extract" | "remove",
modelSize: string = "base",
chunkDuration: number = 25,
useFloat32: boolean = false
) => {
if (!audioFile) return;
const formData = new FormData();
formData.append("file", audioFile);
formData.append("description", description);
formData.append("mode", mode);
formData.append("model_size", modelSize);
formData.append("chunk_duration", chunkDuration.toString());
formData.append("use_float32", useFloat32.toString());
if (selectedRegion) {
formData.append("start_time", selectedRegion.start.toString());
formData.append("end_time", selectedRegion.end.toString());
}
try {
const res = await fetch("http://localhost:8000/api/separate/", {
method: "POST",
body: formData,
});
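      const data = await res.json();
      // Do not start polling when the backend rejected the request
      if (!res.ok || !data.task_id) {
        throw new Error(data.detail || "Failed to submit task");
      }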
setTask({
taskId: data.task_id,
status: "pending",
progress: 0,
message: "Task submitted...",
result: null,
});
// Start polling for status
pollTaskStatus(data.task_id);
} catch (error) {
console.error("Failed to submit separation task:", error);
setTask(prev => ({
...prev,
status: "failed",
message: "Failed to submit task",
}));
}
};
const pollTaskStatus = async (taskId: string) => {
const poll = async () => {
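      try {
        const res = await fetch(`http://localhost:8000/api/tasks/${taskId}`);
        if (!res.ok) {
          throw new Error(`Status request failed: ${res.status}`);
        }
        const data = await res.json();
        setTask({
          taskId,
          status: data.status,
          progress: data.progress,
          message: data.message || "",
          result: data.result || null,
        });
        if (data.status !== "completed" && data.status !== "failed") {
          setTimeout(poll, 1000);
        }
      } catch (error) {
        console.error("Failed to poll task status:", error);
        // Surface the failure instead of leaving the UI stuck on the progress view
        setTask(prev => ({
          ...prev,
          status: "failed",
          message: "Failed to fetch task status",
        }));
      }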
};
poll();
};
return (
<main
style={{
minHeight: "100vh",
background: "var(--bg-primary)",
width: "100%"
}}
>
<Header
isAuthenticated={isAuthenticated}
onAuthClick={() => setShowAuthModal(true)}
isDarkMode={isDarkMode}
onThemeToggle={() => setIsDarkMode(!isDarkMode)}
onLogoClick={handleReset}
/>
<div
style={{
maxWidth: "1200px",
margin: "0 auto",
padding: "32px 24px"
}}
>
{/* Hero Section */}
{!audioUrl && (
<div style={{ textAlign: "center", marginBottom: "48px" }}>
<h1 style={{ fontSize: "3rem", fontWeight: 800, marginBottom: "16px" }}>
<span className="gradient-text">AudioGhost</span>{" "}
<span style={{ color: "var(--text-primary)" }}>AI</span>
</h1>
<p style={{ fontSize: "1.25rem", marginBottom: "8px", color: "var(--text-secondary)" }}>
AI-Powered Object-Oriented Audio Separation
</p>
<p style={{ color: "var(--text-muted)" }}>
Describe the sound you want to extract or remove using natural language
</p>
</div>
)}
{/* Main Content */}
<div style={{ display: "grid", gap: "24px" }}>
{/* Upload Zone */}
{!audioUrl && (
<AudioUploader onFileUpload={handleFileUpload} />
)}
{/* Waveform Editor (Audio) or Video Preview - Hide when results are shown */}
{audioUrl && task.status !== "completed" && (
<>
{/* Section Header with Upload Button */}
<div style={{ display: "flex", justifyContent: "space-between", alignItems: "center" }}>
<h2 style={{ fontSize: "1.25rem", fontWeight: 600, color: "var(--text-primary)" }}>
{isVideo ? "Video Preview" : "Audio Editor"}
</h2>
<button
onClick={handleReset}
style={{
padding: "8px 16px",
borderRadius: "8px",
background: "var(--bg-tertiary)",
color: "var(--text-secondary)",
                  border: "1px solid var(--glass-border)",
cursor: "pointer",
fontSize: "0.875rem",
display: "flex",
alignItems: "center",
gap: "8px",
transition: "all 0.2s ease"
}}
onMouseOver={(e) => e.currentTarget.style.background = "var(--bg-secondary)"}
onMouseOut={(e) => e.currentTarget.style.background = "var(--bg-tertiary)"}
>
↩ Upload New File
</button>
</div>
{/* Show Video Player or Waveform based on file type */}
{isVideo ? (
<div
style={{
background: "var(--bg-secondary)",
borderRadius: "16px",
border: "1px solid var(--glass-border)",
padding: "16px",
overflow: "hidden"
}}
>
<video
src={audioUrl}
controls
style={{
width: "100%",
maxHeight: "400px",
borderRadius: "12px",
background: "#000",
objectFit: "contain"
}}
/>
<p style={{
fontSize: "0.8rem",
color: "var(--text-muted)",
marginTop: "12px",
textAlign: "center"
}}>
Audio will be extracted from this video for separation processing
</p>
</div>
) : (
<WaveformEditor
audioUrl={audioUrl}
onRegionSelect={setSelectedRegion}
selectedRegion={selectedRegion}
/>
)}
</>
)}
{/* Separation Controls */}
{audioUrl && task.status === "idle" && (
<SeparationPanel
onSeparate={handleSeparation}
isAuthenticated={isAuthenticated}
onAuthRequired={() => setShowAuthModal(true)}
hasRegion={!!selectedRegion}
settings={separationSettings}
onSettingsChange={setSeparationSettings}
/>
)}
{/* Progress Tracker */}
{(task.status === "pending" || task.status === "processing") && (
<ProgressTracker
status={task.status}
progress={task.progress}
message={task.message}
/>
)}
{/* Results - Stem Mixer (Audio) or Video Stem Mixer */}
{task.status === "completed" && task.result && task.taskId && (
task.result.is_video ? (
<VideoStemMixer
taskId={task.taskId}
description={task.result.description}
audioDuration={task.result.audio_duration}
processingTime={task.result.processing_time}
modelSize={task.result.model_size}
onUploadNew={handleReset}
onNewSeparation={() => {
setTask({
taskId: null,
status: "idle",
progress: 0,
message: "",
result: null,
});
}}
/>
) : (
<StemMixer
taskId={task.taskId}
description={task.result.description}
audioDuration={task.result.audio_duration}
processingTime={task.result.processing_time}
modelSize={task.result.model_size}
onUploadNew={handleReset}
onNewSeparation={() => {
setTask({
taskId: null,
status: "idle",
progress: 0,
message: "",
result: null,
});
}}
/>
)
)}
{/* Error State */}
{task.status === "failed" && (
<div className="glass-card p-6 text-center">
<div className="text-red-400 text-xl mb-2">❌ Separation Failed</div>
<p style={{ color: "var(--text-secondary)" }}>{task.message}</p>
<button
className="btn-primary mt-4"
onClick={() => setTask({ taskId: null, status: "idle", progress: 0, message: "", result: null })}
>
Try Again
</button>
</div>
)}
</div>
</div>
{/* Auth Modal */}
{showAuthModal && (
<AuthModal
onClose={() => setShowAuthModal(false)}
onSuccess={() => {
setIsAuthenticated(true);
setShowAuthModal(false);
}}
/>
)}
</main>
);
}
================================================
FILE: frontend/src/components/AudioUploader.tsx
================================================
"use client";
import { useState, useRef, useCallback } from "react";
import { Upload, Music } from "lucide-react";
interface AudioUploaderProps {
onFileUpload: (file: File) => void;
}
export default function AudioUploader({ onFileUpload }: AudioUploaderProps) {
const [isDragOver, setIsDragOver] = useState(false);
const fileInputRef = useRef<HTMLInputElement>(null);
const handleDragOver = useCallback((e: React.DragEvent) => {
e.preventDefault();
setIsDragOver(true);
}, []);
const handleDragLeave = useCallback((e: React.DragEvent) => {
e.preventDefault();
setIsDragOver(false);
}, []);
const handleDrop = useCallback((e: React.DragEvent) => {
e.preventDefault();
setIsDragOver(false);
const file = e.dataTransfer.files[0];
if (file && isMediaFile(file)) {
onFileUpload(file);
}
}, [onFileUpload]);
const handleFileSelect = (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (file && isMediaFile(file)) {
onFileUpload(file);
}
};
const isMediaFile = (file: File) => {
// Accept audio files
if (file.type.startsWith("audio/") ||
/\.(mp3|wav|flac|ogg|m4a|aac)$/i.test(file.name)) {
return true;
}
// Accept video files
if (file.type.startsWith("video/") ||
/\.(mp4|webm|mov|avi|mkv)$/i.test(file.name)) {
return true;
}
return false;
};
return (
<div
className={`upload-zone ${isDragOver ? "dragover" : ""}`}
onDragOver={handleDragOver}
onDragLeave={handleDragLeave}
onDrop={handleDrop}
onClick={() => fileInputRef.current?.click()}
>
<input
ref={fileInputRef}
type="file"
accept="audio/*,video/*"
onChange={handleFileSelect}
className="hidden"
/>
<div className="flex flex-col items-center">
{/* Icon */}
<div
className="w-20 h-20 rounded-2xl flex items-center justify-center mb-6 animate-float"
style={{
background: isDragOver
? "linear-gradient(135deg, var(--ghost-primary), var(--ghost-accent))"
: "var(--bg-tertiary)"
}}
>
{isDragOver ? (
<Music className="w-10 h-10 text-white" />
) : (
<Upload className="w-10 h-10" style={{ color: "var(--ghost-primary)" }} />
)}
</div>
{/* Text */}
<h3 className="text-xl font-semibold mb-2" style={{ color: "var(--text-primary)" }}>
{isDragOver ? "Drop your file here" : "Upload Audio or Video"}
</h3>
<p className="mb-4" style={{ color: "var(--text-secondary)" }}>
Drag & drop or click to browse
</p>
{/* Supported formats */}
<div className="flex items-center gap-2 flex-wrap justify-center">
{["MP3", "WAV", "FLAC", "MP4", "WebM", "MOV"].map((format) => (
<span
key={format}
className="px-3 py-1 rounded-full text-xs font-medium"
style={{
background: "var(--bg-tertiary)",
color: "var(--text-muted)"
}}
>
{format}
</span>
))}
</div>
</div>
</div>
);
}
================================================
FILE: frontend/src/components/AuthModal.tsx
================================================
"use client";
import { useState } from "react";
import { X, Key, ExternalLink, Loader2, CheckCircle } from "lucide-react";
interface AuthModalProps {
onClose: () => void;
onSuccess: () => void;
}
export default function AuthModal({ onClose, onSuccess }: AuthModalProps) {
const [token, setToken] = useState("");
const [isLoading, setIsLoading] = useState(false);
const [error, setError] = useState("");
const [step, setStep] = useState<"input" | "success">("input");
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault();
setIsLoading(true);
setError("");
try {
const res = await fetch("http://localhost:8000/api/auth/login", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ token }),
});
const data = await res.json();
if (!res.ok) {
throw new Error(data.detail || "Authentication failed");
}
setStep("success");
setTimeout(() => {
onSuccess();
}, 1500);
} catch (err) {
setError(err instanceof Error ? err.message : "Authentication failed");
} finally {
setIsLoading(false);
}
};
return (
<div
className="fixed inset-0 z-50 flex items-center justify-center p-4"
style={{ background: "rgba(0, 0, 0, 0.8)", backdropFilter: "blur(8px)" }}
>
<div
className="glass-card w-full max-w-md p-6 relative"
onClick={(e) => e.stopPropagation()}
>
{/* Close Button */}
<button
onClick={onClose}
className="absolute top-4 right-4 p-1 rounded-lg transition-colors"
style={{ color: "var(--text-muted)" }}
>
<X className="w-5 h-5" />
</button>
{step === "input" ? (
<>
{/* Header */}
<div className="text-center mb-6">
<div
className="w-16 h-16 mx-auto rounded-2xl flex items-center justify-center mb-4"
style={{ background: "linear-gradient(135deg, var(--ghost-primary), var(--ghost-accent))" }}
>
<Key className="w-8 h-8 text-white" />
</div>
<h2 className="text-2xl font-bold mb-2" style={{ color: "var(--text-primary)" }}>
Connect HuggingFace
</h2>
<p className="text-sm" style={{ color: "var(--text-secondary)" }}>
Enter your HuggingFace token to access SAM Audio models
</p>
</div>
{/* Instructions */}
<div
className="rounded-xl p-4 mb-6"
style={{ background: "var(--bg-tertiary)" }}
>
<h3 className="font-medium mb-2" style={{ color: "var(--text-primary)" }}>
How to get your token:
</h3>
<ol className="space-y-2 text-sm" style={{ color: "var(--text-secondary)" }}>
<li className="flex items-start gap-2">
<span className="font-bold" style={{ color: "var(--ghost-primary)" }}>1.</span>
<span>
Request access to{" "}
<a
href="https://huggingface.co/facebook/sam-audio-large"
target="_blank"
rel="noopener noreferrer"
className="underline hover:opacity-80"
style={{ color: "var(--ghost-secondary)" }}
>
SAM Audio on HuggingFace
<ExternalLink className="w-3 h-3 inline ml-1" />
</a>
</span>
</li>
<li className="flex items-start gap-2">
<span className="font-bold" style={{ color: "var(--ghost-primary)" }}>2.</span>
<span>
Create an{" "}
<a
href="https://huggingface.co/settings/tokens"
target="_blank"
rel="noopener noreferrer"
className="underline hover:opacity-80"
style={{ color: "var(--ghost-secondary)" }}
>
access token
<ExternalLink className="w-3 h-3 inline ml-1" />
</a>
</span>
</li>
<li className="flex items-start gap-2">
<span className="font-bold" style={{ color: "var(--ghost-primary)" }}>3.</span>
<span>Paste the token below</span>
</li>
</ol>
</div>
{/* Form */}
<form onSubmit={handleSubmit}>
<div className="mb-4">
<input
type="password"
value={token}
onChange={(e) => setToken(e.target.value)}
placeholder="hf_xxxxxxxxxxxxxxxxxxxxxxxxxx"
className="input-ghost font-mono text-sm"
disabled={isLoading}
/>
</div>
{error && (
<div
className="mb-4 p-3 rounded-lg text-sm"
style={{
background: "rgba(239, 68, 68, 0.1)",
color: "var(--ghost-error)",
border: "1px solid rgba(239, 68, 68, 0.2)"
}}
>
{error}
</div>
)}
<button
type="submit"
disabled={!token || isLoading}
className="btn-primary w-full flex items-center justify-center gap-2 disabled:opacity-50 disabled:cursor-not-allowed"
>
{isLoading ? (
<>
<Loader2 className="w-4 h-4 animate-spin" />
Verifying...
</>
) : (
"Connect"
)}
</button>
</form>
</>
) : (
/* Success State */
<div className="text-center py-8">
<div
className="w-20 h-20 mx-auto rounded-full flex items-center justify-center mb-4 animate-pulse-glow"
style={{ background: "linear-gradient(135deg, var(--ghost-success), var(--ghost-secondary))" }}
>
<CheckCircle className="w-10 h-10 text-white" />
</div>
<h2 className="text-2xl font-bold mb-2" style={{ color: "var(--text-primary)" }}>
Connected!
</h2>
<p style={{ color: "var(--text-secondary)" }}>
You can now use SAM Audio models
</p>
</div>
)}
</div>
</div>
);
}
================================================
FILE: frontend/src/components/Header.tsx
================================================
"use client";
import { Sun, Moon, Ghost, User } from "lucide-react";
interface HeaderProps {
isAuthenticated: boolean;
onAuthClick: () => void;
isDarkMode: boolean;
onThemeToggle: () => void;
onLogoClick?: () => void;
}
export default function Header({
isAuthenticated,
onAuthClick,
isDarkMode,
onThemeToggle,
onLogoClick
}: HeaderProps) {
return (
<header
className="sticky top-0 z-50 glass-card"
style={{
borderRadius: 0,
borderTop: "none",
borderLeft: "none",
borderRight: "none",
width: "100%",
}}
>
<div
style={{
maxWidth: "1200px",
margin: "0 auto",
padding: "16px 24px",
display: "flex",
alignItems: "center",
justifyContent: "space-between"
}}
>
{/* Logo - Clickable */}
<div
style={{
display: "flex",
alignItems: "center",
gap: "12px",
cursor: onLogoClick ? "pointer" : "default"
}}
onClick={onLogoClick}
title={onLogoClick ? "Return to home" : undefined}
>
<img
src="/audioghost_logo.png"
alt="AudioGhost Logo"
style={{
width: "40px",
height: "40px",
borderRadius: "10px"
}}
/>
<div>
<h1 style={{ fontWeight: 700, fontSize: "1.125rem", color: "var(--text-primary)" }}>
Audio<span className="gradient-text">Ghost</span>
</h1>
<p style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
v1.0 MVP
</p>
</div>
</div>
{/* Actions */}
<div style={{ display: "flex", alignItems: "center", gap: "12px" }}>
{/* Theme Toggle */}
<button
onClick={onThemeToggle}
style={{
padding: "8px",
borderRadius: "8px",
background: "var(--bg-tertiary)",
color: "var(--text-secondary)",
border: "none",
cursor: "pointer",
display: "flex",
alignItems: "center",
justifyContent: "center",
transition: "all 0.3s ease"
}}
title={isDarkMode ? "Switch to Light Mode" : "Switch to Dark Mode"}
>
{isDarkMode ? <Sun className="w-5 h-5" /> : <Moon className="w-5 h-5" />}
</button>
{/* Auth Button */}
{isAuthenticated ? (
<div
style={{
display: "flex",
alignItems: "center",
gap: "8px",
padding: "8px 16px",
borderRadius: "8px",
background: "var(--bg-tertiary)"
}}
>
<div
style={{
width: "32px",
height: "32px",
borderRadius: "50%",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: "linear-gradient(135deg, var(--ghost-success), var(--ghost-secondary))"
}}
>
<User className="w-4 h-4 text-white" />
</div>
<span style={{ fontSize: "0.875rem", color: "var(--text-secondary)" }}>Connected</span>
</div>
) : (
<button
onClick={onAuthClick}
className="btn-primary"
style={{ display: "flex", alignItems: "center", gap: "8px" }}
>
<Ghost className="w-4 h-4" />
Connect HuggingFace
</button>
)}
</div>
</div>
</header>
);
}
================================================
FILE: frontend/src/components/ProgressTracker.tsx
================================================
"use client";
import { Ghost, Loader2, CheckCircle2 } from "lucide-react";
interface ProgressTrackerProps {
status: "pending" | "processing";
progress: number;
message: string;
}
export default function ProgressTracker({ status, progress, message }: ProgressTrackerProps) {
const steps = [
{ step: "Loading model", threshold: 10 },
{ step: "Processing audio", threshold: 30 },
{ step: "Running separation", threshold: 50 },
{ step: "Saving results", threshold: 80 },
];
return (
<div
style={{
background: "var(--bg-secondary)",
borderRadius: "20px",
border: "1px solid var(--glass-border)",
padding: "48px 40px"
}}
>
{/* Icon */}
<div style={{ textAlign: "center", marginBottom: "32px" }}>
<div
style={{
width: "80px",
height: "80px",
borderRadius: "20px",
display: "inline-flex",
alignItems: "center",
justifyContent: "center",
background: "linear-gradient(135deg, var(--ghost-primary), var(--ghost-accent))",
boxShadow: "0 8px 32px rgba(168, 85, 247, 0.3)"
}}
>
{status === "pending" ? (
<Ghost style={{ width: "40px", height: "40px", color: "white" }} />
) : (
<Loader2
style={{
width: "40px",
height: "40px",
color: "white",
animation: "spin 1s linear infinite"
}}
/>
)}
</div>
</div>
{/* Title */}
<h3
style={{
fontSize: "1.5rem",
fontWeight: 600,
color: "var(--text-primary)",
textAlign: "center",
marginBottom: "8px"
}}
>
{status === "pending" ? "Waiting in Queue..." : "Processing Audio..."}
</h3>
{/* Message */}
<p
style={{
fontSize: "0.9rem",
color: "var(--text-muted)",
textAlign: "center",
marginBottom: "32px"
}}
>
{message}
</p>
{/* Progress Bar */}
<div style={{ marginBottom: "32px" }}>
<div
style={{
display: "flex",
justifyContent: "space-between",
marginBottom: "10px"
}}
>
<span style={{ fontSize: "0.85rem", color: "var(--text-muted)" }}>
Progress
</span>
<span
style={{
fontSize: "0.85rem",
fontWeight: 600,
color: "var(--ghost-primary)"
}}
>
{progress}%
</span>
</div>
<div
style={{
height: "8px",
borderRadius: "4px",
background: "var(--bg-tertiary)",
overflow: "hidden"
}}
>
<div
style={{
height: "100%",
borderRadius: "4px",
background: "linear-gradient(90deg, var(--ghost-primary), var(--ghost-accent))",
width: `${progress}%`,
transition: "width 0.3s ease"
}}
/>
</div>
</div>
{/* Steps */}
<div
style={{
background: "var(--bg-tertiary)",
borderRadius: "12px",
padding: "20px"
}}
>
{steps.map(({ step, threshold }, index) => {
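const isComplete = progress >= threshold;
// The active step is the first incomplete one, so there is no gap (e.g. 50-59%) with no active step
const isActive = !isComplete && (index === 0 || progress >= steps[index - 1].threshold);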
return (
<div
key={step}
style={{
display: "flex",
alignItems: "center",
gap: "14px",
padding: "12px 0",
borderBottom: index < steps.length - 1
? "1px solid var(--border-color)"
: "none"
}}
>
{/* Status Icon */}
<div
style={{
width: "24px",
height: "24px",
borderRadius: "50%",
display: "flex",
alignItems: "center",
justifyContent: "center",
flexShrink: 0,
background: isComplete
? "linear-gradient(135deg, #10B981, #34D399)"
: isActive
? "var(--ghost-primary)"
: "transparent",
border: isComplete || isActive
? "none"
: "2px solid var(--text-muted)"
}}
>
{isComplete ? (
<CheckCircle2
style={{
width: "14px",
height: "14px",
color: "white"
}}
/>
) : isActive ? (
<Loader2
style={{
width: "12px",
height: "12px",
color: "white",
animation: "spin 1s linear infinite"
}}
/>
) : null}
</div>
{/* Step Label */}
<span
style={{
fontSize: "0.9rem",
fontWeight: isComplete || isActive ? 500 : 400,
color: isComplete
? "var(--text-primary)"
: isActive
? "var(--ghost-primary)"
: "var(--text-muted)"
}}
>
{step}
</span>
</div>
);
})}
</div>
</div>
);
}
================================================
FILE: frontend/src/components/SeparationPanel.tsx
================================================
"use client";
import { useState } from "react";
import {
Mic,
Music,
Volume2,
Sparkles,
Clock,
AlertCircle,
Cpu,
Search,
Sliders,
Zap
} from "lucide-react";
interface SeparationSettings {
modelSize: "small" | "base" | "large";
chunkDuration: number;
useFloat32: boolean;
}
interface SeparationPanelProps {
onSeparate: (description: string, mode: "extract" | "remove", modelSize: string, chunkDuration: number, useFloat32: boolean) => void;
isAuthenticated: boolean;
onAuthRequired: () => void;
hasRegion: boolean;
// Persistent settings from parent
settings: SeparationSettings;
onSettingsChange: (settings: SeparationSettings) => void;
}
const QUICK_PROMPTS = [
{ icon: Mic, label: "Voice", prompt: "singing voice", color: "#8B5CF6" },
{ icon: Music, label: "Music", prompt: "background music", color: "#06B6D4" },
{ icon: Volume2, label: "Drums", prompt: "drums and percussion", color: "#F472B6" },
{ icon: Sparkles, label: "Guitar", prompt: "acoustic guitar", color: "#10B981" },
{ icon: Volume2, label: "Bass", prompt: "bass", color: "#F59E0B" },
{ icon: Volume2, label: "Piano", prompt: "piano", color: "#EF4444" },
];
const MODEL_OPTIONS = [
{ value: "small", label: "Small", vram: "~6GB", vramFp32: "~9GB", speed: "Fast" },
{ value: "base", label: "Base", vram: "~7GB", vramFp32: "~10GB", speed: "Balanced" },
{ value: "large", label: "Large", vram: "~10GB", vramFp32: "~13GB", speed: "Best" },
] as const;
export default function SeparationPanel({
onSeparate,
isAuthenticated,
onAuthRequired,
hasRegion,
settings,
onSettingsChange
}: SeparationPanelProps) {
// Only prompt and mode are local (reset each time)
const [customPrompt, setCustomPrompt] = useState("");
const [selectedPrompt, setSelectedPrompt] = useState<string | null>(null);
const [mode, setMode] = useState<"extract" | "remove">("extract");
// Destructure settings for easier access
const { modelSize, chunkDuration, useFloat32 } = settings;
// Helper to update a single setting
const updateSetting = <K extends keyof SeparationSettings>(key: K, value: SeparationSettings[K]) => {
onSettingsChange({ ...settings, [key]: value });
};
const handleQuickSelect = (prompt: string) => {
setSelectedPrompt(prompt);
setCustomPrompt(prompt);
};
const handleSeparate = () => {
if (!isAuthenticated) {
onAuthRequired();
return;
}
// Trim so a whitespace-only prompt is treated as empty
const prompt = (customPrompt || selectedPrompt || "").trim();
if (!prompt) return;
onSeparate(prompt, mode, modelSize, chunkDuration, useFloat32);
};
const activePrompt = (customPrompt || selectedPrompt || "").trim();
return (
<div
style={{
background: "var(--bg-secondary)",
borderRadius: "16px",
border: "1px solid var(--glass-border)",
overflow: "hidden"
}}
>
{/* Header */}
<div
style={{
padding: "20px 24px",
borderBottom: "1px solid var(--glass-border)",
display: "flex",
alignItems: "center",
justifyContent: "space-between"
}}
>
<h3 style={{
fontSize: "1rem",
fontWeight: 600,
color: "var(--text-primary)"
}}>
Separation Settings
</h3>
{hasRegion && (
<div
style={{
display: "flex",
alignItems: "center",
gap: "6px",
padding: "6px 12px",
borderRadius: "20px",
fontSize: "0.75rem",
background: "rgba(244, 114, 182, 0.15)",
color: "var(--ghost-accent)"
}}
>
<Clock style={{ width: "12px", height: "12px" }} />
Temporal Lock
</div>
)}
</div>
<div style={{ padding: "24px" }}>
{/* ============================================ */}
{/* MAIN INPUT - Describe the sound (PROMINENT) */}
{/* ============================================ */}
<div
style={{
marginBottom: "24px",
padding: "20px",
borderRadius: "14px",
background: "linear-gradient(135deg, rgba(168, 85, 247, 0.1), rgba(244, 114, 182, 0.1))",
border: "1px solid rgba(168, 85, 247, 0.2)"
}}
>
<label style={{
display: "flex",
alignItems: "center",
gap: "8px",
fontSize: "0.9rem",
fontWeight: 600,
marginBottom: "12px",
color: "var(--text-primary)"
}}>
<Search style={{ width: "16px", height: "16px", color: "var(--ghost-primary)" }} />
Describe the sound you want to {mode}
</label>
<input
type="text"
value={customPrompt}
onChange={(e) => {
setCustomPrompt(e.target.value);
setSelectedPrompt(null);
}}
placeholder="e.g., singing voice, drums, police siren, crowd noise, a dog barking..."
style={{
width: "100%",
padding: "16px 18px",
borderRadius: "12px",
border: "2px solid var(--ghost-primary)",
background: "var(--bg-primary)",
color: "var(--text-primary)",
fontSize: "1rem",
fontWeight: 500,
outline: "none",
boxShadow: "0 4px 15px rgba(168, 85, 247, 0.15)"
}}
/>
{/* Quick Select Tags */}
<div style={{
display: "flex",
flexWrap: "wrap",
gap: "8px",
marginTop: "14px"
}}>
{QUICK_PROMPTS.map(({ icon: Icon, label, prompt, color }) => (
<button
key={prompt}
onClick={() => handleQuickSelect(prompt)}
style={{
display: "flex",
alignItems: "center",
gap: "6px",
padding: "8px 12px",
borderRadius: "20px",
border: "none",
cursor: "pointer",
fontSize: "0.8rem",
fontWeight: 500,
transition: "all 0.2s",
background: selectedPrompt === prompt
? `linear-gradient(135deg, ${color}, ${color}dd)`
: "var(--bg-tertiary)",
color: selectedPrompt === prompt
? "white"
: "var(--text-secondary)"
}}
>
<Icon style={{ width: "14px", height: "14px" }} />
{label}
</button>
))}
</div>
</div>
{/* Settings Grid */}
<div style={{
display: "grid",
gridTemplateColumns: "1fr 1fr",
gap: "20px",
marginBottom: "20px"
}}>
{/* Mode Toggle */}
<div>
<label style={{
display: "block",
fontSize: "0.8rem",
fontWeight: 500,
marginBottom: "10px",
color: "var(--text-muted)"
}}>
Operation
</label>
<div style={{ display: "flex", gap: "8px" }}>
<button
onClick={() => setMode("extract")}
style={{
flex: 1,
padding: "12px",
borderRadius: "10px",
fontWeight: 500,
fontSize: "0.85rem",
border: "none",
cursor: "pointer",
background: mode === "extract"
? "linear-gradient(135deg, var(--ghost-primary), #7C3AED)"
: "var(--bg-tertiary)",
color: mode === "extract" ? "white" : "var(--text-secondary)"
}}
>
✨ Extract
</button>
<button
onClick={() => setMode("remove")}
style={{
flex: 1,
padding: "12px",
borderRadius: "10px",
fontWeight: 500,
fontSize: "0.85rem",
border: "none",
cursor: "pointer",
background: mode === "remove"
? "linear-gradient(135deg, var(--ghost-error), #DC2626)"
: "var(--bg-tertiary)",
color: mode === "remove" ? "white" : "var(--text-secondary)"
}}
>
🗑️ Remove
</button>
</div>
</div>
{/* Model Selector */}
<div>
<label style={{
display: "flex",
alignItems: "center",
gap: "6px",
fontSize: "0.8rem",
fontWeight: 500,
marginBottom: "10px",
color: "var(--text-muted)"
}}>
<Cpu style={{ width: "12px", height: "12px" }} />
Model
</label>
<div style={{ display: "flex", gap: "6px" }}>
{MODEL_OPTIONS.map(({ value, label, vram, vramFp32 }) => (
<button
key={value}
onClick={() => updateSetting("modelSize", value)}
style={{
flex: 1,
padding: "10px 8px",
borderRadius: "8px",
border: "none",
cursor: "pointer",
textAlign: "center",
background: modelSize === value
? "linear-gradient(135deg, #6366f1, #8b5cf6)"
: "var(--bg-tertiary)",
color: modelSize === value ? "white" : "var(--text-secondary)"
}}
>
<div style={{ fontWeight: 500, fontSize: "0.8rem" }}>{label}</div>
<div style={{ fontSize: "0.65rem", opacity: 0.7, marginTop: "2px" }}>
{useFloat32 ? vramFp32 : vram}
</div>
</button>
))}
</div>
</div>
</div>
{/* Float32 Precision Toggle */}
<div
style={{
display: "flex",
alignItems: "center",
justifyContent: "space-between",
padding: "14px 16px",
marginBottom: "20px",
borderRadius: "10px",
background: useFloat32
? "linear-gradient(135deg, rgba(16, 185, 129, 0.15), rgba(6, 182, 212, 0.1))"
: "var(--bg-tertiary)",
border: useFloat32
? "1px solid rgba(16, 185, 129, 0.3)"
: "1px solid var(--border-color)"
}}
>
<div style={{ display: "flex", alignItems: "center", gap: "10px" }}>
<Zap style={{
width: "16px",
height: "16px",
color: useFloat32 ? "#10B981" : "var(--text-muted)"
}} />
<div>
<div style={{
fontSize: "0.85rem",
fontWeight: 500,
color: "var(--text-primary)"
}}>
High Quality Mode (float32)
</div>
<div style={{
fontSize: "0.7rem",
color: "var(--text-muted)",
marginTop: "2px"
}}>
Better separation quality, +2-3GB VRAM
</div>
</div>
</div>
<button
onClick={() => updateSetting("useFloat32", !useFloat32)}
style={{
width: "44px",
height: "24px",
borderRadius: "12px",
border: "none",
cursor: "pointer",
position: "relative",
background: useFloat32
? "linear-gradient(135deg, #10B981, #06B6D4)"
: "var(--bg-secondary)",
transition: "all 0.2s ease"
}}
>
<div style={{
width: "18px",
height: "18px",
borderRadius: "50%",
background: "white",
position: "absolute",
top: "3px",
left: useFloat32 ? "23px" : "3px",
transition: "left 0.2s ease",
boxShadow: "0 1px 3px rgba(0,0,0,0.2)"
}} />
</button>
</div>
{/* Chunk Duration Slider */}
<div
style={{
marginBottom: "20px",
padding: "16px",
borderRadius: "12px",
background: "var(--bg-tertiary)",
border: "1px solid var(--border-color)"
}}
>
<div style={{
display: "flex",
alignItems: "center",
justifyContent: "space-between",
marginBottom: "12px"
}}>
<label style={{
display: "flex",
alignItems: "center",
gap: "8px",
fontSize: "0.8rem",
fontWeight: 500,
color: "var(--text-muted)"
}}>
<Sliders style={{ width: "14px", height: "14px" }} />
Chunk Duration
</label>
<span style={{
fontSize: "0.9rem",
fontWeight: 600,
color: "var(--ghost-primary)",
fontFamily: "monospace"
}}>
{chunkDuration}s
</span>
</div>
<input
type="range"
min="5"
max="60"
step="5"
value={chunkDuration}
onChange={(e) => updateSetting("chunkDuration", Number(e.target.value))}
style={{
width: "100%",
height: "6px",
borderRadius: "3px",
background: `linear-gradient(to right, var(--ghost-primary) ${((chunkDuration - 5) / 55) * 100}%, var(--bg-secondary) ${((chunkDuration - 5) / 55) * 100}%)`,
cursor: "pointer",
appearance: "none",
outline: "none"
}}
/>
<div style={{
display: "flex",
justifyContent: "space-between",
marginTop: "8px",
fontSize: "0.65rem",
color: "var(--text-muted)"
}}>
<span>5s (Low VRAM)</span>
<span>60s (Fast)</span>
</div>
<p style={{
marginTop: "10px",
fontSize: "0.7rem",
color: "var(--text-muted)",
lineHeight: 1.4
}}>
⚡ Smaller chunks = Less VRAM usage, but slower processing & may affect quality at boundaries
</p>
</div>
{/* Auth Warning */}
{!isAuthenticated && (
<div
style={{
marginBottom: "16px",
padding: "14px 16px",
borderRadius: "10px",
display: "flex",
alignItems: "center",
gap: "12px",
background: "rgba(245, 158, 11, 0.1)",
border: "1px solid rgba(245, 158, 11, 0.25)"
}}
>
<AlertCircle style={{
width: "18px",
height: "18px",
flexShrink: 0,
color: "var(--ghost-warning)"
}} />
<p style={{
fontWeight: 500,
fontSize: "0.85rem",
color: "var(--ghost-warning)"
}}>
Connect HuggingFace to continue
</p>
</div>
)}
{/* Action Button */}
<button
onClick={handleSeparate}
disabled={!activePrompt}
style={{
width: "100%",
padding: "16px",
borderRadius: "12px",
border: "none",
cursor: activePrompt ? "pointer" : "not-allowed",
fontSize: "1rem",
fontWeight: 600,
background: activePrompt
? "linear-gradient(135deg, var(--ghost-primary), var(--ghost-accent))"
: "var(--bg-tertiary)",
color: activePrompt ? "white" : "var(--text-muted)",
opacity: activePrompt ? 1 : 0.6,
boxShadow: activePrompt ? "0 4px 15px rgba(168, 85, 247, 0.3)" : "none"
}}
>
{mode === "extract" ? "✨ Extract" : "🗑️ Remove"} "{activePrompt || "..."}"
</button>
</div>
</div>
);
}
================================================
FILE: frontend/src/components/StemMixer.tsx
================================================
"use client";
import { useState, useRef, useEffect, useCallback } from "react";
import {
Play,
Pause,
Download,
Volume2,
VolumeX,
RefreshCw,
Ghost,
Leaf,
SkipBack,
Music,
type LucideIcon
} from "lucide-react";
import WaveSurfer from "wavesurfer.js";
interface StemMixerProps {
taskId: string;
description: string;
onNewSeparation: () => void;
onUploadNew?: () => void;
audioDuration?: number;
processingTime?: number;
modelSize?: string;
}
interface Track {
id: "ghost" | "clean";
label: string;
icon: LucideIcon;
color: string;
waveColor: string;
}
const TRACKS: Track[] = [
{
id: "ghost",
label: "Isolated Sound",
icon: Ghost,
color: "#F472B6",
waveColor: "#F472B6"
},
{
id: "clean",
label: "Without Isolated Sound",
icon: Leaf,
color: "#60A5FA",
waveColor: "#60A5FA"
},
];
export default function StemMixer({
taskId,
description,
onNewSeparation,
onUploadNew,
audioDuration,
processingTime,
modelSize
}: StemMixerProps) {
const [isPlaying, setIsPlaying] = useState(false);
const [currentTime, setCurrentTime] = useState(0);
const [duration, setDuration] = useState(0);
const [muted, setMuted] = useState<Record<string, boolean>>({
ghost: false,
clean: false,
});
const [isReady, setIsReady] = useState<Record<string, boolean>>({
ghost: false,
clean: false,
});
const wavesurferRefs = useRef<Record<string, WaveSurfer | null>>({});
const containerRefs = useRef<Record<string, HTMLDivElement | null>>({});
const isSeeking = useRef(false);
const getAudioUrl = (trackId: string) => {
return `http://localhost:8000/api/tasks/${taskId}/download/${trackId}`;
};
useEffect(() => {
const initWaveSurfers = async () => {
for (const track of TRACKS) {
const container = containerRefs.current[track.id];
if (!container) continue;
if (wavesurferRefs.current[track.id]) {
wavesurferRefs.current[track.id]?.destroy();
}
const ws = WaveSurfer.create({
container,
waveColor: `${track.waveColor}40`,
progressColor: track.waveColor,
cursorColor: "#ffffff",
cursorWidth: 1,
barWidth: 2,
barGap: 2,
barRadius: 2,
height: 48,
normalize: true,
// All tracks are seekable - clicking any will sync all others
interact: true,
hideScrollbar: true,
});
ws.load(getAudioUrl(track.id));
ws.on("ready", () => {
setIsReady(prev => ({ ...prev, [track.id]: true }));
if (track.id === "ghost") {
setDuration(ws.getDuration());
}
ws.setMuted(muted[track.id]);
});
ws.on("audioprocess", () => {
if (!isSeeking.current && track.id === "ghost") {
setCurrentTime(ws.getCurrentTime());
}
});
ws.on("finish", () => {
// Only trigger finish logic from the "ghost" track (master)
// to prevent multiple finish events causing desync
if (track.id === "ghost") {
setIsPlaying(false);
setCurrentTime(0);
// Seek all tracks back to start synchronized
Object.values(wavesurferRefs.current).forEach(w => {
if (w) {
w.pause();
w.seekTo(0);
}
});
}
});
// Sync seeking across all tracks - when ANY track is seeked
ws.on("seeking", () => {
if (isSeeking.current) return; // Prevent recursive seeking
isSeeking.current = true;
const progress = ws.getCurrentTime() / ws.getDuration();
// Sync all OTHER tracks to the same position
Object.entries(wavesurferRefs.current).forEach(([id, w]) => {
if (w && id !== track.id) {
w.seekTo(progress);
}
});
setCurrentTime(ws.getCurrentTime());
// Reset seeking flag after a short delay
setTimeout(() => {
isSeeking.current = false;
}, 50);
});
wavesurferRefs.current[track.id] = ws;
}
};
initWaveSurfers();
return () => {
// Use setTimeout to avoid AbortError when component unmounts during loading
const refs = { ...wavesurferRefs.current };
setTimeout(() => {
Object.values(refs).forEach(ws => {
if (ws) {
try { ws.destroy(); } catch { /* ignore AbortError */ }
}
});
}, 0);
};
}, [taskId]);
// Continuous sync effect - keep all tracks aligned to the master ("ghost") track during playback
useEffect(() => {
let animationFrameId: number;
const syncInterval = 100; // Sync every 100ms
let lastSync = 0;
const syncTracks = (timestamp: number) => {
if (isPlaying && timestamp - lastSync > syncInterval) {
lastSync = timestamp;
// TRACKS only defines "ghost" and "clean"; "ghost" is the master, matching the "finish" handler
const masterWs = wavesurferRefs.current["ghost"];
if (masterWs) {
const masterTime = masterWs.getCurrentTime();
const masterDuration = masterWs.getDuration();
const progress = masterTime / masterDuration;
// Check if the master track finished
if (masterTime >= masterDuration - 0.1) {
// Stop all tracks
Object.values(wavesurferRefs.current).forEach(ws => ws?.pause());
Object.values(wavesurferRefs.current).forEach(ws => ws?.seekTo(0));
setIsPlaying(false);
setCurrentTime(0);
return;
}
// Sync other tracks to master
Object.entries(wavesurferRefs.current).forEach(([id, ws]) => {
if (ws && id !== "ghost") {
const trackTime = ws.getCurrentTime();
// Only sync if drift is more than 0.05 seconds
if (Math.abs(trackTime - masterTime) > 0.05) {
ws.seekTo(progress);
}
}
});
setCurrentTime(masterTime);
}
}
animationFrameId = requestAnimationFrame(syncTracks);
};
if (isPlaying) {
animationFrameId = requestAnimationFrame(syncTracks);
}
return () => {
if (animationFrameId) {
cancelAnimationFrame(animationFrameId);
}
};
}, [isPlaying]);
// Sync play/pause across all tracks
const togglePlayAll = useCallback(() => {
const allReady = Object.values(isReady).every(r => r);
if (!allReady) return;
if (isPlaying) {
// Pause all
Object.values(wavesurferRefs.current).forEach(ws => ws?.pause());
setIsPlaying(false);
} else {
// First sync all tracks to the master ("ghost") track's position
const masterWs = wavesurferRefs.current["ghost"];
if (masterWs) {
const progress = masterWs.getCurrentTime() / masterWs.getDuration();
Object.entries(wavesurferRefs.current).forEach(([id, ws]) => {
if (ws && id !== "ghost") {
ws.seekTo(progress);
}
});
}
// Then play all together
Object.values(wavesurferRefs.current).forEach(ws => ws?.play());
setIsPlaying(true);
}
}, [isPlaying, isReady]);
const resetToStart = useCallback(() => {
Object.values(wavesurferRefs.current).forEach(ws => {
if (ws) {
ws.pause();
ws.seekTo(0);
}
});
setIsPlaying(false);
setCurrentTime(0);
}, []);
const toggleMute = useCallback((trackId: string) => {
const ws = wavesurferRefs.current[trackId];
if (ws) {
const newMuted = !muted[trackId];
ws.setMuted(newMuted);
setMuted(prev => ({ ...prev, [trackId]: newMuted }));
}
}, [muted]);
const handleSeek = useCallback((e: React.MouseEvent<HTMLDivElement>) => {
const rect = e.currentTarget.getBoundingClientRect();
const progress = Math.max(0, Math.min(1, (e.clientX - rect.left) / rect.width));
isSeeking.current = true;
// Seek all tracks to the same position
Object.values(wavesurferRefs.current).forEach(ws => {
if (ws) ws.seekTo(progress);
});
setCurrentTime(progress * duration);
isSeeking.current = false;
}, [duration]);
const downloadTrack = (trackId: string, label: string) => {
const link = document.createElement("a");
link.href = getAudioUrl(trackId);
link.download = `${taskId}_${label.toLowerCase().replace(/\s+/g, "_")}.wav`;
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
};
const formatTime = (seconds: number) => {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
return `${mins}:${secs.toString().padStart(2, "0")}`;
};
const allReady = Object.values(isReady).every(r => r);
return (
<div
style={{
background: "var(--bg-secondary)",
borderRadius: "16px",
border: "1px solid var(--glass-border)",
overflow: "hidden"
}}
>
{/* Header */}
<div
style={{
padding: "20px 24px",
borderBottom: "1px solid var(--glass-border)",
display: "flex",
alignItems: "center",
justifyContent: "space-between"
}}
>
<div>
<h3 style={{
fontSize: "1rem",
fontWeight: 600,
color: "var(--text-primary)",
marginBottom: "4px"
}}>
✨ Separation Complete
</h3>
<p style={{
fontSize: "0.8rem",
color: "var(--text-muted)"
}}>
"{description}"
</p>
</div>
<div style={{ display: "flex", gap: "8px" }}>
{onUploadNew && (
<button
onClick={onUploadNew}
style={{
display: "flex",
alignItems: "center",
gap: "6px",
padding: "8px 14px",
borderRadius: "8px",
background: "var(--bg-tertiary)",
color: "var(--text-secondary)",
border: "1px solid var(--border-color)",
cursor: "pointer",
fontSize: "0.8rem",
fontWeight: 500
}}
>
↩ Upload New File
</button>
)}
<button
onClick={onNewSeparation}
style={{
display: "flex",
alignItems: "center",
gap: "6px",
padding: "8px 14px",
borderRadius: "8px",
background: "var(--bg-tertiary)",
color: "var(--text-secondary)",
border: "1px solid var(--border-color)",
cursor: "pointer",
fontSize: "0.8rem",
fontWeight: 500
}}
>
<RefreshCw style={{ width: "14px", height: "14px" }} />
New
</button>
</div>
</div>
{/* Stats Bar */}
{(audioDuration || processingTime || modelSize) && (
<div
style={{
padding: "12px 24px",
borderBottom: "1px solid var(--glass-border)",
display: "flex",
gap: "24px",
background: "var(--bg-tertiary)"
}}
>
{audioDuration !== undefined && (
<div style={{ display: "flex", alignItems: "center", gap: "6px" }}>
<span style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
Audio:
</span>
<span style={{
fontSize: "0.8rem",
fontWeight: 600,
color: "var(--text-primary)",
fontFamily: "monospace"
}}>
{formatTime(audioDuration)}
</span>
</div>
)}
{processingTime !== undefined && (
<div style={{ display: "flex", alignItems: "center", gap: "6px" }}>
<span style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
Processing:
</span>
<span style={{
fontSize: "0.8rem",
fontWeight: 600,
color: "#10B981",
fontFamily: "monospace"
}}>
{processingTime.toFixed(1)}s
</span>
</div>
)}
{modelSize && (
<div style={{ display: "flex", alignItems: "center", gap: "6px" }}>
<span style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
Model:
</span>
<span style={{
fontSize: "0.8rem",
fontWeight: 600,
color: "var(--ghost-primary)",
textTransform: "capitalize"
}}>
{modelSize}
</span>
</div>
)}
{!!audioDuration && !!processingTime && (
<div style={{ display: "flex", alignItems: "center", gap: "6px", marginLeft: "auto" }}>
<span style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
Speed:
</span>
<span style={{
fontSize: "0.8rem",
fontWeight: 600,
color: "#F59E0B",
fontFamily: "monospace"
}}>
{(audioDuration / processingTime).toFixed(1)}x
</span>
</div>
)}
</div>
)}
{/* Transport Controls */}
<div
style={{
padding: "16px 24px",
borderBottom: "1px solid var(--glass-border)",
display: "flex",
alignItems: "center",
gap: "12px",
background: "var(--bg-tertiary)"
}}
>
<button
onClick={resetToStart}
disabled={!allReady}
style={{
width: "32px",
height: "32px",
borderRadius: "6px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: "var(--bg-secondary)",
color: "var(--text-muted)",
border: "none",
cursor: allReady ? "pointer" : "not-allowed",
opacity: allReady ? 1 : 0.5
}}
>
<SkipBack style={{ width: "14px", height: "14px" }} />
</button>
<button
onClick={togglePlayAll}
disabled={!allReady}
style={{
width: "40px",
height: "40px",
borderRadius: "8px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: isPlaying
? "linear-gradient(135deg, var(--ghost-primary), var(--ghost-accent))"
: "linear-gradient(135deg, #6366f1, #8b5cf6)",
border: "none",
cursor: allReady ? "pointer" : "not-allowed",
opacity: allReady ? 1 : 0.5,
boxShadow: "0 2px 8px rgba(99, 102, 241, 0.3)"
}}
>
{isPlaying ? (
<Pause style={{ width: "18px", height: "18px", color: "white" }} />
) : (
<Play style={{ width: "18px", height: "18px", color: "white", marginLeft: "2px" }} />
)}
</button>
<div style={{ flex: 1, display: "flex", alignItems: "center", gap: "12px" }}>
<span style={{
fontSize: "0.75rem",
fontFamily: "monospace",
color: "var(--text-muted)",
minWidth: "36px"
}}>
{formatTime(currentTime)}
</span>
<div
style={{
flex: 1,
height: "4px",
borderRadius: "2px",
background: "var(--bg-secondary)",
cursor: "pointer",
position: "relative"
}}
onClick={handleSeek}
>
<div
style={{
position: "absolute",
left: 0,
top: 0,
height: "100%",
borderRadius: "2px",
background: "linear-gradient(90deg, #6366f1, #8b5cf6)",
width: `${duration > 0 ? (currentTime / duration) * 100 : 0}%`,
transition: "width 0.1s"
}}
/>
</div>
<span style={{
fontSize: "0.75rem",
fontFamily: "monospace",
color: "var(--text-muted)",
minWidth: "36px"
}}>
{formatTime(duration)}
</span>
</div>
</div>
{/* Tracks */}
<div style={{ padding: "20px 24px", display: "flex", flexDirection: "column", gap: "16px" }}>
{TRACKS.map((track) => {
const TrackIcon = track.icon;
const isMuted = muted[track.id];
const trackReady = isReady[track.id];
return (
<div
key={track.id}
style={{
display: "flex",
alignItems: "center",
gap: "12px",
padding: "14px 16px",
borderRadius: "12px",
background: isMuted ? "var(--bg-tertiary)" : `${track.color}08`,
border: `1px solid ${isMuted ? "var(--border-color)" : `${track.color}30`}`,
opacity: isMuted ? 0.6 : 1,
transition: "all 0.2s ease"
}}
>
{/* Mute Button */}
<button
onClick={() => toggleMute(track.id)}
style={{
width: "32px",
height: "32px",
borderRadius: "8px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: isMuted ? "var(--bg-secondary)" : `${track.color}20`,
color: isMuted ? "var(--text-muted)" : track.color,
border: "none",
cursor: "pointer",
flexShrink: 0
}}
>
{isMuted ? (
<VolumeX style={{ width: "16px", height: "16px" }} />
) : (
<Volume2 style={{ width: "16px", height: "16px" }} />
)}
</button>
{/* Track Label */}
<div style={{
display: "flex",
alignItems: "center",
gap: "10px",
minWidth: "180px",
flexShrink: 0
}}>
<TrackIcon
style={{
width: "16px",
height: "16px",
color: isMuted ? "var(--text-muted)" : track.color
}}
/>
<span style={{
fontSize: "0.85rem",
fontWeight: 500,
color: isMuted ? "var(--text-muted)" : "var(--text-primary)"
}}>
{track.label}
</span>
</div>
{/* Waveform */}
<div
ref={(el) => { containerRefs.current[track.id] = el; }}
style={{
flex: 1,
borderRadius: "8px",
overflow: "hidden",
background: "var(--bg-secondary)",
minHeight: "48px"
}}
>
{!trackReady && (
<div style={{
height: "48px",
display: "flex",
alignItems: "center",
justifyContent: "center"
}}>
<span style={{
fontSize: "0.75rem",
color: "var(--text-muted)"
}}>
Loading...
</span>
</div>
)}
</div>
{/* Download */}
<button
onClick={() => downloadTrack(track.id, track.label)}
style={{
width: "32px",
height: "32px",
borderRadius: "8px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: "var(--bg-secondary)",
color: "var(--text-muted)",
border: "none",
cursor: "pointer",
flexShrink: 0
}}
>
<Download style={{ width: "16px", height: "16px" }} />
</button>
</div>
);
})}
</div>
{/* Download All */}
<div style={{
padding: "20px 24px",
borderTop: "1px solid var(--glass-border)"
}}>
<button
onClick={() => TRACKS.forEach((track) => downloadTrack(track.id, track.label))}
style={{
width: "100%",
padding: "14px",
borderRadius: "10px",
background: "linear-gradient(135deg, #6366f1, #8b5cf6)",
color: "white",
border: "none",
cursor: "pointer",
fontSize: "0.9rem",
fontWeight: 600,
display: "flex",
alignItems: "center",
justifyContent: "center",
gap: "8px",
boxShadow: "0 4px 12px rgba(99, 102, 241, 0.3)"
}}
>
<Download style={{ width: "18px", height: "18px" }} />
Download All Stems
</button>
</div>
</div>
);
}
================================================
FILE: frontend/src/components/VideoStemMixer.tsx
================================================
"use client";
import { useState, useRef, useEffect, useCallback } from "react";
import {
Play,
Pause,
Download,
Volume2,
VolumeX,
RefreshCw,
Ghost,
Leaf,
SkipBack,
Film,
type LucideIcon
} from "lucide-react";
import WaveSurfer from "wavesurfer.js";
interface VideoStemMixerProps {
taskId: string;
description: string;
onNewSeparation: () => void;
onUploadNew?: () => void;
audioDuration?: number;
processingTime?: number;
modelSize?: string;
}
interface Track {
id: "ghost" | "clean";
label: string;
icon: LucideIcon;
color: string;
waveColor: string;
}
const TRACKS: Track[] = [
{
id: "ghost",
label: "Isolated Sound",
icon: Ghost,
color: "#F472B6",
waveColor: "#F472B6"
},
{
id: "clean",
label: "Without Isolated Sound",
icon: Leaf,
color: "#60A5FA",
waveColor: "#60A5FA"
},
];
export default function VideoStemMixer({
taskId,
description,
onNewSeparation,
onUploadNew,
audioDuration,
processingTime,
modelSize
}: VideoStemMixerProps) {
const [isPlaying, setIsPlaying] = useState(false);
const [currentTime, setCurrentTime] = useState(0);
const [duration, setDuration] = useState(0);
const [muted, setMuted] = useState<Record<string, boolean>>({
ghost: false,
clean: false,
});
const [videoMuted, setVideoMuted] = useState(true);
const [showVideoDownload, setShowVideoDownload] = useState(false);
const [isReady, setIsReady] = useState<Record<string, boolean>>({
video: false,
ghost: false,
clean: false,
});
const videoRef = useRef<HTMLVideoElement>(null);
const wavesurferRefs = useRef<Record<string, WaveSurfer | null>>({});
const containerRefs = useRef<Record<string, HTMLDivElement | null>>({});
const isSeeking = useRef(false);
const getAudioUrl = (trackId: string) => {
return `http://localhost:8000/api/tasks/${taskId}/download/${trackId}`;
};
const getVideoUrl = () => {
return `http://localhost:8000/api/tasks/${taskId}/download/video`;
};
// Initialize video and wavesurfers
useEffect(() => {
const initWaveSurfers = async () => {
for (const track of TRACKS) {
const container = containerRefs.current[track.id];
if (!container) continue;
if (wavesurferRefs.current[track.id]) {
wavesurferRefs.current[track.id]?.destroy();
}
const ws = WaveSurfer.create({
container,
waveColor: `${track.waveColor}40`,
progressColor: track.waveColor,
cursorColor: "#ffffff",
cursorWidth: 1,
barWidth: 2,
barGap: 2,
barRadius: 2,
height: 48,
normalize: true,
interact: true,
hideScrollbar: true,
});
ws.load(getAudioUrl(track.id));
ws.on("ready", () => {
setIsReady(prev => ({ ...prev, [track.id]: true }));
if (track.id === "ghost") {
setDuration(ws.getDuration());
}
ws.setMuted(muted[track.id]);
});
// When user clicks on waveform - sync video and other tracks
// Use 'interaction' event to detect actual user clicks vs programmatic seeks
let isUserInteracting = false;
ws.on("interaction", () => {
isUserInteracting = true;
});
ws.on("seeking", () => {
// Only handle user-initiated seeks, not programmatic ones
if (!isUserInteracting) return;
isUserInteracting = false;
if (isSeeking.current) return;
isSeeking.current = true;
const progress = ws.getCurrentTime() / ws.getDuration();
const newTime = ws.getCurrentTime();
// Sync video
if (videoRef.current) {
videoRef.current.currentTime = newTime;
}
// Sync other audio tracks
Object.entries(wavesurferRefs.current).forEach(([id, w]) => {
if (w && id !== track.id) {
w.seekTo(progress);
}
});
setCurrentTime(newTime);
// Reset seeking flag after short delay
setTimeout(() => {
isSeeking.current = false;
}, 150);
});
wavesurferRefs.current[track.id] = ws;
}
};
initWaveSurfers();
return () => {
const refs = { ...wavesurferRefs.current };
setTimeout(() => {
Object.values(refs).forEach(ws => {
if (ws) {
try { ws.destroy(); } catch { /* ignore */ }
}
});
}, 0);
};
}, [taskId]);
// Handle video events
useEffect(() => {
const video = videoRef.current;
if (!video) return;
const handleTimeUpdate = () => {
if (!isSeeking.current) {
setCurrentTime(video.currentTime);
}
};
const handleLoadedMetadata = () => {
setIsReady(prev => ({ ...prev, video: true }));
setDuration(video.duration);
};
const handleEnded = () => {
setIsPlaying(false);
setCurrentTime(0);
// Pause all audio tracks when video ends
Object.values(wavesurferRefs.current).forEach(w => {
if (w) {
w.pause();
w.seekTo(0);
}
});
};
// When video starts seeking (user dragging video scrubber)
const handleSeeking = () => {
isSeeking.current = true;
};
// When video finishes seeking - sync all audio tracks to video position
const handleSeeked = () => {
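// Fall back to 0 when the duration is not yet known (NaN) so seekTo never receives NaN
const progress = video.duration > 0 ? video.currentTime / video.duration : 0;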
// Sync all audio tracks to the new video position
Object.values(wavesurferRefs.current).forEach(ws => {
if (ws) {
ws.seekTo(progress);
}
});
setCurrentTime(video.currentTime);
// Small delay before allowing new syncs
setTimeout(() => {
isSeeking.current = false;
}, 100);
};
video.addEventListener("timeupdate", handleTimeUpdate);
video.addEventListener("loadedmetadata", handleLoadedMetadata);
video.addEventListener("ended", handleEnded);
video.addEventListener("seeking", handleSeeking);
video.addEventListener("seeked", handleSeeked);
return () => {
video.removeEventListener("timeupdate", handleTimeUpdate);
video.removeEventListener("loadedmetadata", handleLoadedMetadata);
video.removeEventListener("ended", handleEnded);
video.removeEventListener("seeking", handleSeeking);
video.removeEventListener("seeked", handleSeeked);
};
}, []);
// Continuous sync effect - keeps audio tracks aligned with video during playback
useEffect(() => {
let animationFrameId: number;
const syncInterval = 150; // Minimum milliseconds between drift checks (throttles the per-frame callback)
let lastSync = 0;
const syncTracks = (timestamp: number) => {
// Skip sync during seeking to prevent interference
if (isPlaying && !isSeeking.current && timestamp - lastSync > syncInterval) {
lastSync = timestamp;
const video = videoRef.current;
if (video && !video.seeking) {
const masterTime = video.currentTime;
const masterDuration = video.duration;
const progress = masterTime / masterDuration;
// Sync audio tracks to video only if drift is significant
Object.values(wavesurferRefs.current).forEach(ws => {
if (ws) {
const trackTime = ws.getCurrentTime();
// Only sync if drift is more than 0.15 seconds
if (Math.abs(trackTime - masterTime) > 0.15) {
ws.seekTo(progress);
}
}
});
setCurrentTime(masterTime);
}
}
animationFrameId = requestAnimationFrame(syncTracks);
};
if (isPlaying) {
animationFrameId = requestAnimationFrame(syncTracks);
}
return () => {
if (animationFrameId) {
cancelAnimationFrame(animationFrameId);
}
};
}, [isPlaying]);
const togglePlayAll = useCallback(() => {
const allReady = Object.values(isReady).every(r => r);
if (!allReady) return;
const video = videoRef.current;
if (!video) return;
if (isPlaying) {
video.pause();
Object.values(wavesurferRefs.current).forEach(ws => ws?.pause());
setIsPlaying(false);
} else {
// Sync all tracks to video position first
const progress = video.currentTime / video.duration;
Object.values(wavesurferRefs.current).forEach(ws => {
if (ws) ws.seekTo(progress);
});
video.play();
Object.values(wavesurferRefs.current).forEach(ws => ws?.play());
setIsPlaying(true);
}
}, [isPlaying, isReady]);
const resetToStart = useCallback(() => {
const video = videoRef.current;
if (video) {
video.pause();
video.currentTime = 0;
}
Object.values(wavesurferRefs.current).forEach(ws => {
if (ws) {
ws.pause();
ws.seekTo(0);
}
});
setIsPlaying(false);
setCurrentTime(0);
}, []);
const toggleMute = useCallback((trackId: string) => {
const ws = wavesurferRefs.current[trackId];
if (ws) {
const newMuted = !muted[trackId];
ws.setMuted(newMuted);
setMuted(prev => ({ ...prev, [trackId]: newMuted }));
}
}, [muted]);
const handleSeek = useCallback((e: React.MouseEvent<HTMLDivElement>) => {
const rect = e.currentTarget.getBoundingClientRect();
const progress = Math.max(0, Math.min(1, (e.clientX - rect.left) / rect.width));
const newTime = progress * duration;
isSeeking.current = true;
// Seek video - video's 'seeked' event will reset isSeeking and sync audio
if (videoRef.current) {
videoRef.current.currentTime = newTime;
}
// Also seek audio tracks immediately for visual feedback
Object.values(wavesurferRefs.current).forEach(ws => {
if (ws) ws.seekTo(progress);
});
setCurrentTime(newTime);
// Note: isSeeking is reset by video's 'seeked' event
}, [duration]);
const downloadTrack = (trackId: string, label: string) => {
const link = document.createElement("a");
link.href = getAudioUrl(trackId);
link.download = `${taskId}_${label.toLowerCase().replace(/\s+/g, "_")}.wav`;
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
};
const downloadVideoWithAudio = (audioType: "original" | "ghost" | "clean") => {
const link = document.createElement("a");
link.href = `http://localhost:8000/api/tasks/${taskId}/download-video-with-audio/${audioType}`;
const labels = { original: "original", ghost: "isolated", clean: "without_isolated" };
link.download = `${taskId}_${labels[audioType]}_video.mp4`;
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
setShowVideoDownload(false);
};
const formatTime = (seconds: number) => {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
return `${mins}:${secs.toString().padStart(2, "0")}`;
};
const allReady = Object.values(isReady).every(r => r);
return (
<div
style={{
background: "var(--bg-secondary)",
borderRadius: "16px",
border: "1px solid var(--glass-border)",
overflow: "hidden"
}}
>
{/* Header */}
<div
style={{
padding: "20px 24px",
borderBottom: "1px solid var(--glass-border)",
display: "flex",
alignItems: "center",
justifyContent: "space-between"
}}
>
<div>
<h3 style={{
fontSize: "1rem",
fontWeight: 600,
color: "var(--text-primary)",
marginBottom: "4px",
display: "flex",
alignItems: "center",
gap: "8px"
}}>
<Film style={{ width: "18px", height: "18px", color: "var(--ghost-primary)" }} />
Video Separation Complete
</h3>
<p style={{
fontSize: "0.8rem",
color: "var(--text-muted)"
}}>
"{description}"
</p>
</div>
<div style={{ display: "flex", gap: "8px" }}>
{onUploadNew && (
<button
onClick={onUploadNew}
style={{
display: "flex",
alignItems: "center",
gap: "6px",
padding: "8px 14px",
borderRadius: "8px",
background: "var(--bg-tertiary)",
color: "var(--text-secondary)",
border: "1px solid var(--border-color)",
cursor: "pointer",
fontSize: "0.8rem",
fontWeight: 500
}}
>
↩ Upload New File
</button>
)}
<button
onClick={onNewSeparation}
style={{
display: "flex",
alignItems: "center",
gap: "6px",
padding: "8px 14px",
borderRadius: "8px",
background: "var(--bg-tertiary)",
color: "var(--text-secondary)",
border: "1px solid var(--border-color)",
cursor: "pointer",
fontSize: "0.8rem",
fontWeight: 500
}}
>
<RefreshCw style={{ width: "14px", height: "14px" }} />
New
</button>
</div>
</div>
{/* Video Player */}
<div style={{ padding: "16px 24px", borderBottom: "1px solid var(--glass-border)" }}>
<div style={{ position: "relative" }}>
<video
ref={videoRef}
src={getVideoUrl()}
muted={videoMuted}
playsInline
style={{
width: "100%",
maxHeight: "400px",
borderRadius: "12px",
background: "#000",
objectFit: "contain"
}}
/>
{/* Video Mute Toggle Button */}
<button
onClick={() => setVideoMuted(!videoMuted)}
style={{
position: "absolute",
bottom: "12px",
right: "12px",
width: "40px",
height: "40px",
borderRadius: "50%",
background: "rgba(0, 0, 0, 0.7)",
border: "1px solid rgba(255, 255, 255, 0.2)",
color: videoMuted ? "var(--text-muted)" : "#fff",
cursor: "pointer",
display: "flex",
alignItems: "center",
justifyContent: "center",
transition: "all 0.2s ease"
}}
title={videoMuted ? "Unmute video" : "Mute video"}
>
{videoMuted ? (
<VolumeX style={{ width: "18px", height: "18px" }} />
) : (
<Volume2 style={{ width: "18px", height: "18px" }} />
)}
</button>
</div>
<p style={{
fontSize: "0.7rem",
color: "var(--text-muted)",
marginTop: "8px",
textAlign: "center"
}}>
{videoMuted
? "Video is muted. Audio plays from separated stems below."
: "Playing original video audio. Stem audio may overlap."}
</p>
</div>
{/* Stats Bar */}
{(audioDuration || processingTime || modelSize) && (
<div
style={{
padding: "12px 24px",
borderBottom: "1px solid var(--glass-border)",
display: "flex",
gap: "24px",
background: "var(--bg-tertiary)"
}}
>
{audioDuration !== undefined && (
<div style={{ display: "flex", alignItems: "center", gap: "6px" }}>
<span style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
Duration:
</span>
<span style={{
fontSize: "0.8rem",
fontWeight: 600,
color: "var(--text-primary)",
fontFamily: "monospace"
}}>
{formatTime(audioDuration)}
</span>
</div>
)}
{processingTime !== undefined && (
<div style={{ display: "flex", alignItems: "center", gap: "6px" }}>
<span style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
Processing:
</span>
<span style={{
fontSize: "0.8rem",
fontWeight: 600,
color: "#10B981",
fontFamily: "monospace"
}}>
{processingTime.toFixed(1)}s
</span>
</div>
)}
{modelSize && (
<div style={{ display: "flex", alignItems: "center", gap: "6px" }}>
<span style={{ fontSize: "0.75rem", color: "var(--text-muted)" }}>
Model:
</span>
<span style={{
fontSize: "0.8rem",
fontWeight: 600,
color: "var(--ghost-primary)",
textTransform: "capitalize"
}}>
{modelSize}
</span>
</div>
)}
</div>
)}
{/* Transport Controls */}
<div
style={{
padding: "16px 24px",
borderBottom: "1px solid var(--glass-border)",
display: "flex",
alignItems: "center",
gap: "12px",
background: "var(--bg-tertiary)"
}}
>
<button
onClick={resetToStart}
disabled={!allReady}
style={{
width: "32px",
height: "32px",
borderRadius: "6px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: "var(--bg-secondary)",
color: "var(--text-muted)",
border: "none",
cursor: allReady ? "pointer" : "not-allowed",
opacity: allReady ? 1 : 0.5
}}
>
<SkipBack style={{ width: "14px", height: "14px" }} />
</button>
<button
onClick={togglePlayAll}
disabled={!allReady}
style={{
width: "40px",
height: "40px",
borderRadius: "8px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: isPlaying
? "linear-gradient(135deg, var(--ghost-primary), var(--ghost-accent))"
: "linear-gradient(135deg, #6366f1, #8b5cf6)",
border: "none",
cursor: allReady ? "pointer" : "not-allowed",
opacity: allReady ? 1 : 0.5,
boxShadow: "0 2px 8px rgba(99, 102, 241, 0.3)"
}}
>
{isPlaying ? (
<Pause style={{ width: "18px", height: "18px", color: "white" }} />
) : (
<Play style={{ width: "18px", height: "18px", color: "white", marginLeft: "2px" }} />
)}
</button>
<div style={{ flex: 1, display: "flex", alignItems: "center", gap: "12px" }}>
<span style={{
fontSize: "0.75rem",
fontFamily: "monospace",
color: "var(--text-muted)",
minWidth: "36px"
}}>
{formatTime(currentTime)}
</span>
<div
style={{
flex: 1,
height: "4px",
borderRadius: "2px",
background: "var(--bg-secondary)",
cursor: "pointer",
position: "relative"
}}
onClick={handleSeek}
>
<div
style={{
position: "absolute",
left: 0,
top: 0,
height: "100%",
borderRadius: "2px",
background: "linear-gradient(90deg, #6366f1, #8b5cf6)",
width: `${duration > 0 ? (currentTime / duration) * 100 : 0}%`,
transition: "width 0.1s"
}}
/>
</div>
<span style={{
fontSize: "0.75rem",
fontFamily: "monospace",
color: "var(--text-muted)",
minWidth: "36px"
}}>
{formatTime(duration)}
</span>
</div>
</div>
{/* Audio Tracks */}
<div style={{ padding: "20px 24px", display: "flex", flexDirection: "column", gap: "16px" }}>
{TRACKS.map((track) => {
const TrackIcon = track.icon;
const isMuted = muted[track.id];
const trackReady = isReady[track.id];
return (
<div
key={track.id}
style={{
display: "flex",
alignItems: "center",
gap: "12px",
padding: "14px 16px",
borderRadius: "12px",
background: isMuted ? "var(--bg-tertiary)" : `${track.color}08`,
border: `1px solid ${isMuted ? "var(--border-color)" : `${track.color}30`}`,
opacity: isMuted ? 0.6 : 1,
transition: "all 0.2s ease"
}}
>
{/* Mute Button */}
<button
onClick={() => toggleMute(track.id)}
style={{
width: "32px",
height: "32px",
borderRadius: "8px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: isMuted ? "var(--bg-secondary)" : `${track.color}20`,
color: isMuted ? "var(--text-muted)" : track.color,
border: "none",
cursor: "pointer",
flexShrink: 0
}}
>
{isMuted ? (
<VolumeX style={{ width: "16px", height: "16px" }} />
) : (
<Volume2 style={{ width: "16px", height: "16px" }} />
)}
</button>
{/* Track Label */}
<div style={{
display: "flex",
alignItems: "center",
gap: "10px",
minWidth: "180px",
flexShrink: 0
}}>
<TrackIcon
style={{
width: "16px",
height: "16px",
color: isMuted ? "var(--text-muted)" : track.color
}}
/>
<span style={{
fontSize: "0.85rem",
fontWeight: 500,
color: isMuted ? "var(--text-muted)" : "var(--text-primary)"
}}>
{track.label}
</span>
</div>
{/* Waveform */}
<div
ref={(el) => { containerRefs.current[track.id] = el; }}
style={{
flex: 1,
borderRadius: "8px",
overflow: "hidden",
background: "var(--bg-secondary)",
minHeight: "48px"
}}
>
{!trackReady && (
<div style={{
height: "48px",
display: "flex",
alignItems: "center",
justifyContent: "center"
}}>
<span style={{
fontSize: "0.75rem",
color: "var(--text-muted)"
}}>
Loading...
</span>
</div>
)}
</div>
{/* Download */}
<button
onClick={() => downloadTrack(track.id, track.label)}
style={{
width: "32px",
height: "32px",
borderRadius: "8px",
display: "flex",
alignItems: "center",
justifyContent: "center",
background: "var(--bg-secondary)",
color: "var(--text-muted)",
border: "none",
cursor: "pointer",
flexShrink: 0
}}
>
<Download style={{ width: "16px", height: "16px" }} />
</button>
</div>
);
})}
</div>
{/* Download All */}
<div style={{
padding: "20px 24px",
borderTop: "1px solid var(--glass-border)",
display: "flex",
flexDirection: "column",
gap: "12px"
}}>
<button
onClick={() => TRACKS.forEach((track) => downloadTrack(track.id, track.label))}
style={{
width: "100%",
padding: "14px",
borderRadius: "10px",
background: "linear-gradient(135deg, #6366f1, #8b5cf6)",
color: "white",
border: "none",
cursor: "pointer",
fontSize: "0.9rem",
fontWeight: 600,
display: "flex",
alignItems: "center",
justifyContent: "center",
gap: "8px",
boxShadow: "0 4px 12px rgba(99, 102, 241, 0.3)"
}}
>
<Download style={{ width: "18px", height: "18px" }} />
Download All Stems
</button>
{/* Download Video with Audio */}
<div>
<button
onClick={() => setShowVideoDownload(!showVideoDownload)}
style={{
width: "100%",
padding: "14px",
borderRadius: showVideoDownload ? "10px 10px 0 0" : "10px",
background: showVideoDownload
? "var(--bg-tertiary)"
: "linear-gradient(135deg, #059669, #10b981)",
color: "white",
border: "none",
cursor: "pointer",
fontSize: "0.9rem",
fontWeight: 600,
display: "flex",
alignItems: "center",
justifyContent: "center",
gap: "8px",
boxShadow: showVideoDownload
? "none"
: "0 4px 12px rgba(16, 185, 129, 0.3)"
}}
>
<Film style={{ width: "18px", height: "18px" }} />
Download Video with Audio
<span style={{
marginLeft: "4px",
transform: showVideoDownload ? "rotate(180deg)" : "rotate(0deg)",
transition: "transform 0.2s ease"
}}>▼</span>
</button>
{/* Inline Options */}
{showVideoDownload && (
<div style={{
background: "var(--bg-tertiary)",
border: "1px solid var(--glass-border)",
borderTop: "none",
borderRadius: "0 0 10px 10px",
overflow: "hidden"
}}>
<button
onClick={() => downloadVideoWithAudio("original")}
style={{
width: "100%",
padding: "12px 16px",
background: "transparent",
color: "var(--text-primary)",
border: "none",
borderBottom: "1px solid var(--glass-border)",
cursor: "pointer",
fontSize: "0.85rem",
textAlign: "left",
display: "flex",
alignItems: "center",
gap: "10px"
}}
>
<Volume2 style={{ width: "16px", height: "16px", color: "var(--text-muted)" }} />
Original Audio
</button>
<button
onClick={() => downloadVideoWithAudio("ghost")}
style={{
width: "100%",
padding: "12px 16px",
background: "transparent",
color: "var(--text-primary)",
border: "none",
borderBottom: "1px solid var(--glass-border)",
cursor: "pointer",
fontSize: "0.85rem",
textAlign: "left",
display: "flex",
alignItems: "center",
gap: "10px"
}}
>
<Ghost style={{ width: "16px", height: "16px", color: "#F472B6" }} />
Isolated Sound Only
</button>
<button
onClick={() => downloadVideoWithAudio("clean")}
style={{
width: "100%",
padding: "12px 16px",
background: "transparent",
color: "var(--text-primary)",
border: "none",
cursor: "pointer",
fontSize: "0.85rem",
textAlign: "left",
display: "flex",
alignItems: "center",
gap: "10px"
}}
>
<Leaf style={{ width: "16px", height: "16px", color: "#60A5FA" }} />
Without Isolated Sound
</button>
</div>
)}
</div>
</div>
</div>
);
}
================================================
FILE: frontend/src/components/WaveformEditor.tsx
================================================
"use client";
import { useEffect, useRef, useState } from "react";
import WaveSurfer from "wavesurfer.js";
import RegionsPlugin from "wavesurfer.js/dist/plugins/regions.js";
import { Play, Pause, RotateCcw, Scissors, X } from "lucide-react";
interface WaveformEditorProps {
audioUrl: string;
onRegionSelect: (region: { start: number; end: number } | null) => void;
selectedRegion: { start: number; end: number } | null;
}
export default function WaveformEditor({
audioUrl,
onRegionSelect,
selectedRegion
}: WaveformEditorProps) {
const containerRef = useRef<HTMLDivElement>(null);
const wavesurferRef = useRef<WaveSurfer | null>(null);
const regionsRef = useRef<RegionsPlugin | null>(null);
const [isPlaying, setIsPlaying] = useState(false);
const [currentTime, setCurrentTime] = useState(0);
const [duration, setDuration] = useState(0);
const [isLoaded, setIsLoaded] = useState(false);
useEffect(() => {
if (!containerRef.current) return;
let isMounted = true;
// Create regions plugin
const regions = RegionsPlugin.create();
regionsRef.current = regions;
// Create wavesurfer instance
const wavesurfer = WaveSurfer.create({
container: containerRef.current,
waveColor: "rgba(139, 92, 246, 0.5)",
progressColor: "#8B5CF6",
cursorColor: "#F472B6",
cursorWidth: 2,
barWidth: 3,
barGap: 2,
barRadius: 3,
height: 128,
normalize: true,
plugins: [regions],
});
wavesurferRef.current = wavesurfer;
// Event listeners
wavesurfer.on("ready", () => {
if (isMounted) {
setDuration(wavesurfer.getDuration());
setIsLoaded(true);
}
});
wavesurfer.on("audioprocess", () => {
if (isMounted) {
setCurrentTime(wavesurfer.getCurrentTime());
}
});
wavesurfer.on("seeking", () => {
if (isMounted) {
setCurrentTime(wavesurfer.getCurrentTime());
}
});
wavesurfer.on("play", () => isMounted && setIsPlaying(true));
wavesurfer.on("pause", () => isMounted && setIsPlaying(false));
// Region events
regions.on("region-created", (region) => {
// Only allow one region at a time
regions.getRegions().forEach((r) => {
if (r.id !== region.id) {
r.remove();
}
});
if (isMounted) {
onRegionSelect({
start: region.start,
end: region.end,
});
}
});
regions.on("region-updated", (region) => {
if (isMounted) {
onRegionSelect({
start: region.start,
end: region.end,
});
}
});
// Catch loading errors silently (including AbortError when component unmounts)
wavesurfer.on("error", (error) => {
// Silently ignore AbortError - this happens when unmounting during load
if (error?.name === "AbortError" || String(error).includes("abort")) {
return;
}
console.warn("WaveSurfer error:", error);
});
// Load audio
wavesurfer.load(audioUrl).catch((error) => {
// Silently ignore AbortError
if (error?.name === "AbortError" || String(error).includes("abort")) {
return;
}
console.warn("Failed to load audio:", error);
});
return () => {
isMounted = false;
// Use setTimeout to ensure any pending operations complete
// before attempting destruction
const ws = wavesurfer;
setTimeout(() => {
try {
ws.destroy();
} catch {
// Ignore AbortError and other destruction errors
// This happens when component unmounts during audio loading
}
}, 0);
};
}, [audioUrl]);
const togglePlayPause = () => {
wavesurferRef.current?.playPause();
};
const restart = () => {
wavesurferRef.current?.seekTo(0);
wavesurferRef.current?.play();
};
const createRegion = () => {
if (!regionsRef.current || !wavesurferRef.current) return;
const currentPos = wavesurferRef.current.getCurrentTime();
const dur = wavesurferRef.current.getDuration();
// Create a region from current position to +5 seconds
const start = currentPos;
const end = Math.min(currentPos + 5, dur);
regionsRef.current.addRegion({
start,
end,
color: "rgba(244, 114, 182, 0.3)",
drag: true,
resize: true,
});
};
const clearRegion = () => {
regionsRef.current?.getRegions().forEach((r) => r.remove());
onRegionSelect(null);
};
const formatTime = (seconds: number) => {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
return `${mins}:${secs.toString().padStart(2, "0")}`;
};
return (
<div className="waveform-container">
{/* Header */}
<div className="flex items-center justify-between mb-4">
<h3 className="font-semibold" style={{ color: "var(--text-primary)" }}>
Waveform Editor
</h3>
<div className="flex items-center gap-2">
{selectedRegion && (
<span
className="text-sm px-3 py-1 rounded-full"
style={{
background: "rgba(244, 114, 182, 0.2)",
color: "var(--ghost-accent)"
}}
>
Selected: {formatTime(selectedRegion.start)} - {formatTime(selectedRegion.end)}
</span>
)}
</div>
</div>
{/* Waveform */}
<div
ref={containerRef}
className="rounded-xl overflow-hidden mb-4"
style={{ background: "var(--bg-primary)" }}
/>
{/* Loading skeleton */}
{!isLoaded && (
<div className="h-32 rounded-xl shimmer mb-4" />
)}
{/* Controls */}
<div className="flex items-center justify-between">
{/* Playback Controls */}
<div className="flex items-center gap-2">
<button
onClick={togglePlayPause}
className="w-12 h-12 rounded-xl flex items-center justify-center transition-all hover:scale-105"
style={{
background: "linear-gradient(135deg, var(--ghost-primary), #7C3AED)",
}}
>
{isPlaying ? (
<Pause className="w-6 h-6 text-white" />
) : (
<Play className="w-6 h-6 text-white ml-1" />
)}
</button>
<button
onClick={restart}
className="w-10 h-10 rounded-lg flex items-center justify-center transition-all hover:scale-105"
style={{ background: "var(--bg-tertiary)", color: "var(--text-secondary)" }}
>
<RotateCcw className="w-5 h-5" />
</button>
<span className="ml-4 font-mono text-sm" style={{ color: "var(--text-secondary)" }}>
{formatTime(currentTime)} / {formatTime(duration)}
</span>
</div>
{/* Region Controls */}
<div className="flex items-center gap-2">
<button
onClick={createRegion}
className="btn-secondary flex items-center gap-2 text-sm"
>
<Scissors className="w-4 h-4" />
Select Region
</button>
{selectedRegion && (
<button
onClick={clearRegion}
className="p-2 rounded-lg transition-all"
style={{ background: "var(--bg-tertiary)", color: "var(--ghost-error)" }}
>
<X className="w-5 h-5" />
</button>
)}
</div>
</div>
{/* Instructions */}
<p className="mt-4 text-sm" style={{ color: "var(--text-muted)" }}>
💡 Tip: Select a region to apply <strong>Temporal Lock</strong> - the AI will focus on that specific time range.
</p>
</div>
);
}
================================================
FILE: frontend/tsconfig.json
================================================
{
"compilerOptions": {
"target": "ES2017",
"lib": ["dom", "dom.iterable", "esnext"],
"allowJs": true,
"skipLibCheck": true,
"strict": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "bundler",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "react-jsx",
"incremental": true,
"plugins": [
{
"name": "next"
}
],
"paths": {
"@/*": ["./src/*"]
}
},
"include": [
"next-env.d.ts",
"**/*.ts",
"**/*.tsx",
".next/types/**/*.ts",
".next/dev/types/**/*.ts",
"**/*.mts"
],
"exclude": ["node_modules"]
}
================================================
FILE: install.bat
================================================
@echo off
chcp 65001 >nul
title AudioGhost AI - One-Click Installer
echo.
echo ╔══════════════════════════════════════════════════════════════╗
echo ║ AudioGhost AI - One-Click Installer ║
echo ║ v1.0 MVP ║
echo ╚═════════
SYMBOL INDEX (58 symbols across 17 files)
FILE: backend/api/auth.py
class TokenRequest (line 20) | class TokenRequest(BaseModel):
class AuthStatus (line 23) | class AuthStatus(BaseModel):
function get_saved_token (line 29) | def get_saved_token() -> Optional[str]:
function save_token (line 36) | def save_token(token: str):
function check_model_downloaded (line 41) | def check_model_downloaded() -> bool:
function get_auth_status (line 50) | async def get_auth_status():
function login (line 71) | async def login(request: TokenRequest):
function download_model (line 105) | async def download_model():
function logout (line 127) | async def logout():
FILE: backend/api/separate.py
class SeparationRequest (line 23) | class SeparationRequest(BaseModel):
class SeparationResponse (line 31) | class SeparationResponse(BaseModel):
function create_separation_task (line 38) | async def create_separation_task(
function create_batch_separation (line 113) | async def create_batch_separation(
FILE: backend/api/tasks.py
class TaskStatus (line 18) | class TaskStatus(BaseModel):
class TaskResult (line 26) | class TaskResult(BaseModel):
function get_task_status (line 33) | async def get_task_status(task_id: str):
function download_result (line 82) | async def download_result(task_id: str, file_type: str):
function download_video_with_audio (line 138) | async def download_video_with_audio(task_id: str, audio_type: str):
function cancel_task (line 224) | async def cancel_task(task_id: str):
function list_recent_tasks (line 234) | async def list_recent_tasks(limit: int = 10):
FILE: backend/main.py
function root (line 46) | async def root():
function health (line 55) | async def health():
FILE: backend/workers/tasks.py
function update_progress (line 26) | def update_progress(progress: int, message: str):
function create_lite_model (line 34) | def create_lite_model(model_name: str, hf_token: str = None):
function get_or_load_lite_model (line 109) | def get_or_load_lite_model(model_name: str, hf_token: str, device: str, ...
function cleanup_gpu_memory (line 153) | def cleanup_gpu_memory():
function separate_audio_task (line 162) | def separate_audio_task(
function match_pattern_task (line 437) | def match_pattern_task(
FILE: frontend/src/app/layout.tsx
function RootLayout (line 16) | function RootLayout({
FILE: frontend/src/app/page.tsx
type TaskResult (line 13) | interface TaskResult {
type TaskState (line 26) | interface TaskState {
function Home (line 34) | function Home() {
FILE: frontend/src/components/AudioUploader.tsx
type AudioUploaderProps (line 6) | interface AudioUploaderProps {
function AudioUploader (line 10) | function AudioUploader({ onFileUpload }: AudioUploaderProps) {
FILE: frontend/src/components/AuthModal.tsx
type AuthModalProps (line 6) | interface AuthModalProps {
function AuthModal (line 11) | function AuthModal({ onClose, onSuccess }: AuthModalProps) {
FILE: frontend/src/components/Header.tsx
type HeaderProps (line 5) | interface HeaderProps {
function Header (line 13) | function Header({
FILE: frontend/src/components/ProgressTracker.tsx
type ProgressTrackerProps (line 5) | interface ProgressTrackerProps {
function ProgressTracker (line 11) | function ProgressTracker({ status, progress, message }: ProgressTrackerP...
FILE: frontend/src/components/SeparationPanel.tsx
type SeparationSettings (line 17) | interface SeparationSettings {
type SeparationPanelProps (line 23) | interface SeparationPanelProps {
constant QUICK_PROMPTS (line 33) | const QUICK_PROMPTS = [
constant MODEL_OPTIONS (line 42) | const MODEL_OPTIONS = [
function SeparationPanel (line 48) | function SeparationPanel({
FILE: frontend/src/components/StemMixer.tsx
type StemMixerProps (line 19) | interface StemMixerProps {
type Track (line 29) | interface Track {
constant TRACKS (line 37) | const TRACKS: Track[] = [
function StemMixer (line 54) | function StemMixer({
FILE: frontend/src/components/VideoStemMixer.tsx
type VideoStemMixerProps (line 19) | interface VideoStemMixerProps {
type Track (line 29) | interface Track {
constant TRACKS (line 37) | const TRACKS: Track[] = [
function VideoStemMixer (line 54) | function VideoStemMixer({
FILE: frontend/src/components/WaveformEditor.tsx
type WaveformEditorProps (line 8) | interface WaveformEditorProps {
function WaveformEditor (line 14) | function WaveformEditor({
FILE: sam_audio_lite.py
function show_gpu_memory (line 11) | def show_gpu_memory(label: str = ""):
function create_lite_model (line 21) | def create_lite_model(model_name: str = "facebook/sam-audio-base", token...
FILE: test_video_only.py
function main (line 9) | def main():
Condensed preview: 37 files, each showing path, character count, and a content snippet.
[
{
"path": ".gitignore",
"chars": 1050,
"preview": "# Python\n__pycache__/\n*.py[cod]\n*$py.class\n*.so\n*.egg\n*.egg-info/\ndist/\nbuild/\neggs/\n*.manifest\n*.spec\npip-log.txt\npip-d"
},
{
"path": "LICENSE",
"chars": 1343,
"preview": "MIT License\n\nCopyright (c) 2024 AudioGhost AI Contributors\n\nPermission is hereby granted, free of charge, to any person "
},
{
"path": "QUICKSTART.md",
"chars": 1436,
"preview": "# AudioGhost AI Startup Guide\n\n## Quick Start\n\n### 1. Start Redis (using Docker)\n```powershell\ncd d:\\sam_audio\ndocker-compose up -d\n```\n\n### 2"
},
{
"path": "README.md",
"chars": 8375,
"preview": "# AudioGhost AI 🎵👻\n\n\n\n**AI-Powered Object-Oriented Audio Separation**\n\nDescribe the soun"
},
{
"path": "backend/api/__init__.py",
"chars": 18,
"preview": "\"\"\"API Package\"\"\"\n"
},
{
"path": "backend/api/auth.py",
"chars": 3745,
"preview": "\"\"\"\nAuthentication API - HuggingFace Token Management\n\"\"\"\nimport os\nfrom pathlib import Path\nfrom typing import Optional"
},
{
"path": "backend/api/separate.py",
"chars": 4783,
"preview": "\"\"\"\nSeparation API - Audio/Video Separation Endpoints\n\"\"\"\nimport uuid\nfrom pathlib import Path\nfrom typing import Option"
},
{
"path": "backend/api/tasks.py",
"chars": 7452,
"preview": "\"\"\"\nTasks API - Task Status and Results\n\"\"\"\nfrom pathlib import Path\nfrom typing import Optional, List\nfrom fastapi impo"
},
{
"path": "backend/main.py",
"chars": 1346,
"preview": "\"\"\"\nAudioGhost AI - FastAPI Backend\n\"\"\"\nimport os\nfrom pathlib import Path\nfrom fastapi import FastAPI\nfrom fastapi.midd"
},
{
"path": "backend/requirements.txt",
"chars": 363,
"preview": "# AudioGhost AI - Backend Dependencies\n\n# FastAPI Framework\nfastapi==0.115.6\nuvicorn[standard]==0.34.0\npython-multipart="
},
{
"path": "backend/workers/__init__.py",
"chars": 22,
"preview": "\"\"\"Workers Package\"\"\"\n"
},
{
"path": "backend/workers/celery_app.py",
"chars": 610,
"preview": "\"\"\"\nCelery Application Configuration\n\"\"\"\nfrom celery import Celery\n\ncelery_app = Celery(\n \"audioghost\",\n broker=\"r"
},
{
"path": "backend/workers/tasks.py",
"chars": 16187,
"preview": "\"\"\"\nCelery Tasks - Audio Separation Workers\nWith SAM Audio Lite optimization for low VRAM usage\n\"\"\"\nimport os\nimport sys"
},
{
"path": "docker-compose.yml",
"chars": 217,
"preview": "version: '3.8'\n\nservices:\n redis:\n image: redis:alpine\n container_name: audioghost-redis\n ports:\n - \"6379"
},
{
"path": "frontend/.gitignore",
"chars": 480,
"preview": "# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.\n\n# dependencies\n/node_modules\n/.pn"
},
{
"path": "frontend/README.md",
"chars": 1450,
"preview": "This is a [Next.js](https://nextjs.org) project bootstrapped with [`create-next-app`](https://nextjs.org/docs/app/api-re"
},
{
"path": "frontend/eslint.config.mjs",
"chars": 465,
"preview": "import { defineConfig, globalIgnores } from \"eslint/config\";\nimport nextVitals from \"eslint-config-next/core-web-vitals\""
},
{
"path": "frontend/next.config.ts",
"chars": 133,
"preview": "import type { NextConfig } from \"next\";\n\nconst nextConfig: NextConfig = {\n /* config options here */\n};\n\nexport default"
},
{
"path": "frontend/package.json",
"chars": 618,
"preview": "{\n \"name\": \"frontend\",\n \"version\": \"0.1.0\",\n \"private\": true,\n \"scripts\": {\n \"dev\": \"next dev\",\n \"build\": \"nex"
},
{
"path": "frontend/postcss.config.mjs",
"chars": 94,
"preview": "const config = {\n plugins: {\n \"@tailwindcss/postcss\": {},\n },\n};\n\nexport default config;\n"
},
{
"path": "frontend/src/app/globals.css",
"chars": 6280,
"preview": "@import \"tailwindcss\";\n\n:root {\n /* AudioGhost Brand Colors */\n --ghost-primary: #8B5CF6;\n --ghost-secondary: #06B6D4"
},
{
"path": "frontend/src/app/layout.tsx",
"chars": 796,
"preview": "import type { Metadata } from \"next\";\nimport { Inter } from \"next/font/google\";\nimport \"./globals.css\";\n\nconst inter = I"
},
{
"path": "frontend/src/app/page.tsx",
"chars": 12473,
"preview": "\"use client\";\n\nimport { useState, useEffect } from \"react\";\nimport Header from \"@/components/Header\";\nimport AuthModal f"
},
{
"path": "frontend/src/components/AudioUploader.tsx",
"chars": 3943,
"preview": "\"use client\";\n\nimport { useState, useRef, useCallback } from \"react\";\nimport { Upload, Music, Video } from \"lucide-react"
},
{
"path": "frontend/src/components/AuthModal.tsx",
"chars": 9063,
"preview": "\"use client\";\n\nimport { useState } from \"react\";\nimport { X, Key, ExternalLink, Loader2, CheckCircle } from \"lucide-reac"
},
{
"path": "frontend/src/components/Header.tsx",
"chars": 5140,
"preview": "\"use client\";\n\nimport { Sun, Moon, Ghost, User, LogOut } from \"lucide-react\";\n\ninterface HeaderProps {\n isAuthenticat"
},
{
"path": "frontend/src/components/ProgressTracker.tsx",
"chars": 8157,
"preview": "\"use client\";\n\nimport { Ghost, Loader2, CheckCircle2 } from \"lucide-react\";\n\ninterface ProgressTrackerProps {\n status"
},
{
"path": "frontend/src/components/SeparationPanel.tsx",
"chars": 22175,
"preview": "\"use client\";\n\nimport { useState } from \"react\";\nimport {\n Mic,\n Music,\n Volume2,\n Sparkles,\n Clock,\n "
},
{
"path": "frontend/src/components/StemMixer.tsx",
"chars": 28594,
"preview": "\"use client\";\n\nimport { useState, useRef, useEffect, useCallback } from \"react\";\nimport {\n Play,\n Pause,\n Downl"
},
{
"path": "frontend/src/components/VideoStemMixer.tsx",
"chars": 38516,
"preview": "\"use client\";\n\nimport { useState, useRef, useEffect, useCallback } from \"react\";\nimport {\n Play,\n Pause,\n Downl"
},
{
"path": "frontend/src/components/WaveformEditor.tsx",
"chars": 9351,
"preview": "\"use client\";\n\nimport { useEffect, useRef, useState } from \"react\";\nimport WaveSurfer from \"wavesurfer.js\";\nimport Regio"
},
{
"path": "frontend/tsconfig.json",
"chars": 670,
"preview": "{\n \"compilerOptions\": {\n \"target\": \"ES2017\",\n \"lib\": [\"dom\", \"dom.iterable\", \"esnext\"],\n \"allowJs\": true,\n "
},
{
"path": "install.bat",
"chars": 2775,
"preview": "@echo off\nchcp 65001 >nul\ntitle AudioGhost AI - One-Click Installer\n\necho.\necho ╔═══════════════════════════════════════"
},
{
"path": "sam_audio_lite.py",
"chars": 10161,
"preview": "\"\"\"\nSAM Audio Lite - Lightweight version for low VRAM usage (4-6GB)\nDisables vision_encoder, rankers, and span_predictor"
},
{
"path": "start.bat",
"chars": 2959,
"preview": "@echo off\nchcp 65001 >nul\ntitle AudioGhost AI - Launcher\n\necho.\necho ╔══════════════════════════════════════════════════"
},
{
"path": "stop.bat",
"chars": 1050,
"preview": "@echo off\nchcp 65001 >nul\ntitle AudioGhost AI - Stop All Services\n\necho.\necho ╔═════════════════════════════════════════"
},
{
"path": "test_video_only.py",
"chars": 2558,
"preview": "\"\"\"\nTest Video Only - Simple video-based audio separation using SAM Audio\nUses small model with bfloat16 for lower memor"
}
]
About this extraction
This page contains the full source code of the 0x0funky/audioghost-ai GitHub repository, extracted and formatted as plain text. The extraction includes 37 files (209.8 KB), approximately 45.5k tokens, and a symbol index with 58 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.